Ask HN: What would you use to query a large amount (2-25TB) of read-only data?

The data set is written once a month in bulk and read many times by different users. The last month of data (~2TB) is the hotspot. A year's worth is ~25TB. The attributes of one record are as follows:
For each 1-4 character string, there are many records, sometimes several per second. For example, over the span of a month, a single string can be associated with 18 million records. There are about 10,000 unique 1-4 character strings, though not all are as active as that example. The data is queried by two attributes: the 1-4 character string and the timestamp.

Potential solutions I've come up with (feel free to debate any of these):
Feel free to ask any questions too.
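
To make the access pattern concrete, here is a minimal in-memory sketch in Python. It is not a proposed solution at 2-25TB scale, just a toy model of "write once in bulk, then query by string and timestamp range". The class name `KeyTimeIndex`, the `(key, timestamp, payload)` record shape, integer timestamps, and the half-open `[start, end)` range are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


class KeyTimeIndex:
    """Toy model of the access pattern: look up by key, then by time range.

    Hypothetical illustration only; everything is held in memory.
    """

    def __init__(self):
        self._times = defaultdict(list)     # key -> timestamps, sorted ascending
        self._payloads = defaultdict(list)  # key -> payloads, aligned with _times

    def bulk_load(self, records):
        """Load (key, timestamp, payload) tuples in one pass, like the monthly write.

        Assumes it is called once on an empty index; a second call would
        append out of order and break the sorted invariant.
        """
        for key, ts, payload in sorted(records, key=lambda r: (r[0], r[1])):
            self._times[key].append(ts)
            self._payloads[key].append(payload)

    def query(self, key, start, end):
        """Return (timestamp, payload) pairs for `key` with start <= timestamp < end."""
        times = self._times.get(key, [])
        payloads = self._payloads.get(key, [])
        lo = bisect_left(times, start)  # first index with timestamp >= start
        hi = bisect_left(times, end)    # first index with timestamp >= end
        return list(zip(times[lo:hi], payloads[lo:hi]))
```

Usage would look like `idx = KeyTimeIndex(); idx.bulk_load(records); idx.query("ABC", start, end)`. The point is that every read touches exactly one key and one contiguous time range, so whatever storage is used would ideally keep each string's records together and ordered by timestamp.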