Ask HN: Storing and processing less than 1TB of unstructured data?

Many HN folks deal with data problems, so I thought I'd ask: if you had to store and index less than 1TB of unstructured (plain text) data, what would you use?

I have a bunch of text files and HTML pages that I'd like to dump into something and then search over, and maybe even find relationships (common terms, phrases, etc.) between the various docs.

I've heard of things like Hadoop, but that seems like overkill for the amount of data I have. I'd also like to keep costs as low as possible, since this is just for personal use. I've looked at a few of the cloud providers, but I'm honestly not sure what I'm looking for, so I walk away more confused than when I started.

This seems like an easy problem, but for whatever reason I'm getting wrapped around the axle on it.
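
To make the ask concrete, here is a rough sketch of the kind of "index, search, and find common terms" behavior described above. It is a toy inverted index in plain Python (standard library only), not a recommendation of any particular tool. The function names (`tokenize`, `build_index`, `search`, `common_terms`) and the sample documents are made up for illustration, and it assumes any HTML has already been reduced to plain text. It also holds everything in memory, so it would not scale anywhere near 1TB as written.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so intersections do not mutate the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def common_terms(index, doc_a, doc_b):
    """Return the terms that appear in both documents."""
    return {term for term, ids in index.items() if doc_a in ids and doc_b in ids}

# Hypothetical sample documents, for illustration only.
docs = {
    "a.txt": "Hadoop seems like overkill for small data",
    "b.html": "Searching small text data on a budget",
    "c.txt": "Notes about cloud providers",
}
index = build_index(docs)
```

With that in place, `search(index, "small data")` returns the ids of the documents containing both words, and `common_terms(index, "a.txt", "b.html")` returns the vocabulary the two documents share. A real setup would also need on-disk storage, stop-word handling, and phrase matching, which this sketch leaves out.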