Are embeddings a hack? Is building out tooling and databases and APIs and companies around embeddings all going to be for naught as soon as there's a solid LLM/API with a big enough context window?
Yes - embeddings are a hack:
No - there won't be anything like a "real API" unless there's a new discovery or a shift in the way LLMs are constructed. It's not theoretically impossible, but there's no clear way to get guaranteed results from present-day LLMs; all they do is output guesses from their input text (the prompt text combined with the user text).
Expanding context seems like an approach, but if you're trying to get an answer about your company's documentation, why would you need the entirety of GPT-X?
Here's a relevant quote: https://simonwillison.net/2023/Apr/15/ted-sanders-openai/
Analogous, more or less, to a human with general experience (base training), experience with your code base (fine tuning), and the ability to reference the current code base directly (embedding-based search/recall). All three have a role; they are complementary rather than mutually exclusive.
Given that knowledge, as an end user it seems I would want to spend my time ensuring that the embedding data being selected is as good as possible.
This is the type of statement that I feel is often, even usually, wrong -- at least for the common case. The last time I had this argument it was about CDs: I said we'd eventually stop burning them because everything would be in the cloud, and my friend argued that storage and network bandwidth would make that impractical if everyone did it.
I expect context window compression, or smart ways to embed context so that it still provides useful information in "most" cases even if it's lossy, to be an active area of research.
EDIT: That said, looking at the original question -- I do think vector embeddings are still useful in their own right and somewhat orthogonal to context window sizes. IMO.
LLMs should not be trained to simply memorize information. Instead, they should be designed to understand and identify patterns in the data, and use the knowledge stored in vector databases to organize and summarize information.
Vector databases can be used to store and organize knowledge in a way that is more accessible to LLMs. By using vector representations, LLMs can easily access and manipulate knowledge, allowing them to more effectively process and analyze large amounts of information.
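To make the "store knowledge as vectors, then look it up for the LLM" idea concrete, here's a minimal sketch in plain Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `ToyVectorStore` and `build_prompt` are hypothetical names, not any actual vector database's API. It only shows the shape of the approach: embed the documents, rank them by cosine similarity to the question, and put just the best matches into the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    A real system would call an actual embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


def build_prompt(store, question, k=2):
    """Put only the retrieved snippets, not the whole corpus, in the prompt."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add("To reset your password, open Settings and choose Security.")
store.add("Invoices are emailed on the first day of each month.")
store.add("The API rate limit is 100 requests per minute.")

print(build_prompt(store, "How do I reset my password?", k=1))
```

Swapping the toy `embed` for a real model changes the quality of the matches but not the structure, which is roughly why the retrieval step stays useful regardless of how big the context window gets.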