Show HN: A production-style recommender using vector retrieval and re-ranking

I’ve been exploring how recommendation systems are actually implemented in production, beyond just training models.

A common pattern I kept seeing is to split the problem into two stages:

1. Retrieve a small set of relevant candidates
2. Re-rank them using a model

Instead of doing brute-force inference across all items, I built a small prototype around this idea. The flow looks like this:

- Store embeddings in a vector database (ChromaDB)
- Retrieve the Top-K most similar items/users based on vector similarity
- Run a TensorFlow.js model to re-rank the candidates

The goal is to reduce the search space before applying inference, which seems necessary when latency and scale matter.

What I found interesting is that once you move to this approach, a lot of the complexity shifts from the model itself to the retrieval layer:

- choosing K
- filtering candidates
- embedding quality
- latency vs recall trade-offs

Curious how others approach this in real systems:

- How do you decide on K?
- Do you rely purely on vector similarity or add heuristics?
- How do you handle re-ranking at scale?

Project: https://github.com/ftonato/recommendation-system-chromadb-tf...
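
For anyone who wants the shape of the two-stage flow without opening the repo, here is a minimal TypeScript sketch. It is not the code from the project. The in-memory brute-force cosine search only stands in for ChromaDB, and `scoreFn` is a hypothetical placeholder for the TensorFlow.js model. The names `Item`, `Scored`, `retrieveTopK`, `rerank`, and `recommend` are all made up for illustration.

```typescript
// Two-stage recommender sketch: retrieve Top-K by cosine similarity, then re-rank.
// Assumptions: a plain array stands in for the vector database, and scoreFn
// stands in for the re-ranking model. Neither reflects the project's real code.

interface Item {
  id: string;
  embedding: number[];
}

interface Scored {
  id: string;
  score: number;
}

// Cosine similarity between two vectors of equal length; 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Stage 1: keep the k items most similar to the query.
// This scans every item; a vector database would normally use an index instead.
function retrieveTopK(query: number[], items: Item[], k: number): Item[] {
  return items
    .map((item) => ({ item, sim: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k)
    .map((x) => x.item);
}

// Stage 2: run the (presumably more expensive) scorer on the k candidates only.
function rerank(
  query: number[],
  candidates: Item[],
  scoreFn: (query: number[], item: Item) => number
): Scored[] {
  return candidates
    .map((item) => ({ id: item.id, score: scoreFn(query, item) }))
    .sort((x, y) => y.score - x.score);
}

// Full pipeline: retrieval narrows the search space, re-ranking orders what is left.
function recommend(
  query: number[],
  items: Item[],
  k: number,
  scoreFn: (query: number[], item: Item) => number
): Scored[] {
  return rerank(query, retrieveTopK(query, items, k), scoreFn);
}
```

The scorer is called exactly K times per request, which is where the latency saving comes from. The cost is recall: an item the scorer would have ranked first never gets scored if it falls outside the Top-K, so K ends up being tuned against both latency and how well the embeddings agree with the re-ranker.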