I built two small open-source pipelines that convert open product data into NDJSON with a clean, stable schema that you can bulk-index into Elasticsearch/OpenSearch. One outputs ~100K grocery products (from Open Food Facts) and the other ~1M electronics-style products (from Open Icecat). Both apply a strict “no image = no entry” quality gate and share one schema contract.
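For a concrete picture, here is a minimal sketch of a "no image = no entry" gate feeding bulk-ready NDJSON. It is not the actual pipeline code: the field names (`id`, `title`, `description`, `image_url`, `attrs`), the `products` index name, and the helper functions are illustrative assumptions. Only the action-line-plus-source-line layout is the standard bulk format.

```python
import json
from typing import Iterable, Iterator, Optional


def to_doc(raw: dict) -> Optional[dict]:
    """Map a raw source record to a (hypothetical) shared schema.

    Returns None when the record has no usable image, which is the
    "no image = no entry" gate.
    """
    image = (raw.get("image_url") or "").strip()
    if not image:
        return None
    return {
        "id": str(raw["id"]),
        "title": raw.get("title", ""),
        "description": raw.get("description", ""),
        "image_url": image,
        # Flat key/value attributes, intended for faceting.
        "attrs": dict(raw.get("attrs") or {}),
    }


def bulk_lines(records: Iterable[dict], index: str = "products") -> Iterator[str]:
    """Yield bulk-format NDJSON: one action line, then one source line per doc."""
    for raw in records:
        doc = to_doc(raw)
        if doc is None:
            continue  # dropped by the quality gate
        yield json.dumps({"index": {"_index": index, "_id": doc["id"]}})
        yield json.dumps(doc, ensure_ascii=False)


if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "Oat milk", "image_url": "https://example.org/1.jpg",
         "attrs": {"brand": "Acme"}},
        {"id": 2, "title": "No picture", "image_url": ""},
    ]
    for line in bulk_lines(sample):
        print(line)
```

Writing these lines to a file, with a trailing newline after the last one, gives a payload that can be posted to the `_bulk` endpoint in chunks.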
Would love feedback on:
• what fields you consider essential for a convincing search/relevance demo dataset
• whether the schema choices (flat attrs for faceting + searchable description) match what you’ve seen work in practice
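To make the second question easier to react to, here is a rough sketch of the kind of request those schema choices are meant to support: full-text matching on the description plus facet counts over flat attributes. The field names (`description`, `attrs.brand`, `attrs.category`) and the `build_search_body` helper are assumptions for illustration, and the terms aggregations assume the `attrs.*` fields are mapped as keywords.

```python
import json
from typing import Iterable


def build_search_body(text: str, facet_fields: Iterable[str], size: int = 10) -> dict:
    """Build a query DSL body: match on description, terms facets on attrs.*."""
    return {
        "size": size,
        # Full-text relevance comes from the analyzed description field.
        "query": {"match": {"description": text}},
        # One terms aggregation per flat attribute gives facet counts.
        "aggs": {
            name: {"terms": {"field": f"attrs.{name}"}}
            for name in facet_fields
        },
    }


if __name__ == "__main__":
    body = build_search_body("wireless headphones", ["brand", "category"])
    print(json.dumps(body, indent=2))
```

If facets like these are not what people actually use in demos, that is exactly the kind of feedback that would help.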