Show HN: 30k IKEA items in flat text (huggingface.co)

OP here. I took the unofficial IKEA US dataset (originally scraped by jeffreyszhou) and converted all 30,511 products into a flat, markdown-like protocol called CommerceTXT.

The goal: see if a flatter structure is more efficient for LLM context windows.

The results:
- Size: 30k products across 632 categories.
- Efficiency: The text version uses ~24% fewer tokens (3.6M saved total) compared to the equivalent minified JSON.
- Structure: Files are organized in folders (e.g. /products/category/), which helps with testing hierarchical retrieval routers.

The link goes to the dataset on Hugging Face, which has the full benchmarks.

Parser code is here: https://github.com/commercetxt/commercetxt

Happy to answer questions about the conversion logic!
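For anyone who wants to get a feel for the JSON-vs-flat-text comparison, here is a minimal sketch. It is not the CommerceTXT parser or its actual syntax: the product record, its field names, the `Key: value` layout, and the `rough_token_count` helper are all assumptions made up for illustration, and the token count is a crude regex proxy rather than a real LLM tokenizer. The full benchmarks on the dataset page are the numbers to trust.

```python
import json
import re

# Hypothetical product record; field names are illustrative, not taken from the dataset.
product = {
    "name": "Example bookcase",
    "category": "bookcases",
    "price": 79.99,
    "currency": "USD",
    "in_stock": True,
}


def to_minified_json(record: dict) -> str:
    """Serialize with no optional whitespace, like a minified-JSON baseline."""
    return json.dumps(record, separators=(",", ":"))


def to_flat_text(record: dict) -> str:
    """Render one 'key: value' line per field (an assumed layout, not the CommerceTXT spec)."""
    return "\n".join(f"{key}: {value}" for key, value in record.items())


def rough_token_count(text: str) -> int:
    """Crude proxy for tokens: runs of word characters plus each punctuation mark."""
    return len(re.findall(r"\w+|[^\w\s]", text))


json_text = to_minified_json(product)
flat_text = to_flat_text(product)

json_tokens = rough_token_count(json_text)
flat_tokens = rough_token_count(flat_text)
saving = 1 - flat_tokens / json_tokens

print(json_text)
print(flat_text)
print(f"JSON: {json_tokens} rough tokens, flat text: {flat_tokens} rough tokens")
print(f"Relative saving under this proxy: {saving:.0%}")
```

The proxy overstates the gap because it charges one token for every quote, brace, and comma, while real tokenizers often merge punctuation with neighbouring characters. To reproduce a figure comparable to the ~24% above, swap `rough_token_count` for the tokenizer of the model you actually target.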