Show HN: Clean HTML for Semantic Extraction (page-replica.github.io)
- Reduces token count by 60-90% (lower API costs)
- Improves embedding quality (less noise = better semantic search)
- Speeds up processing (smaller payloads = faster inference)
- Preserves structure (headings, paragraphs, links stay intact)
- Zero dependencies (pure JavaScript, no bloat)
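
To make the points above concrete, here is a rough sketch of the general idea in plain JavaScript. It is not the project's actual code or API: `cleanHtml`, `KEEP`, and the list of noise tags are hypothetical names chosen for illustration, and the regex-based approach is an assumption that works for simple markup but is not a real HTML parser.

```javascript
// Hypothetical sketch, not the project's API: regex-based cleanup for illustration only.

// Blocks whose contents carry no semantic text (assumed list).
const NOISE_BLOCKS = /<(script|style|noscript|svg|iframe)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;
const COMMENTS = /<!--[\s\S]*?-->/g;

// Structural tags to keep (assumed list); every other tag is dropped but its text stays.
const KEEP = new Set(["h1", "h2", "h3", "h4", "h5", "h6", "p", "a", "ul", "ol", "li"]);

function cleanHtml(html) {
  return html
    .replace(COMMENTS, "")
    .replace(NOISE_BLOCKS, "")
    .replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>/g, (match, slash, name, attrs) => {
      const tag = name.toLowerCase();
      if (!KEEP.has(tag)) return "";   // drop wrapper tags such as div or span
      if (slash) return `</${tag}>`;   // closing tag, normalized to lowercase
      if (tag === "a") {
        // Keep only a quoted href on links; all other attributes are discarded.
        const href = attrs.match(/(?:^|\s)href\s*=\s*("[^"]*"|'[^']*')/i);
        return href ? `<a href=${href[1]}>` : "<a>";
      }
      return `<${tag}>`;               // strip every attribute from other kept tags
    })
    .replace(/\s+/g, " ")              // collapse runs of whitespace
    .trim();
}

const sample =
  '<div class="wrap"><script>track()</script><h1 id="t">Title</h1><!-- ad -->' +
  '<p style="color:red">Read <a class="btn" href="/docs" target="_blank">the docs</a>.</p></div>';

console.log(cleanHtml(sample));
// prints: <h1>Title</h1><p>Read <a href="/docs">the docs</a>.</p>
```

The headings, paragraph, and link survive while the script, comment, wrapper `div`, and styling attributes are gone, which is where the smaller payload comes from. Attribute values that contain `>` or unquoted `href` values would trip up this sketch; a production version would more likely walk a parsed DOM.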