From PDFs to AI-ready structured data: a deep dive (2024) | Dark Hacker News