Mining LLM Pre-Training Data from Codebases | Dark Hacker News