I built Rust extensions for Axolotl that dramatically speed up data loading and preprocessing for LLM fine-tuning. The problem: Python data pipelines become the bottleneck when fine-tuning large models. Your GPUs sit idle waiting for data. The solution: Drop-in Rust acceleration. One import line, zero config changes. Results on 50k rows: - Streaming data loading: 0.009s vs 0.724s (77x faster) - Parallel SHA256 hashing: 0.027s vs 0.052s (1.9x faster) Works with Parquet, Arrow, JSON, JSONL, CSV. Supports compression. Cross-platform. Usage: import fast_axolotl import axolotl # now accelerated pip install fast-axolotl Built with PyO3 and maturin. MIT licensed. Happy to answer questions about the Rust/Python interop or benchmark methodology. |
No comments yet