Show HN: PyTorch K-Means, GPU-friendly, single-file, hierarchical and resampling

I built a small, self-contained K-Means implementation in pure PyTorch: https://gitlab.com/hassonofer/pt_kmeans

I was working on dataset sampling and approximate nearest neighbor search, and tried several existing libraries for large-scale K-Means. I couldn't find something that was fast, simple, and would run comfortably on my own workstation without hitting memory limits. Maybe I missed an existing solution, but I ended up writing one that fit my needs.

The core insight: keep your data on CPU (where you have more RAM) and move only the necessary chunks to GPU for computation during the iterative steps. Results always come back to CPU for easy post-processing. (Note: for K-Means++ initialization when computing on GPU, the full dataset still needs to fit on the GPU.)

It offers a few practical features:
Future plans:
The implementation handles both L2 and cosine distances and includes K-Means++ initialization. It is available on PyPI (`pip install pt_kmeans`), and the full implementation is at: https://gitlab.com/hassonofer/pt_kmeans

Would love feedback on the approach and any use cases I might have missed!
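For anyone curious what "keep the data on CPU, move chunks to GPU" can look like in practice, here is a minimal sketch of one assignment step and one centroid update with L2 distance. It is illustrative only and is not the actual pt_kmeans code or API: the names `assign_in_chunks`, `update_centers`, and `chunk_size` are made up for this example, and it falls back to CPU when no GPU is available.

```python
import torch


def assign_in_chunks(data, centers, chunk_size=4096, device=None):
    """Return the index of the nearest center (L2) for every row of `data`.

    Hypothetical helper: `data` stays on the CPU and only one chunk at a
    time is copied to `device`; the labels are collected back on the CPU.
    """
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    centers_dev = centers.to(device)  # centers are small, keep them on the device
    labels = torch.empty(data.shape[0], dtype=torch.long)
    for start in range(0, data.shape[0], chunk_size):
        chunk = data[start:start + chunk_size].to(device)
        dists = torch.cdist(chunk, centers_dev)  # shape: (chunk rows, k)
        labels[start:start + chunk_size] = dists.argmin(dim=1).cpu()
    return labels


def update_centers(data, labels, centers):
    """Recompute each center as the mean of its assigned rows (on CPU).

    Hypothetical helper: an empty cluster keeps its previous center.
    """
    k = centers.shape[0]
    sums = torch.zeros_like(centers)
    sums.index_add_(0, labels, data)  # per-cluster sum of rows
    counts = torch.bincount(labels, minlength=k).unsqueeze(1)
    return torch.where(counts > 0, sums / counts.clamp(min=1), centers)


# Tiny usage example: two obvious clusters, chunk size smaller than the data.
data = torch.tensor([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 9.9]])
centers = torch.tensor([[0.0, 0.0], [10.0, 10.0]])
labels = assign_in_chunks(data, centers, chunk_size=3)
new_centers = update_centers(data, labels, centers)
```

Peak GPU memory in this sketch is roughly one chunk plus the centers and a chunk-by-k distance matrix, so `chunk_size` is the knob to turn if the GPU runs out of memory.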