We've been doing lots of GPU kernel profiling and optimization on cloud infrastructure, but without local GPU hardware, that meant constant SSH juggling: upload code, compile remotely, profile kernels, download results, repeat. We were spending more time managing infrastructure than writing optimized kernels. So we built Chisel: one command spins up a cloud instance, syncs your GPU code, runs profiling (supports Python/PyTorch/TensorFlow and AMD rocprofv3), and automatically pulls results back. Zero local GPU hardware required. Next up: Web dashboard for visualizing results, simultaneous profiling across multiple GPU types, and automatic resource cleanup. Available via PyPI: pip install chisel-cli Github: https://github.com/Herdora/chisel We're actively developing and would love community feedback, especially from GPU devs exploring AMD alternatives to NVIDIA. Feature requests and contributions welcome! |
No comments yet