So we built Chisel: one command to run profiling commands on any kernel. Zero local GPU hardware required.
Next up we're planning to build a web dashboard for visualizing results, simultaneous profiling across multiple GPU types, and automatic resource cleanup. But please let us know what you would like to see in this project.
Available via PyPI: pip install chisel-cli
Github: https://github.com/Herdora/chisel
We're actively developing and would love community feedback. Feature requests and contributions always welcome!