So I built a code editor for CUDA that does it all:
Profile and benchmark your kernels in real-time while you code
Emulate multi-GPU without the hardware
Get AI optimization suggestions that actually understand your GPU (you can use a local LLM, so it costs you $0)
It's free to use if you run your own local LLM :D It still needs a lot of refinement, so feel free to share anything you'd like to see in it.