1 points by jaberjaber23 269 days ago

testing cuda kernels on different gpus costs $7k/month in cloud rentals

so I built an emulator instead

you give it a kernel, it predicts execution time on any gpu without running it. h100, a100, v100, whatever.

how: scraped specs for 50+ nvidia gpus, built a tile-based simulator that models memory bandwidth, occupancy, and sm scheduling. validated against 12 real gpus, with a mean error of 1.2%
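
to make the idea concrete, here's a rough sketch of what a tile-level estimate might look like. this is not the actual simulator: it's a simple roofline-style model with wave quantization, and every name in it (GpuSpec, KernelProfile, estimate_time_us) plus the per-block flop and byte counts are assumptions for illustration. it treats each thread block as one tile, works out how many blocks fit on the gpu at once from a thread-count occupancy limit, and charges each wave the larger of its compute time and its memory time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical spec record; fill in from a vendor datasheet."""
    sm_count: int
    mem_bandwidth_gbs: float   # peak DRAM bandwidth, GB/s
    peak_gflops: float         # peak arithmetic throughput, GFLOP/s
    max_threads_per_sm: int
    max_blocks_per_sm: int


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical per-kernel summary; one thread block is one tile."""
    grid_blocks: int
    threads_per_block: int
    flops_per_block: float
    bytes_per_block: float     # DRAM traffic per block, in bytes


def blocks_per_sm(gpu: GpuSpec, kernel: KernelProfile) -> int:
    # Occupancy limited by thread count only (assumption: registers and
    # shared memory are ignored in this sketch).
    by_threads = gpu.max_threads_per_sm // kernel.threads_per_block
    return max(1, min(gpu.max_blocks_per_sm, by_threads))


def estimate_time_us(gpu: GpuSpec, kernel: KernelProfile) -> float:
    # Blocks that can be resident on the whole GPU at once.
    resident = blocks_per_sm(gpu, kernel) * gpu.sm_count
    # Blocks are scheduled in waves; a partial last wave is charged as a
    # full one, which is a crude stand-in for the tail effect.
    waves = math.ceil(kernel.grid_blocks / resident)
    # Roofline-style bound per wave: limited by compute or by memory.
    compute_us = resident * kernel.flops_per_block / (gpu.peak_gflops * 1e9) * 1e6
    memory_us = resident * kernel.bytes_per_block / (gpu.mem_bandwidth_gbs * 1e9) * 1e6
    return waves * max(compute_us, memory_us)


# Made-up round numbers, not a real GPU.
toy_gpu = GpuSpec(sm_count=100, mem_bandwidth_gbs=1000.0, peak_gflops=10000.0,
                  max_threads_per_sm=2048, max_blocks_per_sm=32)
toy_kernel = KernelProfile(grid_blocks=1600, threads_per_block=256,
                           flops_per_block=1e6, bytes_per_block=1e6)
print(f"{estimate_time_us(toy_gpu, toy_kernel):.1f} us")
```

a model this small ignores caches, latency hiding, and launch overhead, so it shouldn't be expected to get anywhere near the 1.2% error above. it only shows how bandwidth, occupancy, and scheduling in waves can combine into one time estimate.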

doesn't work for: dynamic parallelism, multi-gpu, or tiny kernels under 1us, but I will figure those out soon

has anyone solved this differently?