What happens when you run a CUDA kernel? (fergusfinn.com)
The caveat, though, is that each new generation of hardware often comes with brand-new constraints and features that a given generation of models hasn't seen before (e.g. tcgen05 in Blackwell was out of distribution at one point). As the models start to generalize better, this might not be a showstopper, but it's still an issue, at least currently.
That said, a lot of the user-space "voodoo" is gone if you don't go through CUDA's "runtime API". If you use the driver API, take your kernel source as a string, and compile it with NVIDIA's run-time compiler (NVRTC), you'll have better visibility into a lot (not all) of what's going on. For the "raw" version of this, look at:
https://github.com/NVIDIA/cuda-samples/tree/master/cpp/0_Int...
but for a much more readable, and still fully transparent modern-C++ API version of the same, try this:
https://github.com/eyalroz/cuda-api-wrappers/blob/master/exa...
that's a sample program for my CUDA API wrappers (header-only) library.