Mind you, Cerebras is already in production: https://www.cerebras.ai/chip
This is how Cerebras manages to be roughly 25x faster than Nvidia.
It's certainly not CUDA-compatible, though.
Diffusion models for code generation are much faster than autoregressive (token-by-token) transformer models, but they aren't currently preferred because autoregressive models still have better problem-solving ability.
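A rough sketch of where the speed gap comes from, under heavy simplifying assumptions. The difference is mostly in how decoding works. A standard autoregressive model needs one forward pass per generated token. A diffusion-style model starts with all output slots masked and fills many of them in parallel over a fixed number of refinement steps. `toy_forward`, `MASK`, and both decode functions are hypothetical stand-ins, not any real model's API. Real diffusion LMs commit the most confident slots first rather than going left to right, and pass count is only a proxy for latency, since each diffusion pass processes the whole sequence while autoregressive models reuse a KV cache.

```python
import math
from typing import Callable, List, Tuple

MASK = -1  # hypothetical placeholder id for a not-yet-generated position


def toy_forward(tokens: List[int]) -> List[int]:
    """Hypothetical stand-in for one network forward pass.

    Returns a prediction for every position: known tokens are echoed back,
    masked positions get a deterministic position-based "guess".
    """
    return [t if t != MASK else (i * 7) % 50 for i, t in enumerate(tokens)]


def autoregressive_decode(
    prompt: List[int], n_new: int, forward: Callable[[List[int]], List[int]]
) -> Tuple[List[int], int]:
    """Generate tokens one at a time: exactly one forward pass per new token."""
    seq = list(prompt)
    passes = 0
    for _ in range(n_new):
        preds = forward(seq + [MASK])  # ask only for the next position
        passes += 1
        seq.append(preds[-1])
    return seq, passes


def diffusion_decode(
    prompt: List[int],
    n_new: int,
    forward: Callable[[List[int]], List[int]],
    steps: int,
) -> Tuple[List[int], int]:
    """Start fully masked and fill a batch of slots in parallel on each pass."""
    seq = list(prompt) + [MASK] * n_new
    per_step = math.ceil(n_new / steps) if n_new else 0
    passes = 0
    while MASK in seq:
        preds = forward(seq)  # one pass predicts every masked slot at once
        passes += 1
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # Simplification: commit the leftmost slots; real models pick by confidence.
        for i in masked[:per_step]:
            seq[i] = preds[i]
    return seq, passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    ar_seq, ar_passes = autoregressive_decode(prompt, 64, toy_forward)
    df_seq, df_passes = diffusion_decode(prompt, 64, toy_forward, steps=8)
    print(ar_passes, df_passes)  # 64 8
    print(ar_seq == df_seq)      # True
```

For 64 new tokens, the autoregressive loop makes 64 sequential passes and the diffusion loop makes 8. The toy predictor is identical in both, so the outputs match. In practice the quality trade-off shows up exactly where this sketch cheats: filling many slots at once means each one is decided with less context about its neighbours.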