CUDA Performance: Maximizing Instruction-Level Parallelism | Dark Hacker News