In the CUDA 13.2 release, NVIDIA extended CUDA Tile support back to compute capability 8.x architectures. This brings advanced tiling optimizations to widely deployed older GPUs based on the NVIDIA Ampere and Ada Lovelace architectures.
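For readers unfamiliar with tiling, the sketch below shows the general idea in plain Python: the output matrix is computed one small block (tile) at a time, and partial products are accumulated tile by tile along the shared dimension. This is only a conceptual illustration. It is not CUDA Tile code and calls no NVIDIA API; the function name `tiled_matmul` and the `tile` parameter are hypothetical names chosen for this example.

```python
# Conceptual illustration only: plain Python, not the CUDA Tile API.
# tiled_matmul and its `tile` parameter are hypothetical names.
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), processing one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Walk over output tiles; on a GPU, each output tile would typically
    # be assigned to its own block of threads.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Accumulate partial products along the shared dimension,
            # one tile-sized slice at a time.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b))
```

The `min(...)` bounds handle matrices whose dimensions are not a multiple of the tile size. In a tile-based GPU programming model, the inner per-tile multiply-accumulate is the part a compiler could map onto specialized hardware, which is why expressing work in tiles matters.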
Early benchmarks from the Jülich Supercomputing Centre (Germany) show that a single H100 GPU, combined with a 100+ qubit trapped-ion QPU, simulated a quantum approximate optimization algorithm (QAOA) 8× faster than prior GPU‑only approaches for problem sizes where the quantum hardware is still noisy. The tight coupling reduces latency by over 70% compared to passing data via external hosts.
CUDA Tile enables code to use Tensor Cores automatically, without hardcoding explicit device instructions. In the CUDA 13