NVIDIA CUDA-X libraries
Powered by the NVIDIA GB200 and GH200 superchips, NVIDIA CUDA-X libraries enable scientists and engineers across disciplines to tackle complex problems far more quickly.
Announced today at the NVIDIA GTC global AI conference, CUDA-X now works with these latest superchip architectures to give developers tighter automatic integration and coordination between CPU and GPU resources. The result is up to 11x speedups for computational engineering tools and calculations up to 5x larger compared with traditional accelerated computing architectures.
These libraries accelerate and improve engineering simulation, design optimization, and other workflows, allowing scientists and researchers to make breakthroughs faster.
NVIDIA released CUDA in 2006 to accelerate a broad range of applications. Today, more than 900 domain-specific CUDA-X libraries and AI models accelerate computation and advance science. A wide range of engineering fields, including semiconductor design, automotive, aerospace, particle physics, quantum physics, and astronomy, can now benefit from faster computing with CUDA-X.
The NVIDIA Grace CPU architecture boosts memory bandwidth while reducing power consumption. The high bandwidth of the NVIDIA NVLink-C2C interconnect lets the GPU and CPU share memory, allowing developers to write less specialized code, handle larger and more complicated problems, and improve application performance.
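To make this concrete, here is a minimal CUDA sketch of the shared-memory programming model, with a made-up array size and a trivial kernel: a single cudaMallocManaged allocation is touched by both CPU and GPU through the same pointer. On GB200 and GH200 systems, NVLink-C2C backs this address space with hardware coherency, so no explicit staging copies are needed.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: scale a vector in place on the GPU.
__global__ void scale(double* x, double a, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 20;
    double* x = nullptr;

    // One allocation visible to both processors. On GB200/GH200,
    // NVLink-C2C keeps this memory coherent between the Grace CPU
    // and the GPU, so the same pointer works on both sides.
    cudaMallocManaged(&x, n * sizeof(double));

    for (size_t i = 0; i < n; ++i) x[i] = 1.0;   // initialize on the CPU

    scale<<<(n + 255) / 256, 256>>>(x, 2.0, n);  // update on the GPU
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);                 // read back on the CPU
    cudaFree(x);
    return 0;
}
```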
Accelerating Engineering Solvers With NVIDIA cuDSS
Through more effective use of CPU and GPU processing power, NVIDIA’s superchip architectures enable customers to get more performance out of the same underlying GPU.
The NVIDIA cuDSS library solves large engineering simulation problems involving sparse matrices, for use in design optimization, electromagnetic simulation workflows, and other applications. cuDSS uses Grace CPU memory and the high-bandwidth NVLink-C2C interconnect to factorize and solve large matrices that typically wouldn't fit in device memory. Thanks to this, users can tackle extremely large problems in a fraction of the time.
The coherent shared memory between the GPU and the Grace CPU greatly reduces the overhead of large systems by minimizing data movement. For a range of large computational engineering problems, using cuDSS hybrid memory to leverage the Grace CPU memory of the superchip architecture accelerated the most complex solution phases by up to 4x on the same GPU.
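As a rough illustration, here is a minimal sketch of a sparse solve following the pattern of the cuDSS C API: the classic analysis, factorization, and solve phases on a CSR matrix, with hybrid memory mode opted in so factors can spill to CPU memory. Exact parameter and enum names (notably the hybrid-memory setting) vary across cuDSS releases, so check the headers of your version; error handling and device-array setup are omitted.

```cpp
#include <cudss.h>

// Sketch: solve A x = b for a sparse SPD matrix in CSR form.
// Assumes csr_offsets, csr_columns, csr_values, b_values, x_values
// are device pointers already populated.
void solve_sparse(int n, int nnz,
                  int* csr_offsets, int* csr_columns, double* csr_values,
                  double* b_values, double* x_values) {
    cudssHandle_t handle;
    cudssCreate(&handle);

    cudssConfig_t config;
    cudssData_t data;
    cudssConfigCreate(&config);
    cudssDataCreate(handle, &data);

    // Opt in to hybrid memory mode so factors larger than GPU memory
    // can spill to (Grace) CPU memory. Parameter name per recent
    // cuDSS releases; verify against your version.
    int hybrid = 1;
    cudssConfigSet(config, CUDSS_CONFIG_HYBRID_MODE, &hybrid, sizeof(hybrid));

    cudssMatrix_t A, x, b;
    cudssMatrixCreateCsr(&A, n, n, nnz, csr_offsets, NULL, csr_columns,
                         csr_values, CUDA_R_32I, CUDA_R_64F,
                         CUDSS_MTYPE_SPD, CUDSS_MVIEW_UPPER, CUDSS_BASE_ZERO);
    cudssMatrixCreateDn(&b, n, 1, n, b_values, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
    cudssMatrixCreateDn(&x, n, 1, n, x_values, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);

    // The three direct-solver phases: reordering/symbolic analysis,
    // numerical factorization, then triangular solves.
    cudssExecute(handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);

    cudssMatrixDestroy(A);
    cudssMatrixDestroy(b);
    cudssMatrixDestroy(x);
    cudssDataDestroy(handle, data);
    cudssConfigDestroy(config);
    cudssDestroy(handle);
}
```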
By integrating cuDSS into its HFSS solver, Ansys has significantly improved electromagnetic simulation performance: with cuDSS, HFSS software speeds up its matrix solver by up to 11x.
Altair OptiStruct has also adopted the cuDSS direct sparse solver library, significantly speeding up its finite element analysis workloads.
These performance gains come from optimizing critical operations on the GPU while strategically using CPUs for shared memory and heterogeneous CPU-GPU execution. To further improve efficiency, cuDSS automatically detects the cases where involving the CPU offers additional benefit.
Scaling Up at Warp Speed With Superchip Memory
Because the GB200 and GH200 architectures' NVLink-C2C interconnects provide CPU-GPU memory coherency, memory-limited applications can be scaled up on a single GPU.
Many engineering simulations are limited by scale: building equipment with complex parts, such as aircraft engines, requires large simulations to reach the necessary resolution. Because CPU and GPU memories can read and write each other's data seamlessly, developers can readily implement out-of-core algorithms that process larger data sets, as sketched below.
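As a sketch of how simple such out-of-core code can become, this hypothetical CUDA example allocates a buffer far larger than typical GPU memory with plain malloc and passes it straight to a kernel. The pattern relies on the Address Translation Service and coherent NVLink-C2C memory of GH200/GB200-class systems (or recent drivers with heterogeneous memory management); on other systems, it would fault.

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void update(float* cells, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) cells[i] += 1.0f;  // stand-in for a real stencil update
}

int main() {
    // A ~200 GB buffer (hypothetical size) in plain system memory.
    // On GH200/GB200, Address Translation Service lets GPU kernels
    // dereference this ordinary malloc'd pointer directly; pages are
    // accessed over NVLink-C2C as needed, with no explicit copies.
    const size_t n = 200ull * 1000 * 1000 * 1000 / sizeof(float);
    float* cells = static_cast<float*>(malloc(n * sizeof(float)));
    if (!cells) return 1;

    update<<<(n + 255) / 256, 256>>>(cells, n);
    cudaDeviceSynchronize();

    free(cells);
    return 0;
}
```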
For instance, using NVIDIA Warp, a Python-based framework for accelerating data generation and spatial computing applications, Autodesk simulated up to 48 billion cells on eight GH200 nodes. That is more than 5x larger than the simulations possible with eight NVIDIA H100 nodes.
Powering Quantum Computing Research With NVIDIA cuQuantum
Quantum computers promise to speed up solutions to fundamental challenges across numerous scientific and industrial fields. The ability to model extremely complicated quantum systems is crucial to shortening the path to practical quantum computing.
Through simulations, scientists today can design new algorithms that will run at the scale of future quantum computers. Detailed simulations of the performance and noise characteristics of novel qubit designs also contribute significantly to advancing quantum processors.
In so-called state vector simulations of quantum algorithms, matrix operations must be carried out on exponentially large vectors that have to be held in memory. Tensor network simulations, in contrast, simulate quantum algorithms through tensor contractions, allowing hundreds or thousands of qubits to be modeled for certain important classes of applications.
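The memory wall here is easy to quantify: an n-qubit state vector holds 2^n complex amplitudes, so at double precision (16 bytes per amplitude) 30 qubits already require 16 GiB and 35 qubits require 512 GiB, with every additional qubit doubling the footprint. Coherent access to a large CPU memory pool therefore directly extends how many qubits a state vector simulation can reach, while tensor network methods avoid storing the exponential vector altogether for circuits with suitable structure.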
The NVIDIA cuQuantum library accelerates these workloads. Because cuQuantum is integrated with all the leading quantum computing frameworks, quantum researchers can access its simulation performance without modifying any code.
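For a sense of scale at the API level, here is a minimal hypothetical sketch using cuStateVec, the state vector component of cuQuantum: it allocates a 30-qubit state vector (16 GiB at double precision, matching the arithmetic above) and creates the library handle. Gate application goes through routines such as custatevecApplyMatrix, whose full signature is omitted here; in practice, the framework integrations make these calls for you.

```cpp
#include <custatevec.h>
#include <cuComplex.h>
#include <cuda_runtime.h>

int main() {
    const int n_qubits = 30;           // 2^30 amplitudes = 16 GiB at FP64
    const size_t dim = 1ull << n_qubits;

    // Allocate the state vector on the GPU and initialize to |0...0>
    // by zeroing it and setting the first amplitude to 1.
    cuDoubleComplex* sv;
    cudaMalloc(&sv, dim * sizeof(cuDoubleComplex));
    cudaMemset(sv, 0, dim * sizeof(cuDoubleComplex));
    cuDoubleComplex one = make_cuDoubleComplex(1.0, 0.0);
    cudaMemcpy(sv, &one, sizeof(one), cudaMemcpyHostToDevice);

    // cuStateVec library handle; gates would be applied through APIs
    // such as custatevecApplyMatrix (see the cuQuantum docs).
    custatevecHandle_t handle;
    custatevecCreate(&handle);

    // ... apply gates, sample measurements, etc. ...

    custatevecDestroy(handle);
    cudaFree(sv);
    return 0;
}
```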
Memory requirements typically limit the scale at which quantum algorithms can be simulated. The GB200 and GH200 architectures offer an excellent foundation for scaling up quantum simulations because they allow applications to use the large CPU memory without performance bottlenecks. On quantum computing benchmarks, a GH200 system can outperform an x86-based H100 system by up to 3x.