CUDA GPUs
NVIDIA Introduces a Wide Range of New CUDA Libraries to Expand Accelerated Computing and Deliver Order-of-Magnitude Speedups to Science and Industrial Applications. Accelerated computing lowers costs and energy use in data processing, AI data curation, 6G research, AI-physics and other areas.
News summary: New accelerated computing libraries deliver order-of-magnitude speedups and cut cost and energy use in data processing, generative AI, recommender systems, AI data curation, 6G research, AI-physics and other areas. Among them are:
- LLM applications: Nemotron-4 340B generates high-quality synthetic data, and NeMo Curator adds image curation for building custom datasets.
- Data processing: a new Polars GPU engine is in open beta, and cuVS for vector search lets indexes be built in minutes rather than days.
- Physical AI: Warp adds a new Tile API that speeds up calculations for physics simulation, Aerial offers new map formats for ray tracing and simulation in wireless network modeling, and Sionna gains a new toolchain for real-time inference in link-level wireless simulation.
Organizations around the world are increasingly using NVIDIA accelerated computing to speed up applications that once ran only on CPUs. As a result, they have achieved remarkable speedups and saved significant amounts of energy.
CPFD, based in Houston, makes computational fluid dynamics simulation software for industrial applications, such as Barracuda Virtual Reactor, which helps design next-generation plastic recycling plants. Recycling facilities run CPFD software in cloud instances powered by NVIDIA accelerated computing. Using a CUDA GPU-accelerated virtual machine, they can scale and run simulations 400x faster and 140x more energy efficiently than with a CPU-based workstation.
A popular video conferencing platform captions several hundred thousand virtual meetings per hour. When it used CPUs to generate live captions, it could query a transformer-powered speech recognition AI model three times per second. After migrating to GPUs in the cloud, the application’s throughput climbed to 200 queries per second, a 66x speedup and a 25x improvement in energy efficiency.
An e-commerce site serves hundreds of millions of shoppers a day in homes around the world, matching them with the products they need through an advanced recommendation system driven by a deep learning model running on an NVIDIA accelerated cloud computing platform. After moving from CPUs to GPUs in the cloud, it achieved much lower latency along with a 33x speedup and a roughly 12x increase in energy efficiency.
Given the exponential growth of data, accelerated computing in the cloud is expected to enable even more innovative use cases.
Is NVIDIA Accelerated Computing on CUDA GPUs Sustainable Computing?
According to NVIDIA, data centers could save 40 terawatt-hours of energy per year if all AI, HPC and data analytics workloads now running on CPU servers were CUDA GPU-accelerated. That is equivalent to the annual energy use of 5 million U.S. homes.
By harnessing the parallel processing power of CUDA GPUs, accelerated computing completes tasks orders of magnitude faster than CPUs, significantly lowering costs and energy use while boosting productivity.
Although adding GPUs to a CPU-only server raises peak power, GPU acceleration finishes jobs quickly and then drops into a low-power state. Overall, GPU-accelerated computing delivers better performance while using far less energy than general-purpose CPUs.
Over the past decade, NVIDIA AI computing has become nearly 100,000x more energy efficient at processing large language models. To put that into perspective, if a car’s fuel efficiency had improved as much as NVIDIA has improved AI efficiency on its accelerated computing platform, it would get 500,000 miles per gallon. That is enough to drive to the moon and back on less than a gallon of gasoline.
Beyond the notable efficiency gains on AI tasks, GPU computing can achieve remarkable speedups over CPUs. The figure below shows the 10-180x speedups that customers of the NVIDIA accelerated computing platform saw when running workloads on cloud service providers, across a range of real-world jobs from computer vision to data processing.
As workloads demand exponentially more processing power, CPUs have struggled to deliver the required performance, leading to a widening performance gap and “compute inflation.” The multiyear trend in the chart below shows how the growth of data has far outpaced gains in CPU computing capability per watt.
- GPU acceleration saves energy, freeing up power and resources that would otherwise be wasted.
- Accelerated computing can save enormous amounts of energy, making it a sustainable approach to computing.
The Right Tools for Every Job
GPUs cannot accelerate applications designed for general-purpose CPUs on their own; software libraries with specialized algorithms are required to speed up specific tasks. Just as a mechanic keeps a toolbox full of tools for different jobs, such as a wrench and a screwdriver, NVIDIA offers a wide range of libraries for low-level operations like data processing and computation.
Every NVIDIA CUDA library is designed to take advantage of GPU-specific hardware features. Together, they represent the full power of the NVIDIA platform.
The CUDA platform roadmap is being updated to cover a wide range of use cases.
LLM Applications
NeMo Curator gives developers the flexibility to quickly build custom datasets for large language model (LLM) use cases. NVIDIA recently announced plans to extend its multimodal support beyond text to include image curation.
Synthetic data generation (SDG) augments existing datasets with high-quality, artificially created data to customize and fine-tune models and LLM applications. NVIDIA unveiled Nemotron-4 340B, a new suite of models designed for SDG that lets developers and businesses use model outputs to build custom models.
Data Processing Applications
cuVS is an open-source library that delivers remarkable speed and efficiency for GPU-accelerated vector search and clustering, serving both semantic search and LLMs. The latest version of cuVS searches large indexes at scale and lets them be built in minutes instead of hours or even days.
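As a rough illustration, the sketch below builds and queries a GPU-resident CAGRA index with the cuVS Python bindings. The random CuPy arrays stand in for real embeddings, and exact function and parameter names may vary between cuVS releases.

```python
# A minimal sketch, assuming the cuVS Python CAGRA bindings and CuPy are
# installed; names may differ slightly between cuVS releases.
import cupy as cp
from cuvs.neighbors import cagra

# Random float32 vectors stand in for real embeddings.
dataset = cp.random.random((100_000, 128), dtype=cp.float32)
queries = cp.random.random((1_000, 128), dtype=cp.float32)

# Build a CAGRA graph index on the GPU, then query its nearest neighbors.
index = cagra.build(cagra.IndexParams(), dataset)
distances, neighbors = cagra.search(cagra.SearchParams(), index, queries, k=10)
```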
The open-source Polars library uses query optimizations and other techniques to process hundreds of millions of rows of data efficiently on a single machine. A new Polars GPU engine powered by NVIDIA’s cuDF library is available in open beta. It delivers up to a 10x speedup over CPU, letting data practitioners and their applications benefit from the energy savings of accelerated computing.
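A minimal sketch of how a lazy Polars query can be dispatched to the GPU engine is shown below; the file name and columns are hypothetical, and it assumes Polars is installed with GPU (cuDF-backed) support.

```python
# A minimal sketch, assuming Polars with its GPU engine is installed;
# "transactions.parquet" and its columns are hypothetical.
import polars as pl

q = (
    pl.scan_parquet("transactions.parquet")  # lazy scan, nothing executes yet
    .filter(pl.col("amount") > 100)
    .group_by("customer_id")
    .agg(pl.col("amount").sum().alias("total_spent"))
)

# Run the query on the GPU engine; Polars falls back to the CPU engine for
# operations the GPU engine does not yet support.
gpu_result = q.collect(engine="gpu")
print(gpu_result.head())
```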
Physical AI
Warp, a high-performance GPU framework for simulation and graphics that accelerates spatial computing, makes it easier to write differentiable programs for physics simulation, perception, robotics and geometry processing. The next version will add support for a new Tile API that lets developers use GPU Tensor Cores for matrix and Fourier computations.
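The sketch below shows Warp’s basic kernel programming model only; it does not use the newer Tile API mentioned above. The kernel simply scales a point cloud, standing in for a real physics step.

```python
# A minimal sketch of a Warp kernel (pip install warp-lang); the kernel
# scales a point cloud in place as a stand-in for a physics update.
import numpy as np
import warp as wp

wp.init()


@wp.kernel
def scale_points(points: wp.array(dtype=wp.vec3), scale: float):
    tid = wp.tid()                      # one thread per point
    points[tid] = points[tid] * scale


points = wp.array(np.random.rand(1024, 3), dtype=wp.vec3)
wp.launch(scale_points, dim=points.shape[0], inputs=[points, 2.0])
print(points.numpy()[:3])
```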
Aerial is a collection of accelerated computing platforms for creating, modeling, and managing wireless networks for business use and industrial research. Aerial will get a new extension with more map formats for ray tracing and more accurate simulations in the next version.
Sionna is an open-source, GPU-accelerated framework for link-level simulation of optical and wireless communication networks. Its use of GPUs enables orders-of-magnitude faster simulation, opening the door to interactive system exploration and next-generation physical layer research. The next release will include the full toolchain needed to design, train and evaluate neural network-based receivers, along with support for NVIDIA TensorRT-based real-time inference of these neural receivers.
NVIDIA offers more than 400 libraries. Some, such as CV-CUDA, excel at the pre- and post-processing of computer vision tasks common in mapping, video conferencing, recommender systems and user-generated video. Others, such as cuDF, accelerate the data frames and tables at the heart of pandas and SQL-based data science.
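As a brief illustration of the cuDF approach, the sketch below uses the cudf.pandas accelerator mode, which runs existing pandas code on the GPU and falls back to the CPU when needed; the file name and columns are hypothetical.

```python
# A minimal sketch, assuming RAPIDS cuDF is installed; cudf.pandas
# accelerates unmodified pandas code on the GPU with CPU fallback.
# "events.csv" and its columns are hypothetical.
import cudf.pandas
cudf.pandas.install()   # must be called before pandas is imported

import pandas as pd

df = pd.read_csv("events.csv")
summary = df.groupby("user_id")["latency_ms"].mean()
print(summary.head())
```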
Some of these libraries, like cuBLAS for linear algebra acceleration, are general-purpose building blocks that can be used across many different workloads, while others, like cuLitho for silicon computational lithography, are highly specialized and focused on a single use case.
By combining many libraries and AI models into optimized containers, NVIDIA NIM offers a fast route to production deployment for researchers who do not want to create their own pipelines using NVIDIA CUDA-X libraries. The containerized microservices give increased throughput out of the box.
Augmenting these libraries’ performance is a growing number of hardware-based acceleration features that deliver speedups with leading energy efficiency. The NVIDIA Blackwell platform, for example, includes a decompression engine that unpacks compressed data files inline up to 18x faster than CPUs. This dramatically accelerates data processing applications, such as Apache Spark, SQL databases and pandas workloads, that frequently need to retrieve compressed files from storage and decompress them for runtime computation.
Cloud computing systems that use NVIDIA’s proprietary CUDA GPU-accelerated libraries provide exceptional performance and energy efficiency for a variety of applications. This combination helps billions of people who depend on cloud-based workloads to benefit from a more sustainable and economical digital environment. It also produces considerable cost savings for enterprises and plays a critical role in improving sustainable computing.