NVIDIA cuPyNumeric
Introducing multi-GPU and multi-node (MGMN) accelerated computing with zero-code-change scalability.
Researchers and scientists make heavy use of Python, a powerful and intuitive programming language, for data science, machine learning (ML), and numerical computation. NumPy is the de facto standard math and matrix library, offering a simple, user-friendly programming model whose interfaces correspond closely to the mathematical needs of scientific applications.
As data volumes and computational complexity grow, however, CPU-based Python and NumPy applications need help to meet the speed and scalability demands of cutting-edge research.
Distributed accelerated computing provides the infrastructure for efficiently exploring and testing hypotheses in data-driven problems. Whether they are building ML models, developing new approaches to complex computational fluid dynamics problems, or analyzing data from the scattering of high-energy electron beams, researchers increasingly want an easy way to scale their programs.
The goal of NVIDIA cuPyNumeric is to bring distributed, accelerated computing on the NVIDIA platform to the Python community by serving as a drop-in replacement for NumPy. It lets scientists and researchers write their research programs productively in native Python with familiar tools, without worrying about parallel or distributed computing. cuPyNumeric and Legate can then scale those applications, with no code changes, from single-CPU systems to MGMN supercomputers.
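As a rough sketch of what the drop-in replacement looks like in practice, an existing NumPy script only needs its import statement swapped; the smoothing routine and array size below are illustrative assumptions, not code from cuPyNumeric or this article.

```python
# A minimal sketch of the zero-code-change idea: an existing NumPy script
# only needs its import swapped. The smoothing routine and array size here
# are illustrative, not code shipped with cuPyNumeric.
import cupynumeric as np  # previously: import numpy as np

def jacobi_smooth(grid, iterations=100):
    """Average each interior cell with its four neighbors."""
    for _ in range(iterations):
        grid[1:-1, 1:-1] = 0.25 * (
            grid[:-2, 1:-1] + grid[2:, 1:-1] +
            grid[1:-1, :-2] + grid[1:-1, 2:]
        )
    return grid

field = np.random.rand(10_000, 10_000)  # partitioned across GPUs by the runtime
print(jacobi_smooth(field).mean())
```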
Advantages of NVIDIA cuPyNumeric
The NVIDIA cuPyNumeric library, built on Legate:
- Supports the NumPy interface and the native Python language without limitations.
- Transparently accelerates and scales existing NumPy workflows, serving as a seamless drop-in replacement for NumPy.
- Provides automatic acceleration and parallelism across multiple nodes, spanning CPUs and GPUs.
- Scales efficiently from a single CPU to thousands of GPUs.
- Requires minimal code changes, so scientific work gets done faster.
- Is openly available; get started with Conda or GitHub.
NVIDIA cuPyNumeric GPU acceleration
With NVIDIA’s cuPyNumeric release, scientists can now use GPU acceleration at cluster scale.
The accelerated computing library advances scientific discovery by letting researchers scale effortlessly to powerful computing clusters without changing their Python code.
Many scientists face the same problem: whether they are studying the behavior of electrons at the nanoscale or bright galaxies merging millions of light-years away, they must sift through petabytes of data to find insights that can advance their research.
Thanks to the NVIDIA cuPyNumeric accelerated computing library, researchers can now easily run their data-crunching Python scripts on CPU-based laptops, GPU-accelerated workstations, cloud servers, or massive supercomputers. Processing their data more quickly lets them decide sooner which data points are intriguing, which patterns are worth investigating, and how to modify their experiments.
Researchers don’t need to be computer scientists to make the transition to accelerated computing. They can apply cuPyNumeric to existing code, or write new code with the familiar NumPy interface while following best practices for performance and scalability.
Once cuPyNumeric is applied, they can run their programs on one GPU or hundreds with no code changes.
The most recent version of cuPyNumeric, now available on GitHub and Conda, adds improved memory scalability, automatic resource configuration at run time, and support for the NVIDIA GH200 Grace Hopper Superchip. It also supports HDF5, a file format widely used in the scientific community to efficiently manage large, complex data.
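As one hedged illustration of working with HDF5 data, an experiment’s output could be read with h5py and handed to cuPyNumeric for analysis. The file name and dataset path below are hypothetical, and cuPyNumeric’s native HDF5 support may offer a more direct, distributed read path than this simple host-side load.

```python
# A hedged sketch of pulling HDF5 data into cuPyNumeric arrays via h5py.
# The file "scan.h5" and dataset "detector/frames" are hypothetical names;
# cuPyNumeric's native HDF5 support may provide a more direct, distributed
# read path than this simple host-side load.
import h5py
import cupynumeric as np

with h5py.File("scan.h5", "r") as f:
    frames = np.asarray(f["detector/frames"][:])  # hand the data to cuPyNumeric

# Downstream analysis uses the familiar NumPy interface.
background = frames.mean(axis=0)
corrected = frames - background
print(corrected.std())
```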
Researchers at the National Payments Corporation of India, Australian National University, UMass Boston, the Center for Turbulence Research at Stanford University, Los Alamos National Laboratory, and the SLAC National Accelerator Laboratory have used cuPyNumeric to significantly improve their data analysis workflows.
Less Is More: Limitless GPU Scalability Without Code Changes
Python, the most popular programming language for data science, machine learning, and numerical computing, is used by millions of researchers in scientific fields such as astronomy, drug discovery, materials science, and nuclear physics. The NumPy math and matrix library, downloaded more than 300 million times last month, is relied on by tens of thousands of packages on GitHub. All of these applications could benefit from accelerated computing with cuPyNumeric.
Many of these scientists write programs that use NumPy and run on a single CPU-only node, which limits the throughput of their algorithms as they process ever-larger datasets collected by instruments such as electron microscopes, particle colliders, and radio telescopes.
By offering a drop-in replacement for NumPy that can scale to thousands of GPUs, cuPyNumeric helps researchers keep pace with the growing size and complexity of their datasets. No code changes are needed as cuPyNumeric scales from a single GPU to an entire supercomputer, so researchers can easily run their analyses on accelerated computing hardware of any size.
Solving the Big Data Problem, Accelerating Scientific Discovery
Scientists at Stanford University’s SLAC National Accelerator Laboratory, a U.S. Department of Energy lab, have discovered that cuPyNumeric speeds up X-ray research at the Linac Coherent Light Source.
A SLAC team working on semiconductor materials discovery found that cuPyNumeric sped up its data analysis application sixfold, cutting run time from minutes to seconds. That speedup lets the team run critical analyses in parallel while conducting experiments at this highly specialized facility.
By making better use of its experiment hours, the team expects to discover new material properties, share findings, and publish its work faster.
Other organizations that make use of cuPyNumeric include:
Researchers at Australian National University used cuPyNumeric to scale the Levenberg-Marquardt optimization method to run on multi-GPU systems at the country’s National Computational Infrastructure. Although the method has many applications, the researchers are focusing first on large-scale climate and weather models.
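As a hedged sketch of the idea, and not the ANU team’s actual code, a single Levenberg-Marquardt step can be written entirely against the NumPy interface: cuPyNumeric handles the large residual and Jacobian arrays, while the tiny normal-equations system is solved with host NumPy. The exponential toy model and all sizes are illustrative assumptions.

```python
# A hedged sketch of one Levenberg-Marquardt step written against the NumPy
# interface; this is illustrative, not the ANU researchers' code.
import numpy                   # host NumPy for the small linear solve
import cupynumeric as np       # cuPyNumeric for the large array work

def lm_step(residual_fn, jacobian_fn, params, lam=1e-3):
    """One damped Gauss-Newton (Levenberg-Marquardt) update."""
    r = residual_fn(params)                            # residuals, shape (m,)
    J = jacobian_fn(params)                            # Jacobian, shape (m, n)
    A = numpy.asarray(J.T @ J) + lam * numpy.eye(J.shape[1])
    g = numpy.asarray(J.T @ r)
    delta = numpy.linalg.solve(A, g)                   # tiny (n, n) solve on the host
    return params - np.asarray(delta)

# Toy model y = a * exp(b * x), fit from noiseless synthetic data.
x = np.linspace(0.0, 1.0, 1_000_000)
y = 2.0 * np.exp(0.5 * x)

def residual(p):
    return p[0] * np.exp(p[1] * x) - y

def jacobian(p):
    J = np.empty((x.shape[0], 2))
    J[:, 0] = np.exp(p[1] * x)                         # d r / d a
    J[:, 1] = p[0] * x * np.exp(p[1] * x)              # d r / d b
    return J

params = np.array([1.0, 1.0])
for _ in range(10):
    params = lm_step(residual, jacobian, params)
print(params)  # should approach [2.0, 0.5]
```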
Researchers at Los Alamos National Laboratory are applying cuPyNumeric to accelerate data science, computational science, and machine learning algorithms. cuPyNumeric will help them make more efficient use of the recently launched Venado supercomputer, which features more than 2,500 NVIDIA GH200 Grace Hopper Superchips.
Researchers at the Center for Turbulence Research at Stanford University are using cuPyNumeric to develop Python-based computational fluid dynamics solvers that can run at scale on large accelerated computing clusters. These solvers can seamlessly integrate huge collections of fluid simulations with popular machine learning libraries such as PyTorch, enabling complex applications like online training and reinforcement learning.
A research team at UMass Boston is accelerating linear algebra computations to analyze microscopy videos and calculate the energy released by active materials. Using cuPyNumeric, the group decomposed a matrix with 16 million rows and 4,000 columns.
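One common way to decompose such a tall, skinny matrix is through its much smaller Gram matrix. The sketch below illustrates that general pattern under scaled-down sizes; it is not the UMass Boston team’s actual method.

```python
# A hedged sketch of decomposing a tall, skinny matrix via its Gram matrix;
# this illustrates the general pattern, not the UMass Boston team's method.
# Sizes are scaled-down stand-ins for the 16-million-by-4,000 case.
import numpy                   # host NumPy for the small eigenproblem
import cupynumeric as np       # cuPyNumeric for the large, distributed work

rows, cols = 100_000, 400
A = np.random.rand(rows, cols)

gram = A.T @ A                                         # heavy (cols, cols) matmul
eigvals, eigvecs = numpy.linalg.eigh(numpy.asarray(gram))
order = numpy.argsort(eigvals)[::-1]                   # strongest modes first

top_modes = np.asarray(eigvecs[:, order[:10]])         # leading right singular vectors
scores = A @ top_modes                                 # per-row component scores
print(scores.shape)
```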
The National Payments Corporation of India’s real-time digital payment system is used by about 250 million Indians every day and is expanding globally. NPCI uses complex matrix calculations to track transaction paths between payers and payees. With current methods, processing data for a one-week transaction window takes about five hours on CPU systems.
An experiment showed that using cuPyNumeric to accelerate the calculations on multi-node NVIDIA DGX systems could speed up matrix multiplications by as much as 50x. This would allow NPCI to analyze larger transaction windows in less than an hour and detect suspected money laundering in near real time.
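As a hedged illustration of why matrix multiplication dominates this kind of workload, paths between payers and payees can be counted by repeatedly multiplying a payer-to-payee adjacency matrix. The account count, hop depth, and flagging threshold below are illustrative assumptions, not NPCI’s actual pipeline.

```python
# A hedged sketch of counting multi-hop payment paths with repeated matrix
# multiplication; the account count, hop depth, and threshold are illustrative
# assumptions, not NPCI's actual pipeline.
import cupynumeric as np

accounts = 10_000
# adjacency[i, j] = 1.0 if account i paid account j in the window (synthetic data).
adjacency = (np.random.rand(accounts, accounts) > 0.999).astype("float64")

# paths[i, j] counts payment chains of exactly k hops from i to j; long chains
# through many intermediaries can indicate layering patterns.
paths = adjacency.copy()
for _ in range(3):                       # extend to 4-hop chains
    paths = paths @ adjacency            # the kind of matmul the reported 50x speedup targets

flagged = int((paths > 5).sum())         # pairs linked by many multi-hop routes
print(flagged)
```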