Sunday, September 15, 2024

NVIDIA Blackwell Hot Chips Presentations Aim to Improve Data Centers

NVIDIA Blackwell platform

Hot Chips, a deep-technology conference for processor and system architects from academia and industry, has become a key forum for the trillion-dollar data center computing market.

Senior NVIDIA engineers will showcase the most recent developments behind the NVIDIA Blackwell platform at Hot Chips 2024 next week. They will also discuss research on liquid cooling for data centers and AI agents for chip design.

  • NVIDIA Blackwell brings together multiple chips, systems, and NVIDIA CUDA software to power the next generation of AI across use cases, industries, and countries.
  • The NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale solution that connects 72 Blackwell GPUs and 36 Grace CPUs, raises the bar for AI system design.
  • NVLink interconnect technology provides all-to-all GPU communication, enabling record high-throughput, low-latency inference for generative AI.
  • The NVIDIA Quasar Quantization System pushes the limits of physics to accelerate AI computing.
  • NVIDIA researchers are building AI models that help design processors for AI.

NVIDIA Blackwell Release Date

On Monday, August 26, the NVIDIA Blackwell talk will present new architectural details and real-world examples of generative AI models running on NVIDIA Blackwell hardware.

It is preceded on Sunday, August 25, by three tutorials covering how AI models, such as large language model (LLM)-powered agents, can assist engineers in designing the next generation of processors and how hybrid liquid-cooling solutions can help data centers transition to more energy-efficient infrastructure.

These talks together demonstrate how NVIDIA engineers are pushing the boundaries of data center computing and architecture to provide previously unheard-of levels of performance, efficiency, and optimization.

Blackwell NVIDIA

NVIDIA Blackwell is the ultimate full-stack computing challenge. The platform comprises multiple NVIDIA chips: the Blackwell GPU, Grace CPU, BlueField DPU, ConnectX network interface card, NVLink switch, Spectrum Ethernet switch, and Quantum InfiniBand switch.

Image: NVIDIA Blackwell is the ultimate full-stack computing challenge. (Image credit: NVIDIA)

NVIDIA directors of architecture Ajay Tirumala and Raymond Wong will provide an overview of the platform and explain how these technologies work together to deliver AI and accelerated computing performance while improving energy efficiency.

A prime example is the multi-node NVIDIA GB200 NVL72 system. LLM inference requires token generation that is both high-throughput and low-latency. GB200 NVL72 acts as a unified system to deliver up to 30x faster inference for LLM workloads, making it possible to run trillion-parameter models in real time.

Using examples involving LLMs and visual generative AI, Tirumala and Wong will also go over how the NVIDIA Quasar Quantization System, which combines algorithmic innovations, NVIDIA software libraries and tools, and Blackwell’s second-generation Transformer Engine, supports high accuracy on low-precision models.
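
To illustrate in general terms what running a model at low precision involves, the following minimal sketch quantizes a list of weights to signed 4-bit integers with one scale factor per block and then measures the round-trip error. It is a generic textbook scheme, not the NVIDIA Quasar Quantization System or the Transformer Engine. The function names, the block size, and the sample weights are all assumptions chosen for illustration.

```python
# Generic symmetric block quantization (illustrative only; not NVIDIA's method).

def quantize_block(values, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit codes
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0      # avoid dividing by zero
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantize(values, block_size=4, bits=4):
    """Split values into blocks and quantize each block with its own scale."""
    return [
        quantize_block(values[start:start + block_size], bits)
        for start in range(0, len(values), block_size)
    ]


def dequantize(blocks):
    """Reconstruct approximate floats from (codes, scale) pairs."""
    out = []
    for codes, scale in blocks:
        out.extend(c * scale for c in codes)
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights: one block of small values, one block of large values.
weights = [0.02, -0.01, 0.03, 0.012, 1.2, -0.9, 0.4, 0.7]

per_block = dequantize(quantize(weights, block_size=4))
single_scale = dequantize(quantize(weights, block_size=len(weights)))

print("mean error, one scale per block:", mean_abs_error(weights, per_block))
print("mean error, one scale overall:  ", mean_abs_error(weights, single_scale))
```

In this toy case, giving each block its own scale keeps the small weights from being rounded to zero, which is one common reason low-precision formats are paired with fine-grained scaling.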

Keeping Data Centers Cool

The conventional hum of air-cooled data centers could eventually become a thing of the past as researchers create more sustainable and effective hybrid cooling systems that combine liquid and air cooling.

Liquid-cooling techniques move heat away from systems more efficiently than air, so computing systems stay cool even while processing heavy workloads. Liquid-cooling equipment also takes up less space and uses less energy than air-cooling systems, allowing data centers to add more server racks, and therefore more compute capacity, to their facilities.
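
The physical reason liquid carries heat away more effectively can be sketched with the steady-state heat balance Q = density × volumetric flow × specific heat × temperature rise. The snippet below estimates the volumetric flow of air versus water needed to remove the same heat load at the same coolant temperature rise. The 100 kW load and 10 K rise are hypothetical inputs, and the fluid properties are approximate room-temperature textbook values, not figures from NVIDIA.

```python
# Approximate fluid properties near room temperature (textbook values).
AIR = {"density": 1.2, "cp": 1005.0}       # kg/m^3, J/(kg*K)
WATER = {"density": 997.0, "cp": 4186.0}   # kg/m^3, J/(kg*K)


def volumetric_flow(heat_w, delta_t_k, fluid):
    """Flow in m^3/s needed to absorb heat_w watts with a delta_t_k kelvin rise."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_w / (fluid["density"] * fluid["cp"] * delta_t_k)


heat_load_w = 100_000.0   # hypothetical heat load for one rack
delta_t_k = 10.0          # hypothetical coolant temperature rise

air_flow = volumetric_flow(heat_load_w, delta_t_k, AIR)
water_flow = volumetric_flow(heat_load_w, delta_t_k, WATER)

print(f"air:   {air_flow:.2f} m^3/s")
print(f"water: {water_flow * 1000:.2f} L/s")
print(f"air needs about {air_flow / water_flow:.0f}x the volumetric flow")
```

With these inputs, air needs a few thousand times the volumetric flow of water for the same job, which is why liquid loops can be far more compact than the fans and ducting they replace.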

Ali Heydari, NVIDIA’s head of infrastructure and data center cooling, will present several designs for hybrid-cooled data centers.

Some designs retrofit liquid-cooling units into existing air-cooled data centers, offering a fast and simple way to add liquid-cooling capabilities to existing racks. Others require installing piping for direct-to-chip liquid cooling using cooling distribution units, or fully submerging servers in immersion cooling tanks. These options demand a larger upfront investment, but they yield substantial savings in both energy consumption and operating costs.

Heydari will also discuss his team's work as part of COOLERCHIPS, a US Department of Energy program to develop advanced data center cooling technologies. As part of the project, the team is using the NVIDIA Omniverse platform to create physics-informed digital twins that model energy consumption and cooling efficiency, which helps it optimize its data center designs.

AI Agents Contribute to Processor Architecture

Semiconductor design is an enormous challenge at a microscopic scale. Engineers developing state-of-the-art processors push the limits of what is physically possible as they work to fit as much computing power as they can onto a piece of silicon just a few inches across.

AI models support their work by improving design quality and productivity, making manual processes more efficient, and automating some time-consuming tasks. The models include LLMs that can help engineers answer questions, write code, debug design problems, and more. They also include prediction and optimization tools that help engineers quickly analyze and improve designs.

In a tutorial, Mark Ren, NVIDIA’s director of design automation research, will provide an overview of these models and their applications. He will discuss agent-based AI systems for semiconductor design in a follow-up session.

AI agents powered by LLMs can be directed to complete tasks autonomously, opening up a wide range of applications across industries. Researchers at NVIDIA are developing agent-based systems for microprocessor design that can interact with experienced designers, reason and take action using customized circuit design tools, and learn from a database of human and agent experiences.
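
The general shape of such an agent, which observes, decides, calls a tool, and records the outcome, can be sketched in a few lines. The toy below is not NVIDIA's system: a fixed rule stands in for the LLM, the two "tools" are made-up stand-ins for real circuit design tools, and the design data, names, and the 0.05 ns improvement per step are hypothetical.

```python
# Toy observe-decide-act loop. A fixed rule replaces the LLM; all names are hypothetical.

def report_worst_slack(design):
    """Stand-in tool: return (path, slack) for the path with the lowest timing slack."""
    path = min(design["slack_ns"], key=design["slack_ns"].get)
    return path, design["slack_ns"][path]


def upsize_path(design, path):
    """Stand-in tool: pretend that upsizing cells recovers 0.05 ns on the given path."""
    design["slack_ns"][path] += 0.05
    return path


def choose_action(slack):
    """Stub policy in place of an LLM: keep fixing timing while slack is negative."""
    return "stop" if slack >= 0 else "upsize"


def run_agent(design, max_steps=10):
    experience = []  # log of (action, detail) pairs that later runs could learn from
    for _ in range(max_steps):
        path, slack = report_worst_slack(design)
        experience.append(("report", (path, slack)))
        if choose_action(slack) == "stop":
            break
        experience.append(("upsize", upsize_path(design, path)))
    return experience


design = {"slack_ns": {"a": -0.08, "b": -0.02, "c": 0.10}}  # hypothetical timing data
log = run_agent(design)

for action, detail in log:
    print(action, detail)
```

In a real system the stub policy would be replaced by a model that reads the tool output and chooses among many tools, and the experience log would feed back into later decisions.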

NVIDIA engineers are not just building this technology; they are using it. Ren will share examples of how engineers can use AI agents for timing report analysis, cell cluster optimization, and code generation. The cell cluster optimization work was awarded best paper at the first IEEE International Workshop on LLM-Aided Design.

NVIDIA Blackwell price

The NVIDIA Blackwell AI chip is expected to cost between $30,000 and $40,000 per unit. NVIDIA CEO Jensen Huang confirmed this at GTC 2024.

Note that this is a general estimate, and the actual cost may vary with configuration and order quantity. NVIDIA may also price differently for cloud service providers and enterprises.

agarapuramesh