Saturday, November 9, 2024

AMD Instinct MI325X Accelerators Lead AI Performance

AMD has unveiled its newest accelerator and networking solutions for the next generation of AI infrastructure at scale: AMD Instinct MI325X accelerators, the AMD Pensando Pollara 400 NIC, and the AMD Pensando Salina DPU. AMD Instinct MI325X accelerators raise the performance bar for generative AI models and data centers.

Built on the AMD CDNA 3 architecture, AMD Instinct MI325X accelerators are designed to deliver outstanding performance and efficiency for demanding AI tasks, including foundation model training, fine-tuning, and inferencing. Together, these products let AMD customers and partners build highly efficient, optimized AI solutions at the system, rack, and data center levels.

AMD Instinct MI325X Continues to Provide Superior AI Performance

AMD Instinct MI325X accelerators provide industry-leading memory capacity and bandwidth: 256GB of HBM3E supporting 6.0TB/s offers 1.8X more capacity and 1.3X more bandwidth than the NVIDIA H200. The AMD Instinct MI325X also offers 1.3X higher peak theoretical FP16 and FP8 compute performance than the H200.
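As a rough sanity check on the memory comparison above, the ratios can be recomputed in a few lines of Python. The MI325X figures come from this article; the H200 figures (141GB of HBM3E at 4.8TB/s) are an assumption taken from NVIDIA's public specifications, not from this article.

```python
# Figures quoted in this article for the AMD Instinct MI325X.
MI325X_MEMORY_GB = 256
MI325X_BANDWIDTH_TBPS = 6.0

# ASSUMPTION: H200 figures from NVIDIA's public spec sheet, not from this article.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBPS = 4.8

# Simple ratios of MI325X to H200 for each metric.
capacity_ratio = MI325X_MEMORY_GB / H200_MEMORY_GB
bandwidth_ratio = MI325X_BANDWIDTH_TBPS / H200_BANDWIDTH_TBPS

print(f"Memory capacity ratio:  {capacity_ratio:.2f}x")
print(f"Memory bandwidth ratio: {bandwidth_ratio:.2f}x")
```

Under these assumed H200 figures, the capacity ratio comes out at roughly 1.8 and the bandwidth ratio at roughly 1.25, which matches the quoted 1.3X only after rounding up.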

This leadership memory and compute can deliver up to 1.3X the inference performance of the H200 on Mistral 7B at FP16, 1.2X on Llama 3.1 70B at FP8, and 1.4X on Mixtral 8x7B at FP16.
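To give a sense of why memory capacity matters for these models, the sketch below estimates the weights-only footprint of each one at the quoted precision and compares it with the 256GB on a single MI325X. The parameter counts are nominal approximations (an assumption, not from this article), and the estimate ignores the KV cache, activations, and runtime overhead, so it is a lower bound rather than a sizing guide.

```python
MI325X_MEMORY_GB = 256  # per-accelerator HBM3E capacity quoted in this article

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}

# ASSUMPTION: nominal parameter counts in billions (approximate, illustrative only).
MODELS = [
    ("Mistral 7B", 7, "FP16"),
    ("Llama 3.1 70B", 70, "FP8"),
    ("Mixtral 8x7B", 47, "FP16"),
]


def weight_footprint_gb(params_billions, precision):
    """Approximate weights-only size in GB, taking 1 GB as 1e9 bytes."""
    return params_billions * BYTES_PER_PARAM[precision]


for name, params_billions, precision in MODELS:
    size_gb = weight_footprint_gb(params_billions, precision)
    share = size_gb / MI325X_MEMORY_GB
    print(f"{name} at {precision}: ~{size_gb} GB of weights ({share:.0%} of one MI325X)")
```

Whatever memory remains after the weights is available for the KV cache and larger batch sizes, which is one plausible reason extra capacity can help inference throughput.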

AMD Instinct MI325X accelerators are on track for production shipments in Q4 2024 and are expected to be widely available starting in Q1 2025 in systems from a broad set of platform providers, including Dell Technologies, Eviden, Gigabyte, Hewlett Packard Enterprise, Lenovo, Supermicro, and others.

Continuing its commitment to an annual roadmap cadence, AMD previewed its upcoming AMD Instinct MI350 series accelerators. Built on the AMD CDNA 4 architecture, AMD Instinct MI350 series accelerators are designed to deliver a 35x improvement in inference performance compared to AMD CDNA 3-based accelerators.

The AMD Instinct MI350 series will continue to lead the market in memory capacity, with up to 288GB of HBM3E memory per accelerator. AMD Instinct MI350 series accelerators are expected to be available in the second half of 2025.

AMD Next-Gen AI Networking

AMD is using the most widely deployed programmable DPU for hyperscalers to power next-generation AI networking. AI networking is split into two parts: the front end, which delivers data and information to an AI cluster, and the back end, which manages data transfer between accelerators and clusters. Both are essential to ensuring that CPUs and accelerators are used efficiently in AI infrastructure.

To manage these two networks effectively and drive high performance, scalability, and efficiency across the entire system, AMD introduced the AMD Pensando Salina DPU for the front end and the AMD Pensando Pollara 400, the industry's first Ultra Ethernet Consortium (UEC) ready AI NIC, for the back end.

The AMD Pensando Salina DPU is the third generation of the world's most powerful and programmable DPU, offering up to 2X the performance, bandwidth, and scale of the previous generation. Supporting 400G throughput for fast data transfer rates, the AMD Pensando Salina DPU is a critical component of AI front-end network clusters, optimizing performance, efficiency, security, and scalability for data-driven AI applications.

Both the AMD Pensando Salina DPU and AMD Pensando Pollara 400 are sampling with customers in Q4 2024 and are scheduled to be available in the first half of 2025.

New Generative AI Capabilities Offered by AMD AI Software

AMD continues to invest in software capabilities and the open ecosystem to deliver powerful new features and capabilities in the AMD ROCm open software stack.

Within the open software community, AMD is driving support for AMD compute engines in the most widely used AI frameworks, libraries, and models, including PyTorch, Triton, Hugging Face, and many others. This work translates to out-of-the-box performance and support with AMD Instinct accelerators on popular generative AI models such as Stable Diffusion 3 and Meta Llama 3, 3.1, and 3.2, as well as more than one million models at Hugging Face.
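For readers who want to see what this framework support looks like in practice, here is a minimal sketch that reports whether the local PyTorch installation is a ROCm build. It relies on the fact that ROCm builds of PyTorch expose a HIP version through torch.version.hip and reuse the torch.cuda API for AMD GPUs. The helper name describe_pytorch_backend is hypothetical, chosen only for illustration.

```python
def describe_pytorch_backend():
    """Return a one-line summary of the local PyTorch build (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # ROCm builds set torch.version.hip to a version string; other builds leave it as None.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version is None:
        return f"PyTorch {torch.__version__} is not a ROCm build"

    # On ROCm builds, AMD GPUs are reached through the torch.cuda namespace.
    if torch.cuda.is_available():
        device_name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} (ROCm/HIP {hip_version}) sees {device_name}"
    return f"PyTorch {torch.__version__} (ROCm/HIP {hip_version}) found no visible GPU"


print(describe_pytorch_backend())
```

Because ROCm builds keep the torch.cuda interface, existing PyTorch code that targets "cuda" devices generally runs on AMD Instinct accelerators without source changes.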

Beyond the community, AMD continues to advance its ROCm open software stack, adding the latest features to support leading training and inference on generative AI workloads. ROCm 6.2 now supports critical AI features such as the FP8 datatype, Flash Attention 3, and Kernel Fusion. Compared to ROCm 6.0, ROCm 6.2 provides up to a 2.4X performance improvement on inference and 1.8X on training for a variety of LLMs.

agarapuramesh
Agarapu Ramesh is the founder of Govindhtech and a computer hardware enthusiast with an interest in writing tech news articles. He has worked as an editor at Govindhtech for one year and previously worked as a computer assembly technician at G Traders in India from 2018. He holds an MSc.