Tuesday, October 15, 2024

AMD ROCm 6.2.3 Brings Llama 3 And SD 2.1 To Radeon GPUs


AMD ROCm 6.2.3

AMD recently published AMD ROCm 6.2.3, the latest version of its open compute software supporting Radeon GPUs on native Ubuntu Linux systems. Most significantly, this release lets developers use Stable Diffusion (SD) 2.1 text-to-image capabilities in their AI applications and delivers strong inference performance with Llama 3 70B Q4 (4-bit quantized).

Following its previous release, AMD ROCm 6.1, AMD focused on features that accelerate generative AI development. Using vLLM and Flash Attention 2, AMD ROCm 6.2 delivers pro-level performance for large language model inference. This release also includes beta support for the Triton framework, enabling more users to develop AI functionality on AMD hardware.


The four main feature highlights of AMD ROCm 6.2.3 for Radeon GPUs are:

  • Official vLLM support: the most recent version of Llama is supported by vLLM, and AMD ROCm on Radeon delivers strong inference performance with Llama 3 70B Q4.
  • Official Flash Attention 2 “Forward Enablement” support: speeds up inference and lowers memory requirements.
  • Official Stable Diffusion (SD) 2.1 support: the SD text-to-image model can be integrated into your own AI development.
  • Triton beta support: use the Triton framework to develop high-performance AI applications quickly, even with little prior experience.
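The vLLM highlight above can be sketched in a few lines. This is an illustrative example only, assuming a ROCm build of vLLM is installed; the exact checkpoint name and 4-bit quantization method (AWQ here) are assumptions, since the article only names "Llama 3 70B Q4" without a runtime recipe.

```python
def build_prompts(questions):
    """Wrap plain questions in a minimal, illustrative prompt template."""
    return [f"Question: {q}\nAnswer:" for q in questions]

def main():
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed checkpoint
        quantization="awq",        # one possible 4-bit option; pick what your model uses
        tensor_parallel_size=1,    # raise for multi-GPU Radeon setups
    )
    params = SamplingParams(temperature=0.7, max_tokens=128)
    for out in llm.generate(build_prompts(["What is ROCm?"]), params):
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```

On ROCm, vLLM dispatches to the HIP backend transparently, so the same script runs unchanged on Radeon and on CUDA hardware.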

AMD ROCm support for Radeon GPUs has advanced significantly since its first such release, version 5.7, barely a year ago.

Version 6.0 formally qualified additional Radeon GPUs, such as the 32GB Radeon PRO W7800, and greatly expanded AMD ROCm’s capabilities by adding support for the widely used ONNX Runtime.

AMD ROCm 6.1 marked another significant milestone, declaring official support for the TensorFlow framework and multi-GPU systems. It also introduced beta access to Windows Subsystem for Linux (WSL 2), which became officially qualified with 6.1.


AMD also published the full AMD ROCm 6.2.3 solution stack for Radeon GPUs alongside the release.

Although Linux was the primary focus of AMD ROCm 6.2.3, WSL 2 support will be released shortly.

ROCm on Radeon for AI and machine learning development has had a strong year, and AMD is eager to keep collaborating closely with the community to improve the product stack and support system builders in developing attractive on-premises, client-based solutions.

Evolution of AMD ROCm from version 5.7 to 6.2.3

From version 5.7 to 6.2.3, AMD ROCm (Radeon Open Compute) has made substantial improvements to performance, hardware support, developer tools, and deep learning frameworks. Each release’s main improvements are listed below:

AMD ROCm 5.7

  • Support for New Architectures: ROCm 5.7 included support for AMD’s RDNA 3 family. This release expanded the GPUs that can utilize ROCm for deep learning and HPC.
  • HIP Improvements: AMD’s HIP layer for porting CUDA code to AMD GPUs was optimized, easing interoperability between ROCm-supported systems and CUDA-based workflows.
  • Deep Learning Framework Updates: TensorFlow and PyTorch were made more compatible and performant. These upgrades optimized AI workloads in multi-GPU setups.
  • Performance Optimizations: This version improved HPC task performance, including memory management and multi-GPU scaling.
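The framework compatibility described above means CUDA-style PyTorch code runs unchanged on Radeon: ROCm builds of PyTorch expose HIP devices through the familiar `torch.cuda` API. A minimal sketch, assuming a ROCm (or CUDA) build of PyTorch is installed:

```python
import torch

def pick_device():
    """Return a HIP/CUDA device when available, otherwise fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def matmul_demo(n=256):
    """Run one matrix multiply on the selected device and return its shape."""
    device = pick_device()
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    c = a @ b  # dispatched to rocBLAS on ROCm builds, cuBLAS on CUDA builds
    return c.shape

if __name__ == "__main__":
    print(torch.version.hip)  # non-None on a ROCm build of PyTorch
    print(matmul_demo())
```

Because the device string is still `"cuda"`, existing CUDA-targeted scripts need no source changes to use a Radeon GPU.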

AMD ROCm 6.0

  • Unified Memory Support: ROCm 6.0 fully supported unified memory, smoothing CPU-GPU data transfers. This improved memory management, especially for applications in which the CPU and GPU frequently access the same data.
  • New Compiler Infrastructure: AMD enhanced the ROCm compiler (LLVM-based) for greater performance and broader workload support, with the goal of boosting deep learning, HPC, and AI efficiency.
  • Broader GPU Targeting: ROCm 6.0 could target more GPUs, especially in HPC, thanks to improved scalability and RDNA and CDNA architecture compatibility.
  • HIP API Updates: new CUDA compatibility features were added to the HIP API in this release, making it easier for developers to convert CUDA applications to ROCm.
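The CUDA-to-HIP conversion mentioned above is what AMD's hipify tools (hipify-perl, hipify-clang) automate on real C++ sources. As a toy illustration of the idea only, not the actual tools, a renamer over a handful of genuine CUDA-to-HIP API mappings might look like:

```python
# Real API-name pairs from the CUDA runtime and HIP runtime; the renamer
# itself is a toy -- actual porting uses AMD's hipify-perl/hipify-clang.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify_line(line: str) -> str:
    """Rewrite CUDA runtime calls on one source line to their HIP equivalents."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        line = line.replace(cuda_name, hip_name)
    return line
```

Because HIP mirrors the CUDA runtime API nearly one-to-one, much of a port really is this kind of mechanical renaming, which is why hipified code can still compile for NVIDIA targets as well.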

AMD ROCm 6.1

  • Optimized AI/ML Framework Compatibility: ROCm 6.1 improved PyTorch and TensorFlow performance. This improved mixed precision training, which maximizes GPU utilization in deep learning.
  • Experimental Tensor Core Support: HIP gained experimental support for hardware-accelerated matrix operations, greatly speeding up the matrix multiplications essential to deep learning.
  • Expanded Container Support: AMD included pre-built Docker containers that were easier to connect with Kubernetes in ROCm 6.1, simplifying cloud and cluster deployment.
  • Memory and I/O Optimizations: improved memory and I/O operations enabled more efficient data transfer in multi-GPU systems.
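The mixed-precision training improvement above is typically used through `torch.autocast`. A minimal sketch, assuming a ROCm build of PyTorch (on ROCm the device type is still reported as `"cuda"`); a production fp16 loop would also add a `torch.cuda.amp.GradScaler`, omitted here for brevity:

```python
import torch
from torch import nn

def train_step(model, opt, x, y):
    """One mixed-precision training step; falls back to bfloat16-on-CPU autocast."""
    device_type = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" covers ROCm/HIP
    amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
    opt.zero_grad()
    with torch.autocast(device_type=device_type, dtype=amp_dtype):
        loss = nn.functional.mse_loss(model(x), y)  # forward pass in reduced precision
    loss.backward()   # gradients computed outside the autocast region
    opt.step()
    return float(loss)
```

Running matrix-heavy ops in float16 is what lets the hardware matrix units mentioned above do the work, roughly halving memory traffic per step.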

AMD ROCm 6.1.3

  • Multi-GPU Support: support for multiple GPUs makes it possible to build scalable AI desktops for multi-user, multi-serving applications.
  • WSL 2 Beta Support: beta-level support for Windows Subsystem for Linux lets these solutions run with ROCm on a Windows OS-based system.
  • TensorFlow Framework: TensorFlow support provides more options for AI development.

AMD ROCm 6.2

  • New Kernel and Driver Features: ROCm 6.2 improved low-level driver and kernel support for advanced computing workload stability and performance. This significantly strengthened ROCm in enterprise environments.
  • AMD Infinity Architecture Integration: ROCm 6.2 enhanced support for AMD’s Infinity Architecture, which connects GPUs at high speeds. Multi-GPU configurations performed better, especially for large-scale HPC and AI applications.
  • HIP API Expansion: ROCm 6.2 improved the HIP API, making CUDA-based application conversion easier. Asynchronous data transfers and other advanced features were implemented in this release to boost computational performance.
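The asynchronous data transfers mentioned above are commonly exercised from PyTorch by combining pinned (page-locked) host memory with non-blocking copies, so uploads can overlap with compute. A sketch under that assumption (it degrades to ordinary synchronous copies on CPU-only machines):

```python
import torch

def async_upload(batches):
    """Upload CPU batches with non-blocking copies and reduce each on-device."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    results = []
    for cpu_batch in batches:
        if device.type == "cuda":
            cpu_batch = cpu_batch.pin_memory()  # page-locked buffer enables async DMA
        gpu_batch = cpu_batch.to(device, non_blocking=True)  # async when pinned
        results.append(gpu_batch.sum())  # compute can overlap later copies
    return torch.stack(results)
```

The same pattern is what data-loader prefetching relies on; without pinned memory, `non_blocking=True` silently falls back to a synchronous copy.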

AMD ROCm 6.2.3

  • Official vLLM support: the most recent version of Llama is supported by vLLM, and AMD ROCm on Radeon delivers strong inference performance with Llama 3 70B Q4.
  • Official Flash Attention 2 “Forward Enablement” support: speeds up inference and lowers memory requirements.
  • Official Stable Diffusion (SD) 2.1 support: the SD text-to-image model can be integrated into your own AI development.
  • Triton beta support: use the Triton framework to develop high-performance AI applications quickly, even with little prior experience.
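The Stable Diffusion 2.1 support above can be exercised from Python. The sketch below assumes the Hugging Face diffusers library and the public `stabilityai/stable-diffusion-2-1` checkpoint; the article does not name a specific runtime, so treat this as one plausible path, not AMD's official recipe:

```python
import re

def slugify(prompt: str) -> str:
    """Turn a text prompt into a safe output filename (illustrative helper)."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-") + ".png"

def main():
    # Heavy imports kept inside main() so the helper works without them installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")  # ROCm builds of PyTorch expose Radeon GPUs as "cuda"
    prompt = "a photo of a red bicycle leaning against a brick wall"
    image = pipe(prompt).images[0]
    image.save(slugify(prompt))

if __name__ == "__main__":
    main()
```

Because the pipeline only sees the standard `torch.cuda` device interface, the same script works on CUDA and ROCm systems without modification.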

Key Milestone Summary

  • Expanded Hardware Support: these releases added support for new AMD architectures (RDNA, CDNA) as ROCm broadened its hardware coverage.
  • Easier CUDA Porting: CUDA application porting became easier with HIP updates, while data scientists and researchers found AI/ML framework support increasingly useful.
  • Multi-GPU Optimizations: unified memory support, RDMA, and AMD Infinity Architecture improved multi-GPU deployments, which are essential for HPC and large-scale AI training.
  • Stability and Scalability: each release improved ROCm’s stability and scalability by fixing bugs, optimizing memory management, and improving speed.
  • Open Platform Focus: AMD now prioritizes an open, high-performance computing platform for AI, machine learning, and HPC applications.
Drakshi
Drakshi has been writing articles on Artificial Intelligence for Govindhtech since June 2023. She holds a postgraduate degree in business administration and is an enthusiast of Artificial Intelligence.