Intel has recently achieved remarkable training results in the MLCommons industry AI performance benchmark, MLPerf Training 3.0. Both the Habana® Gaudi®2 deep learning accelerator and the 4th Gen Intel Xeon Scalable processor delivered outstanding performance, highlighting Intel’s prowess in AI training.
The MLPerf results validate the total cost of ownership (TCO) value that Intel Xeon processors and Intel Gaudi deep learning accelerators offer to customers in the field of AI. Xeon processors with built-in accelerators are ideal for running high-volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models (LLMs) and generative AI. Intel’s scalable systems, combined with optimized and user-friendly open software, lower the barriers for customers and partners to deploy a wide range of AI-based solutions across the data center, from the cloud to the intelligent edge.
The prevailing industry narrative suggests that generative AI and large language models can only run on Nvidia GPUs. However, the latest MLPerf Training 3.0 results demonstrate that Intel’s AI solutions provide compelling alternatives for customers who seek to break free from closed ecosystems that limit efficiency and scalability.
The MLPerf results showcase the performance of Intel’s products across various deep learning models. The maturity of Gaudi2-based software and systems for training was demonstrated at scale on the GPT-3 large language model, making Gaudi2 one of only two semiconductor solutions with submitted performance results on the benchmark’s GPT-3 LLM training workload.
Additionally, Gaudi2 offers substantial cost advantages to customers in terms of both server and system costs. The MLPerf-validated performance of Gaudi2 on GPT-3, computer vision, and natural language processing models, along with upcoming software advancements, positions Gaudi2 as an extremely compelling price/performance alternative to Nvidia’s H100.
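Price/performance claims of this kind reduce to a simple throughput-per-dollar ratio. The sketch below illustrates the arithmetic only; the throughput and cost figures in it are made-up placeholders, not MLPerf results or vendor list prices.

```python
def price_performance(throughput_samples_per_sec: float, system_cost_usd: float) -> float:
    """Return training throughput per dollar of system cost; higher is better."""
    return throughput_samples_per_sec / system_cost_usd

# Hypothetical illustration only: neither the throughputs nor the system
# costs below are published benchmark numbers or real prices.
gaudi2_value = price_performance(1000.0, 100_000.0)  # 0.01 samples/sec per dollar
h100_value = price_performance(1200.0, 160_000.0)    # 0.0075 samples/sec per dollar
print(gaudi2_value > h100_value)  # with these placeholder numbers, Gaudi2 wins on value
```

A faster system can still lose on this metric if its cost grows faster than its throughput, which is the crux of the price/performance argument above.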
On the CPU front, the 4th Gen Xeon processors with Intel AI engines showcased exceptional deep learning training performance. Customers can build single universal AI systems for data pre-processing, model training, and deployment using Xeon-based servers. This approach ensures the right balance of AI performance, efficiency, accuracy, and scalability.
The MLPerf results for the Habana Gaudi2 highlight its outstanding performance and efficient scalability, particularly in training generative AI and large language models. The results demonstrate Gaudi2’s impressive time-to-train on the 175 billion parameter GPT-3 model, as well as its near-linear scaling from 256 to 384 accelerators. Gaudi2 also delivered excellent training results on computer vision and natural language processing models, with performance increases compared to the previous submission, showcasing the growing software maturity of Gaudi2.
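"Near-linear scaling" can be quantified as the ratio of achieved speedup to the ideal speedup implied by the added accelerators. The helper below is a generic sketch of that calculation; the times-to-train in the example are placeholders, not the published Gaudi2 results.

```python
def scaling_efficiency(n1: int, t1: float, n2: int, t2: float) -> float:
    """Efficiency of scaling from n1 devices (time-to-train t1)
    to n2 devices (time-to-train t2).
    1.0 means perfectly linear scaling; values near 1.0 are "near-linear"."""
    achieved_speedup = t1 / t2
    ideal_speedup = n2 / n1
    return achieved_speedup / ideal_speedup

# Placeholder times in minutes, not the actual MLPerf figures:
eff = scaling_efficiency(256, 450.0, 384, 311.0)
print(f"{eff:.2%}")  # ~96% of linear scaling
```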
The software support for the Gaudi platform continues to mature, keeping pace with the increasing demand for generative AI and LLMs. Gaudi2’s GPT-3 submission was based on PyTorch and employed the popular DeepSpeed optimization library, enabling support for 3D parallelism and further optimizing scaling performance efficiency on LLMs. Future software releases in 2023’s third quarter are expected to bring significant performance improvements to Gaudi2.
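In 3D parallelism, the device pool is factored into tensor-, pipeline-, and data-parallel groups whose product must equal the total device count. The sanity-check sketch below illustrates that constraint; the degrees chosen for 384 devices are illustrative, not the actual configuration of the Gaudi2 submission.

```python
def data_parallel_degree(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Derive the data-parallel degree in a 3D-parallel training job.
    world_size must be divisible by tensor_parallel * pipeline_parallel."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size not divisible by tensor * pipeline degrees")
    return world_size // model_parallel

# Illustrative split of 384 accelerators (hypothetical degrees):
print(data_parallel_degree(world_size=384, tensor_parallel=8, pipeline_parallel=8))  # 6
```

Frameworks such as DeepSpeed reject configurations that violate this divisibility constraint, so checking it up front is a common first step when planning a scale-out run.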
The MLPerf results for the 4th Gen Xeon processors demonstrate that Intel Xeon processors provide enterprises with the out-of-the-box capabilities to deploy AI on general-purpose systems, eliminating the need for dedicated AI systems and associated complexities and costs. The results showcase the training efficiency of Xeon processors on various models, including BERT, ResNet-50, and RetinaNet.
MLPerf is widely considered the most reputable benchmark for AI performance, enabling fair and repeatable performance comparisons across different solutions. Intel’s commitment to transparency is evident: it has surpassed the 100-submission milestone and remains the only vendor to submit public CPU results using industry-standard deep-learning ecosystem software.
Furthermore, the results highlight the excellent scaling efficiency achieved using cost-effective and readily available Intel Ethernet 800 Series network adapters, which leverage the open-source Intel Ethernet Fabric Suite Software based on Intel oneAPI.
In summary, the MLCommons MLPerf Training 3.0 results underscore Intel’s impressive AI advancements, solidifying its competitive position in the field. With its portfolio of AI solutions, Intel provides customers with compelling alternatives to closed ecosystems, ensuring efficiency, scalability, and cost-effectiveness in AI training.
What is Habana Gaudi2?
Habana Gaudi2 is the second-generation deep learning accelerator from Habana Labs, an Intel company. It is based on the Gaudi architecture and is designed to deliver high performance and efficiency for AI training and inference workloads.
What are the key features of Habana Gaudi2?
The key features of Habana Gaudi2 include:
Up to 180 TOPS of AI performance
Up to 1.2 TFLOPS of FP32 performance
96 GB of HBM2E memory
Low power consumption
What are the applications of Habana Gaudi2?
Habana Gaudi2 is designed for a wide range of AI workloads, including:
Natural language processing
Computer vision
Recommendation systems
Fraud detection
What are the system requirements for Habana Gaudi2?
The system requirements for Habana Gaudi2 include:
A Gaudi2-based accelerator card
A host CPU with 64 GB of RAM
A Linux operating system
What are the alternatives to Habana Gaudi2?
The alternatives to Habana Gaudi2 include:
NVIDIA Tensor Core GPUs
Intel Xeon CPUs with integrated AI accelerators
AMD EPYC CPUs with integrated AI accelerators
When will Habana Gaudi2 be available?
Habana Gaudi2 has been generally available since its launch in May 2022.
Where can I buy Habana Gaudi2?
Habana Gaudi2 will be available through Habana Labs’ authorized resellers.