AI Processor
On July 11, Intel held a press conference in Beijing to announce the launch of its deep learning processor, Habana Gaudi 2, in the Chinese market, just as the US moves to tighten export regulations on AI processors bound for China. The processor accelerates both training and inference for AI workloads. Major Chinese AI server makers such as Inspur, New H3C, and xFusion are expected to ship servers equipped with Gaudi 2.
Chinese media report that Intel’s Gaudi 2 was already released in the European and American markets in May 2022. Habana and the US company Supermicro planned to introduce the Supermicro Gaudi 2 AI training server system in the second half of 2022.
To comply with the export regulations, a version of the AI processor tailored for the Chinese market is also planned.
Chen Baoli, vice president of Intel’s Data Centre and AI Group and general manager for China, said at the Gaudi 2 launch in China that Intel has optimised the processor’s software-level iterative computing capability to suit the trend towards large language models (LLMs).
According to statistics from China-based Pacific Securities, Inspur alone held a 37% share of the Chinese AI server market in 2022, while New H3C held 8%. Together with xFusion, these vendors are now marketing Intel’s specialised AI processor, which shows how hard Intel is pushing Gaudi 2 to compete in the Chinese market and offer a rival to Nvidia.
Although Gaudi 2 has been available in the American and European markets for more than a year, there is little sign of widespread adoption. In the meantime, Intel may be able to help bridge the AI computing-power gap in China.
According to Sandra L. Rivera, executive vice president and general manager of Intel’s Data Centre and AI Group, Gaudi 2 is the most effective solution in Intel’s product line for LLM workloads. Intel also plans to revise its data centre product roadmap, and intends to combine its high-performance AI processors with GPUs to create a more complete next-generation GPU product by 2025.
Habana, an Israeli semiconductor startup, was founded in 2016. Intel acquired the company for US$2 billion in December 2019, gaining the Gaudi accelerator line. Major Chinese internet and cloud service providers were already familiar with Habana’s Gaudi processors before the acquisition, and its products are now sold to major Chinese internet companies.
The Habana Gaudi2 Processor for Deep Learning
The High-Efficiency Gaudi Architecture Gets Even Better
Habana introduced Gaudi2, its second-generation deep learning processor, at Intel Vision 2022 in May of last year, delivering a dramatic step up in training performance. It builds on the highly efficient first-generation Gaudi design, which offers up to 40% better price-to-performance on AWS EC2 DL1 cloud instances and on-premises in the Supermicro Gaudi AI Training Server.
Gaudi2 moves from a 16 nm to a 7 nm process, adds support for the FP8 data type, integrates a media processing engine, and triples the number of AI-customised Tensor Processor Cores from 8 to 24. Its in-package memory has also tripled to 96 GB of HBM2e with 2.45 TB/s of bandwidth. Together, these advances deliver higher throughput than the NVIDIA A100 80GB on widely used computer vision and natural language processing models.
Habana has integrated 24 100-gigabit RDMA over Converged Ethernet (RoCE v2) ports on each Gaudi2, up from 10 on the first-generation Gaudi, making it easy and affordable for customers to scale their training capacity. In the 8-card HLS-Gaudi2 server, 21 of each processor’s ports connect to the other seven processors in an all-to-all, non-blocking topology, while the remaining three ports per processor are set aside for scale-out, giving the server 2.4 terabits per second of scale-out networking throughput. Habana also offers an 8-Gaudi2 baseboard to help customers streamline their system design.
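As a quick sanity check on those numbers, here is a minimal Python sketch of the port arithmetic; the port counts and 100 Gb line rate are taken directly from the figures above, nothing else is assumed:

```python
# Port arithmetic for an 8-card HLS-Gaudi2 server, using the figures quoted above.
PORTS_PER_CHIP = 24            # 100 GbE RoCE ports integrated on each Gaudi2
CHIPS_PER_SERVER = 8
PEERS = CHIPS_PER_SERVER - 1   # all-to-all links to the other seven processors

intra_ports = 21                                 # ports per chip used for the non-blocking all-to-all fabric
scaleout_ports = PORTS_PER_CHIP - intra_ports    # 3 ports per chip left for scale-out

links_per_peer = intra_ports // PEERS                              # 3 x 100 GbE links to each neighbour
scaleout_bw_tbps = scaleout_ports * CHIPS_PER_SERVER * 100 / 1000  # aggregate scale-out bandwidth

print(f"{links_per_peer} links per peer, {scaleout_bw_tbps} Tb/s of scale-out bandwidth")
# -> 3 links per peer, 2.4 Tb/s of scale-out bandwidth
```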
Thanks to the inclusion of RoCE on chip, customers can easily scale and customise Gaudi2 systems to meet their deep learning cluster requirements, from a single Gaudi2 to thousands. Because Gaudi2 systems run over industry-standard Ethernet, customers can choose from a wide range of Ethernet switching and networking equipment, which lowers cost further. The on-chip integration of the network interface controller ports also reduces component count and overall system cost.

[Figure: Gaudi2 network configuration]
Like its predecessor, Gaudi2 eases the transition from GPU-based models to Gaudi hardware through the Habana SynapseAI software suite, which is optimised for deep learning model development. It supports the TensorFlow and PyTorch frameworks and includes more than fifty reference models for computer vision and natural language processing. Developers can find the reference models, model roadmaps, how-to content, documentation, and tools on the Habana GitHub repository.
Migrating an existing model typically takes only two lines of code, and Habana provides a full toolkit for advanced users who want to program their own kernels. Models trained on Gaudi2 with SynapseAI can be deployed for inference on any target, including Habana Greco, Intel Xeon CPUs, or Gaudi2 itself. SynapseAI is also integrated with ecosystem partners such as Grid.ai’s PyTorch Lightning, Hugging Face’s transformer model repositories and tools, and cnvrg.io’s MLOps software.
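A minimal sketch of what that migration looks like in PyTorch, assuming the SynapseAI PyTorch bridge (the habana_frameworks package) is installed on a Gaudi system; the tiny model, data, and training loop are placeholders for illustration, not Habana reference code:

```python
import torch
import torch.nn as nn
# SynapseAI PyTorch bridge; importing it registers the Gaudi ("hpu") device.
import habana_frameworks.torch.core as htcore

# Placeholder model and data, for illustration only.
model = nn.Linear(128, 10).to("hpu")           # move the model to the Gaudi device
inputs = torch.randn(32, 128).to("hpu")        # move input tensors to the same device
targets = torch.randint(0, 10, (32,)).to("hpu")

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    htcore.mark_step()   # in lazy execution mode, flush the accumulated graph to the device
    optimizer.step()
    htcore.mark_step()
```

The essential change from a GPU script is simply importing the bridge and targeting the "hpu" device instead of "cuda"; the rest of the training loop stays standard PyTorch.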
Ten days after Gaudi2’s debut, the Habana team submitted performance results to the MLPerf industry benchmark for the June publication. Habana’s May 2022 submission showed significant improvements in time-to-train and outperformed NVIDIA’s A100-80GB submission for an 8-card server on both the vision (ResNet-50) and language (BERT) models.
Gaudi2 reduced time-to-train for ResNet-50 by 36% compared with NVIDIA’s submission for the A100-80GB, and by 45% compared with Dell’s submission for an A100-40GB 8-accelerator server, which was cited for both ResNet-50 and BERT. Compared with first-generation Gaudi, Gaudi2 delivers a 3x speedup in training throughput for ResNet-50 and 4.7x for BERT, thanks to the many hardware and software advances made since the original Gaudi.
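Note that time-to-train reductions and throughput speedups are different metrics. A rough sketch of how to convert one into the other, using the percentages quoted above purely as illustrative inputs:

```python
def reduction_to_speedup(reduction_pct: float) -> float:
    """Convert an X% reduction in time-to-train into an equivalent speedup factor."""
    return 1.0 / (1.0 - reduction_pct / 100.0)

# Using the MLPerf time-to-train reductions quoted above as illustrative inputs.
print(round(reduction_to_speedup(36), 2))  # 36% less time -> roughly 1.56x faster
print(round(reduction_to_speedup(45), 2))  # 45% less time -> roughly 1.82x faster
```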
The performance of both generations of Gaudi processors was achieved with Habana’s commercial software stack exactly as it ships to customers, without further software modifications. Customers can therefore expect MLPerf-comparable results on their own Gaudi or Gaudi2 systems using Habana’s commercial software. Because both generations of Gaudi were built for superior deep learning efficiency, Habana can offer customers strong performance at a very competitive price.