Monday, May 27, 2024

Benchmarking AI Inference with Intel

Intel AI Inference Performance

MLCommons has published the results of its MLPerf Inference v3.1 performance benchmark. The benchmark was run on GPT-J, a large language model with 6 billion parameters, as well as computer vision and natural language processing models. Intel submitted results for Habana Gaudi2 accelerators, 4th Generation Intel Xeon Scalable processors, and the Intel Xeon CPU Max Series. The findings demonstrate Intel’s competitive performance for AI inference and reinforce the company’s commitment to making artificial intelligence accessible at scale across the whole continuum of AI workloads, from the client and edge to the network and the cloud.

Why It Is Important: Together with the MLCommons AI training update from June and the Hugging Face performance benchmarks, which showed that Gaudi2 can outperform Nvidia’s H100 on a state-of-the-art vision-language model, today’s results further reinforce that Intel offers the only viable alternative to Nvidia’s H100 and A100 for AI compute needs.

Intel is bringing artificial intelligence (AI) everywhere with technologies that handle both inference and training across the continuum of AI workloads. Every customer has unique requirements, and Intel’s AI solutions give customers freedom and choice in selecting the AI solution that fits their specific performance, efficiency, and cost targets, while also helping them break out of closed ecosystems.

About the Habana Gaudi2 Results: The Habana Gaudi2 inference performance data provide strong validation of its competitive performance on GPT-J.

Gaudi2’s inference performance on GPT-J-99 and GPT-J-99.9 is 78.58 queries per second in server mode and 84.08 samples per second in offline mode.

Compared with Nvidia’s H100, Gaudi2 delivers compelling performance; the H100 holds only a slight edge, at 1.09 times Gaudi2’s server performance and 1.28 times its offline performance.

Gaudi2 outperforms Nvidia’s A100 by 2.4 times in server mode and 2 times in offline mode.
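As a back-of-the-envelope illustration, the relative figures above can be combined to estimate absolute throughputs for the competing parts. Note that the H100 and A100 numbers below are derived from the stated ratios, not independently reported results:

```python
# Gaudi2 GPT-J throughput as reported above (MLPerf Inference v3.1).
gaudi2_server = 78.58   # queries per second
gaudi2_offline = 84.08  # samples per second

# H100 is reported at 1.09x (server) and 1.28x (offline) of Gaudi2.
h100_server = gaudi2_server * 1.09
h100_offline = gaudi2_offline * 1.28

# Gaudi2 is reported as 2.4x (server) and 2x (offline) faster than A100.
a100_server = gaudi2_server / 2.4
a100_offline = gaudi2_offline / 2.0

print(f"H100 ~ {h100_server:.1f} q/s server, {h100_offline:.1f} samples/s offline")
print(f"A100 ~ {a100_server:.1f} q/s server, {a100_offline:.1f} samples/s offline")
```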

The Gaudi2 submission used FP8 and achieved an accuracy of 99.9% on this newly introduced data format.
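FP8 trades numeric precision for throughput, and the submission shows the accuracy cost can be negligible. As a purely illustrative sketch (assuming the common E4M3 variant, with 4 exponent and 3 mantissa bits; this is not Habana’s actual quantization code), FP8 rounding can be simulated as:

```python
import math

def quantize_fp8_e4m3(x: float) -> float:
    """Round x to the nearest value representable in a simplified FP8 E4M3
    format: 3 mantissa bits, largest normal value 448, minimum normal
    exponent -6. Illustrative only; real hardware also handles NaN
    encodings and configurable saturation modes."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = min(abs(x), 448.0)                 # saturate at E4M3's max normal
    e = max(math.floor(math.log2(mag)), -6)  # clamp exponent (subnormal range)
    step = 2.0 ** (e - 3)                    # value spacing with 3 mantissa bits
    return sign * round(mag / step) * step

# Absolute quantization error grows with magnitude but stays small
# in relative terms -- the property FP8 inference relies on.
for v in (1.1, 3.3, 100.0, 1000.0):
    print(v, "->", quantize_fp8_e4m3(v))
```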

Intel anticipates continued performance gains and expanded model coverage in MLPerf benchmarks through Gaudi2 software updates released every six to eight weeks.

About the Intel Xeon Results: Intel submitted its 4th Generation Intel Xeon Scalable processors on all seven inference benchmarks, including GPT-J. The results show strong performance for general-purpose AI workloads, spanning vision, language processing, and speech and audio translation models, as well as the much larger DLRM v2 recommendation and GPT-J models. In addition, Intel remains the only vendor to submit public CPU results using industry-standard deep learning ecosystem software.

The 4th Generation Intel Xeon Scalable processor is well suited to building and deploying general-purpose AI workloads with the most widely used AI frameworks and libraries. On the GPT-J task of producing a roughly 100-word summary of a news article of about 1,000 to 1,500 words, 4th Generation Intel Xeon processors summarized two paragraphs per second in offline mode and one paragraph per second in real-time server mode.

For the first time, Intel submitted MLPerf results for the Intel Xeon CPU Max Series, which provides up to 64 gigabytes (GB) of high-bandwidth memory. On GPT-J, it was the only CPU able to reach 99.9% accuracy, which is essential for applications where the highest precision is paramount.

Intel collaborated with its original equipment manufacturer (OEM) customers on their own submissions, further demonstrating the scalability of AI performance and the broad availability of general-purpose servers powered by Intel Xeon processors that can meet customer service level agreements (SLAs).

What’s Next: MLPerf, widely regarded as the most trustworthy benchmark for AI performance, enables fair and repeatable performance comparisons. Intel plans to submit new AI training performance results in the next MLPerf round. The ongoing performance improvements demonstrate Intel’s commitment to supporting customers at every node along the AI continuum, from low-cost AI CPUs to the highest-performing AI hardware accelerators and GPUs for enterprise, network, and cloud customers.


Agarapu Ramesh is the founder of Govindhtech and a computer hardware enthusiast with an interest in writing tech news articles. He has worked as an editor at Govindhtech for one year, and previously worked as a computer assembling technician at G Traders in India from 2018. He holds an MSc.



