Intel AI Inference Performance
The results of MLCommons’ MLPerf Inference v3.1 performance benchmark are now public. The benchmark covered GPT-J, a large language model with 6 billion parameters, as well as computer vision and natural language processing models. Intel submitted results for Habana Gaudi2 accelerators, 4th Generation Intel Xeon Scalable processors, and the Intel Xeon CPU Max Series. The findings demonstrate Intel’s competitive performance for AI inference and reinforce the company’s commitment to making artificial intelligence more accessible at scale across the whole continuum of AI workloads, from the client and edge to the network and the cloud.
Why It Is Important: Together with the MLCommons AI training update from June and the Hugging Face performance benchmarks that validated Gaudi2 can outperform Nvidia’s H100 on a state-of-the-art vision-language model, today’s results further reinforce that Intel offers the only viable alternative to Nvidia’s H100 and A100 for AI compute needs.
Intel is bringing artificial intelligence (AI) everywhere with technologies that handle both inference and training across the continuum of AI workloads. Because every customer has unique requirements, Intel’s AI solutions give customers the freedom to choose the AI solution that best fits their own performance, efficiency, and cost requirements, while also helping them break out of closed ecosystems.
Concerning the Habana Gaudi2 Evaluations: The Habana Gaudi2 inference results for GPT-J provide strong validation of its competitive performance.
Gaudi2’s inference performance on GPT-J-99 and GPT-J-99.9 is 78.58 queries per second in the server scenario and 84.08 samples per second in the offline scenario.
Compared with Nvidia’s H100, Gaudi2 delivers impressive performance; the H100 holds only a slight edge, with 1.09x Gaudi2’s server performance and 1.28x its offline performance.
Gaudi2 outperforms Nvidia’s A100 by 2.4x in the server scenario and 2x in the offline scenario.
The Gaudi2 submission used FP8 and reached 99.9% accuracy with this newly introduced data format (a brief numerical illustration of FP8 follows below).
With Gaudi2 software updates released every six to eight weeks, Intel expects to continue delivering performance improvements and expanded model coverage in MLPerf benchmarks.
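As context for that FP8 result, the following is a minimal numerical sketch, not the Gaudi2 software stack or Intel’s MLPerf harness, of what quantizing values to the 8-bit E4M3 floating-point format costs in precision. It assumes PyTorch 2.1 or later, which exposes the torch.float8_e4m3fn dtype; the tensor size and values are illustrative.

```python
# Illustrative only: round-trip a tensor through FP8 (E4M3) and measure the
# precision cost of the narrower data type. This is not Habana's toolchain.
import torch

x = torch.randn(1024, dtype=torch.float32)   # reference FP32 values
x_fp8 = x.to(torch.float8_e4m3fn)            # quantize to 8-bit float (E4M3)
x_back = x_fp8.to(torch.float32)             # cast back for comparison

rel_err = (x - x_back).abs().mean() / x.abs().mean()
print(f"Mean relative error after FP8 round-trip: {rel_err.item():.3%}")
```

The small mantissa of E4M3 introduces a few percent of rounding error even on well-scaled values, which is why maintaining 99.9% of the reference accuracy while running inference in FP8 is a meaningful result.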
About the Intel Xeon Results: Intel submitted results for its 4th Generation Intel Xeon Scalable processors on all seven inference benchmarks, including GPT-J. These results show strong performance for general-purpose AI workloads, spanning vision, language processing, and speech and audio translation models, as well as the much larger DLRM v2 recommendation and GPT-J models. In addition, Intel remains the only company to have submitted public CPU results using industry-standard deep learning ecosystem software.
The 4th Generation Intel Xeon Scalable processor is well suited for building and deploying general-purpose AI workloads with the most widely used AI frameworks and libraries. For the GPT-J task of producing a roughly 100-word summary of a news article of about 1,000 to 1,500 words, 4th Generation Xeon processors summarized two paragraphs per second in offline mode and one paragraph per second in real-time server mode.
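To give a concrete feel for that summarization task, here is a minimal sketch of running GPT-J summarization on a CPU with the Hugging Face transformers library. The checkpoint name, prompt format, and generation settings are illustrative assumptions, and this plain bfloat16 setup does not reproduce the optimized software stack behind Intel’s MLPerf submission.

```python
# Minimal sketch (illustrative, not Intel's MLPerf harness): summarize a news
# article with GPT-J on a CPU using Hugging Face transformers in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # 6-billion-parameter GPT-J checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

article = "..."  # a news article of roughly 1,000 to 1,500 words
prompt = f"Summarize the following article in about 100 words:\n\n{article}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

summary = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)
print(summary)
```

In MLPerf terms, the offline scenario batches many such requests for maximum throughput, while the server scenario measures throughput under per-query latency constraints, which is why the two rates quoted above differ.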
For the first time, Intel submitted MLPerf results for the Intel Xeon CPU Max Series, which provides up to 64 gigabytes (GB) of high-bandwidth memory. For GPT-J, it was the only CPU able to achieve 99.9% accuracy, which is critical for applications in which the highest precision is of paramount importance.
Intel also worked with its original equipment manufacturer (OEM) customers to enable them to submit their own results, further demonstrating the scalability of AI performance and the broad availability of general-purpose servers based on Intel Xeon processors that can meet customer service level agreements (SLAs).
What’s Next: MLPerf, widely regarded as the most trustworthy benchmark for AI performance, enables fair and repeatable performance comparisons. Intel plans to submit new AI training performance results for the next MLPerf benchmark. The ongoing performance improvements demonstrate Intel’s commitment to supporting customers and addressing every node along the AI continuum: from low-cost AI processors to the highest-performing AI hardware accelerators and GPUs for enterprise, network, and cloud customers.