4th Gen AMD EPYC Processors Are Ideal for AI Workloads
What Are AI Workloads?
AI has moved from a hypothetical concept to an everyday part of our lives, affecting everything from personalized streaming suggestions to enhanced healthcare diagnostics.
Its broad adoption is revolutionizing industries, increasing productivity, and improving user experiences. AI simplifies complex tasks and is changing how people live, work, and use technology.
AI/ML Workloads
Key enterprise AI applications include:
- Recommendation Systems: AI and machine learning analyze user behavior and preferences to recommend relevant products, services, and content on e-commerce, streaming, and social media platforms, increasing engagement.
- Customer Service: AI chatbots and virtual assistants serve customers around the clock, delivering higher satisfaction, faster responses, and lower operating costs.
- Predictive Maintenance: AI and machine learning examine mechanical system data in real time to detect anomalies and predict maintenance needs before failure occurs. This extends equipment life, reduces downtime, and cuts costs; digital twins make extensive use of this concept.
- Fraud Detection: AI systems analyze transactions for unusual patterns or anomalies that may indicate fraud. This improves security, protects financial assets, and strengthens fraud resilience.
- Personalized Marketing: AI uses customer data to create targeted marketing campaigns and personalized recommendations. This approach boosts engagement, conversions, and the return on marketing spend.
- Supply Chain Optimization: AI forecasts demand, optimizes inventory, and streamlines complex global logistics and supply chains. This improves product delivery, lowers operating costs, and increases capital and inventory efficiency.
- Document Analysis: Large Language Models (LLMs) extract information from unstructured text documents such as contracts and reports. This streamlines data extraction, automates document analysis, and supports decision-making over large volumes of text.
It may be less clear what infrastructure these AI-based business enhancements require. This blog compares AMD EPYC CPUs against their rivals on important AI tasks, using evidence-based comparisons.
AI and AMD
AMD is well positioned with workload-optimized compute engines and technologies that underpin efficient AI systems of any scale. The new AMD Ryzen 7040 Series processors provide the first dedicated AI engine in an x86 CPU for consumer and business PCs. AMD Alveo accelerators, Versal adaptive SoCs, and industry-leading FPGAs power AI-based image recognition in applications ranging from NASA's Mars rovers to advanced automotive driver-assist and safety systems. The AMD Instinct MI300X GPU uses 153 billion transistors for next-generation AI computation, enabling real-time generative AI acceleration at massive scale.
AMD EPYC processors hold hundreds of world records for performance and efficiency, proving their ability to power enterprise workloads like modern data center AI applications. They are ideal for enterprises that want to incorporate AI into many applications while keeping a uniform x86-based infrastructure for databases, big data, natural language processing, chatbots, and other AI services.
Customers with larger and more demanding training and inference requirements can use AMD Instinct accelerators or other PCIe-based GPUs. However, a CPU alone performs well in many mainstream and first-time AI deployments, and 4th Gen AMD EPYC CPUs are the best x86 platform for these AI workloads.
End-to-End AI
The varied use cases above show that "AI" implementations can take many shapes and involve many stages. Beyond training and inference, an AI pipeline demands substantial compute to clean and transform data, prepare and label it, score and serve models, and turn results into usable business insights. Because implementations, methods, and architectures vary so widely, benchmark tools help characterize performance.
The TPCx-AI benchmark assesses a complete AI pipeline. It models a full retail data center dataset, including customer, order, financial, and product data, and covers enterprise use cases such as customer segmentation, conversation transcription, sales forecasting, spam detection, price prediction, classification, and fraud detection. This benchmark's validated, published results set clear expectations for performance and efficiency.
AMD examined inference performance on end-to-end AI workloads that simulate real-world AI and data science applications, including data loading and model training on a heterogeneous 30 GB dataset (Scale Factor 30). A 2P system with 96-core AMD EPYC 9654 processors running six 32-core instances outperformed a 2P system with 64-core Intel Xeon Platinum 8592+ processors running four 32-core instances by ~1.65x, a ~1.10x per-core advantage for the AMD EPYC 9654 system.
Gradient Boosting
Gradient boosting is used for regression and classification. XGBoost (eXtreme Gradient Boosting), a popular and efficient open-source gradient boosting library, handles big datasets and offers parallel processing for fast training. Because it gracefully accommodates missing values, it can process incomplete real-world data without extensive pre-processing. This performance and adaptability make XGBoost a suitable choice for many applications, as seen in the use-case models and datasets in its repository, which provide good test parameters.
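As a minimal illustration of the kind of workload being measured, the sketch below trains and scores a small XGBoost classifier; the synthetic data and parameters are placeholders, not the benchmark configuration.

```python
# Minimal XGBoost train/predict sketch; synthetic data stands in for the real workload.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Synthetic tabular dataset; real benchmarks use far larger data.
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# DMatrix is XGBoost's optimized data container; it handles missing values (NaN) natively.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",  # fast histogram-based training; parallelizes across CPU cores
    "max_depth": 6,
    "eta": 0.3,
}
booster = xgb.train(params, dtrain, num_boost_round=100)

# Inference: predicted probabilities for the held-out rows.
preds = booster.predict(dtest)
print("accuracy:", ((preds > 0.5) == y_test).mean())
```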
A 2P AMD EPYC 9654 system outperformed a 2P Intel Xeon Platinum 8592+ system with a ~1.38x inference uplift on the airline dataset, running 16 concurrent instances with 12 cores per instance.
Similarity Search
Similarity search is a basic data mining and information retrieval operation that finds the objects in a dataset most similar to a query object, as measured by a similarity metric. The Facebook AI Similarity Search (FAISS) library enables rapid, scalable searches over multimedia files. It supports k-nearest neighbor (k-NN) searches over huge datasets with tunable memory, speed, and accuracy trade-offs, far outperforming traditional databases.
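A minimal FAISS sketch of the product-quantization index used in the test below; random vectors stand in for the sift1m descriptors, and the sizes are illustrative.

```python
# Minimal FAISS IndexPQ sketch; random vectors stand in for the sift1m dataset.
import numpy as np
import faiss

d = 128                # SIFT descriptors are 128-dimensional
nb, nq = 100_000, 5    # database and query sizes (toy scale)
rng = np.random.default_rng(0)
xb = rng.random((nb, d)).astype("float32")
xq = rng.random((nq, d)).astype("float32")

m, nbits = 16, 8       # 16 sub-quantizers x 8 bits -> 16 bytes per compressed vector
index = faiss.IndexPQ(d, m, nbits)
index.search_type = faiss.IndexPQ.ST_PQ  # asymmetric PQ distance (the default search type)

index.train(xb)        # learn the PQ codebooks
index.add(xb)          # compress and store the database vectors

distances, ids = index.search(xq, 5)  # 5 nearest neighbors per query
print(ids)
```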
AMD evaluated faiss.IndexPQ (Product Quantization) with the ST_PQ search type on the sift1m SIFT image descriptor dataset. A 2P AMD EPYC 9654 system produced a ~2.04x inference uplift with 8 parallel instances of 24 cores per instance, compared to a 2P Intel Xeon Platinum 8592+ system with 8 parallel instances of 16 cores per instance.
Multitask Learning
Multitask learning (MTL) trains one model to perform many tasks simultaneously, improving performance by sharing information rather than training an individual model for each task. Multi-gate Mixture-of-Experts (MMoE) is a sophisticated neural network architecture for learning numerous related tasks in MTL settings: it extends mixture-of-experts models with a separate gating network per task to handle diverse, interrelated objectives.
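A toy PyTorch sketch of the MMoE pattern, assuming two tasks (for example, click and purchase prediction); the layer sizes and expert counts are illustrative, not the tested model.

```python
# Toy MMoE in PyTorch: shared experts, one softmax gate and one tower head per task.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, in_dim, expert_dim, n_experts, n_tasks):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
             for _ in range(n_experts)]
        )
        # Each task gets its own gate, so it can mix the shared experts differently.
        self.gates = nn.ModuleList([nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList([nn.Linear(expert_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            weights = torch.softmax(gate(x), dim=-1)                   # (B, E)
            mixed = (weights.unsqueeze(-1) * expert_out).sum(dim=1)    # (B, D)
            outputs.append(tower(mixed))                               # (B, 1)
        return outputs

model = MMoE(in_dim=32, expert_dim=16, n_experts=4, n_tasks=2)
click_logit, purchase_logit = model(torch.randn(8, 32))
print(click_logit.shape, purchase_logit.shape)
```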
AMD tested MMoE on 8M Taobao records. A 2P AMD EPYC 9654 system with 12 parallel instances of 16 cores per instance outperformed a 2P Intel Xeon Platinum 8592+ system with 8 parallel instances of 16 cores per instance by ~1.45x.
Random Forest
Random Forest is a machine learning method that trains many decision trees, each on a random sample of the data and features, and combines their predictions to improve accuracy and generalization for classification and regression tasks. Its capacity to handle big datasets, minimize overfitting, and reveal feature importance makes it popular in finance, healthcare, and other fields.
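A minimal scikit-learn sketch of Random Forest training and inference; the synthetic classification task stands in for the airline delay data used in the test below.

```python
# Minimal Random Forest sketch; synthetic data stands in for the airline dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary task standing in for "flight delayed or not".
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_jobs=-1 trains and scores the trees in parallel across all available CPU cores.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("most important feature:", clf.feature_importances_.argmax())
```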
AMD predicted flight delays using Random Forest testing on a 1M-row airline dataset. The 2P AMD EPYC 9654 system saw a ~1.36x inference uplift with 12 concurrent instances of 16 cores per instance, compared to the 2P Intel Xeon Platinum 8592+ system with 8 parallel instances of 16 cores per instance, thanks to the high core counts and high-performance "Zen 4" cores in 4th Gen AMD EPYC processors. For details, read Broad Spectrum AI Workloads Performance Leadership.
Large Language Models
Large Language Models (LLMs) are evolving quickly, driving rapid adoption for in-house chatbots, summarization, and extracting information from unstructured text documents like contracts and reports. Smaller, enterprise-class models (10-13 billion parameters) and inference jobs benefit from the performance and cost profile of 4th Gen AMD EPYC processors, while customers with bigger models and real-time training needs can use AMD Instinct MI300 Series or other third-party AI accelerators. The Meta "Llama" family of models is a prominent LLM example.
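A minimal Hugging Face Transformers sketch of CPU-based LLM inference follows; the checkpoint name is an example (Llama weights are gated and require approval from Meta), and this is not the benchmark harness AMD used.

```python
# Minimal CPU inference sketch with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example gated checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 runs well on modern server CPUs
)

prompt = "Summarize the key obligations in this contract clause:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```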
AMD tested Llama2-7B and Llama3-8B on 2P AMD EPYC 9654 and 2P Intel Xeon Platinum 8592+ systems. Compared to the Intel system, the 2P AMD EPYC 9654 system delivered a ~1.21x inference uplift with 8 parallel instances of 24 cores per instance.
For details, read Leadership Natural Language AI Performance: Outperforming 5th Gen Intel Xeon with AMX.
Recommendation Engine
The Deep Learning Recommendation Model (DLRM) integrates user and item data to improve recommendation systems and deliver precise, personalized suggestions. The comparison below shows DLRM performance on Amazon EC2.
Amazon EC2 hpc7a.96xlarge instances with 4th Gen AMD EPYC processors offer ~1.44x performance, ~1.93x performance per dollar, and ~1.48x cloud OpEx savings over Amazon EC2 m7i.48xlarge instances with Intel Xeon "Sapphire Rapids" processors running the Deep Learning Recommendation Model (DLRMv2) at Int8 precision.
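To make the architecture concrete, here is a toy PyTorch sketch of the DLRM pattern: embedding tables for sparse (categorical) features, a bottom MLP for dense features, pairwise dot-product interactions, and a top MLP. All sizes are illustrative; the DLRMv2 benchmark model is far larger.

```python
# Toy DLRM-style model: sparse embeddings + dense MLP + dot-product feature interactions.
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, cardinalities, emb_dim=16, n_dense=4):
        super().__init__()
        # One embedding table per categorical feature (e.g. user ID, item ID, category).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, emb_dim) for card in cardinalities]
        )
        # Bottom MLP projects dense features into the same space as the embeddings.
        self.bottom_mlp = nn.Sequential(nn.Linear(n_dense, emb_dim), nn.ReLU())
        n_vecs = len(cardinalities) + 1
        n_pairs = n_vecs * (n_vecs - 1) // 2
        self.top_mlp = nn.Sequential(
            nn.Linear(n_pairs + emb_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, dense, sparse):
        d = self.bottom_mlp(dense)                                 # (B, D)
        vecs = [emb(sparse[:, i]) for i, emb in enumerate(self.embeddings)] + [d]
        stacked = torch.stack(vecs, dim=1)                         # (B, V, D)
        inter = stacked @ stacked.transpose(1, 2)                  # pairwise dot products
        i, j = torch.triu_indices(stacked.size(1), stacked.size(1), offset=1)
        x = torch.cat([inter[:, i, j], d], dim=1)                  # unique pairs + dense
        return torch.sigmoid(self.top_mlp(x))                      # click probability

model = TinyDLRM(cardinalities=[1000, 500, 100])
dense = torch.randn(8, 4)
sparse = torch.stack([torch.randint(0, c, (8,)) for c in [1000, 500, 100]], dim=1)
print(model(dense, sparse).shape)  # torch.Size([8, 1])
```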
Conclusion
AI advances are changing life and business. 4th Gen AMD EPYC processors can handle today's AI workloads on their own or as GPU hosts thanks to their high core density, memory bandwidth, and efficiency. Choosing the right hardware is essential for anyone who wants to take advantage of the AI revolution and its increasingly complex workloads, now and in the future. This blog showed how much faster bare-metal 4th Gen AMD EPYC processors are than 5th Gen Intel Xeon Platinum CPUs and highlighted the AMD EPYC advantage for cloud-based AI.
AMD collaborates with a large network of Independent Software Vendors (ISVs) to optimize their applications for the AMD EPYC platform, providing clients with excellent "out of the box" performance and efficiency. The AMD Documentation Hub, which includes EPYC Tuning Guides, offers thorough CPU tuning advice for important workloads to help customers maximize their systems' potential.