Amazon EC2 M7i Instances
In this second post of a three-blog series, Intel demonstrates how the latest Amazon EC2 M7i and M7i-flex instances with 4th Generation Intel Xeon Scalable processors can support your AI, ML, and DL workloads. The first blog introduced these new instances and their broad benefits. This one looks at how AI, ML, and DL workloads perform on them and how Intel CPUs can help.
One research report values the AI industry at $136.55 billion USD and predicts 37.3% annual growth through 2030. While you might attribute that growth to highly visible uses of AI, such as Google and Tesla’s self-driving cars, the advertising and media sector actually dominates the worldwide AI market. AI and ML/DL workloads are everywhere and growing. Cloud service providers (CSPs) like Amazon Web Services (AWS) are investing in AI/ML/DL services and infrastructure to help organizations adopt these workloads more readily. Hosting instances with 4th Generation Intel Xeon Scalable CPUs and built-in AI accelerators is one such investment.
This article explains why Intel CPUs and the AWS instances built on them are well suited to AI workloads. Two common ML/DL model types are used to show how these instances performed.
Amazon EC2 M7i and M7i-flex with 4th Gen Intel Xeon Scalable Processors
As mentioned in the previous blog, Amazon EC2 offers M7i and M7i-flex instances with the newest Intel Xeon CPUs. The primary difference is that M7i-flex offers variable performance at a lower price. This blog concentrates on regular Amazon EC2 M7i instances, which suit sustained, compute-intensive applications such as training or running machine learning models. M7i instances come with 2 to 192 vCPUs to cover a range of requirements. Each instance can attach up to 128 EBS volumes, providing ample storage for your dataset. The newest Intel Xeon processors also include several built-in accelerators to boost workload performance.
For better deep learning performance, all Amazon EC2 M7i instances include the Intel Advanced Matrix Extensions (AMX) accelerator. Intel AMX lets customers run AI workloads on the AMX instruction set while keeping non-AI workloads on the CPU ISA. To make AMX easier for developers to use, Intel has optimized its oneAPI Deep Neural Network Library (oneDNN) for it. Open-source AI frameworks such as PyTorch, TensorFlow, and ONNX support this API. In Intel's testing, 4th Gen Intel Xeon Scalable processors with AMX delivered 10 times the inference performance of earlier CPUs.
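Before tuning a workload for AMX, it can help to confirm that the instance actually exposes the feature to the operating system. The sketch below is one way to do that on Linux, assuming the kernel reports the `amx_tile`, `amx_bf16`, and `amx_int8` feature flags in `/proc/cpuinfo`. The helper names `amx_flags_present` and `read_cpuinfo` are illustrative and do not come from any Intel or AWS tool.

```python
# Minimal sketch: detect AMX feature flags from Linux /proc/cpuinfo text.
# Assumption: the kernel lists amx_tile, amx_bf16 and amx_int8 on the "flags" line.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def amx_flags_present(cpuinfo_text):
    """Return the set of AMX flags found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return AMX_FLAGS & set(value.split())
    return set()


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read cpuinfo, returning an empty string where the file is unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    found = amx_flags_present(read_cpuinfo())
    if found:
        print("AMX flags reported:", ", ".join(sorted(found)))
    else:
        print("No AMX flags reported on this machine.")
```

If no flags are reported, a framework using oneDNN would be expected to fall back to other instruction sets, so the check is a reasonable first step before benchmarking.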
To get the most out of the newest Amazon EC2 M7i instances with Intel AMX, engineers and developers must tune their AI, ML, and DL workloads. Intel offers an AI tuning guide that shows how to take advantage of Intel processor features across many popular models and frameworks. The guide covers OS-level optimizations as well as optimizations for PyTorch, TensorFlow, OpenVINO, and other frameworks. The Intel Model Zoo GitHub site contains pre-trained AI, ML, and DL models pre-validated for Intel hardware, along with AI workload optimization guidance, best practices, and more.
Now that we have seen how Intel and the newest Intel Xeon processors can improve AI, ML, and DL workloads, let’s see how these instances perform on object detection and natural language processing.
Object Detection Models
Object detection models power image-recognition applications. This category includes models for 3D medical scans, self-driving car cameras, face recognition, and more. This article discusses ResNet-50 and RetinaNet.
ResNet-50 is an image recognition deep learning model built on a 50-layer convolutional neural network (CNN). Users train the model to identify and categorize objects in images. The ResNet-50 models on the Intel Model Zoo and elsewhere are trained on ImageNet’s large image collection. Most object detection models have one or two stages, with two-stage models being more accurate but slower. ResNet-50 and RetinaNet are single-stage models, although RetinaNet’s Focal Loss function improves accuracy without sacrificing speed.
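To give a sense of what the Focal Loss function does, here is a minimal pure-Python sketch of the binary form for a single prediction. It assumes the commonly cited formulation FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with gamma = 2 and alpha = 0.25 as illustrative defaults. The function name `focal_loss` is hypothetical, and this is not the code used in the benchmarks described here.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, strictly between 0 and 1
    y: ground-truth label, 1 for positive and 0 for negative
    gamma, alpha: illustrative defaults, assumed rather than taken from the article
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.95, 1)  # confident and correct
    hard = focal_loss(0.10, 1)  # confident and wrong
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With gamma set to 0 and alpha set to 1, the expression reduces to ordinary cross-entropy for a positive example. Raising gamma down-weights easy examples, so training focuses on the hard ones.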
How quickly these models need to process images depends on their use. End consumers don’t want lengthy waits for their devices to recognize them and unlock. Farmers must detect plant diseases and insect incursions quickly, before they damage crops. In Intel’s tests with the MLPerf RetinaNet model, Amazon EC2 M7i instances analyzed 4.11 times more samples per second than M6i instances.
ResNet-50 performance scales well as vCPU counts increase, so you can maintain high performance regardless of dataset and instance size. An Amazon EC2 M7i instance with 192 vCPUs delivered eight times the ResNet-50 throughput of a 16-vCPU instance. Higher-performing instances also provide better value: in RetinaNet testing, Amazon EC2 M7i instances analyzed 4.49 times more samples per dollar than M6i instances. These findings indicate that Amazon EC2 M7i instances with 4th Gen Intel Xeon Scalable CPUs are well suited to object detection deep learning tasks.
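The scaling and value figures above can be put in context with some simple arithmetic. The sketch below assumes that scaling efficiency means the throughput ratio divided by the vCPU ratio, and that performance per dollar equals performance divided by cost. It also assumes the two RetinaNet ratios come from the same test runs, which the article does not state. Both function names are illustrative.

```python
def scaling_efficiency(small_vcpus, large_vcpus, throughput_ratio):
    """Fraction of ideal linear scaling achieved when moving to the larger instance."""
    vcpu_ratio = large_vcpus / small_vcpus
    return throughput_ratio / vcpu_ratio


def implied_cost_ratio(perf_ratio, perf_per_dollar_ratio):
    """Relative cost implied by perf_per_dollar_ratio = perf_ratio / cost_ratio."""
    return perf_ratio / perf_per_dollar_ratio


if __name__ == "__main__":
    # ResNet-50: 16 -> 192 vCPUs with 8x the throughput (figures from the text).
    eff = scaling_efficiency(16, 192, 8.0)
    print(f"ResNet-50 scaling efficiency: {eff:.2f}")

    # RetinaNet: 4.11x samples/s and 4.49x samples/$ versus M6i (figures from the text).
    cost = implied_cost_ratio(4.11, 4.49)
    print(f"Implied M7i/M6i cost ratio for the tested setup: {cost:.3f}")
```

Under these assumptions, 12 times the vCPUs for 8 times the throughput works out to roughly two-thirds of ideal linear scaling. The two RetinaNet figures together would imply that the tested M7i setup cost slightly less than the M6i setup.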
Natural Language Models
You’re probably using a natural language processing (NLP) engine whenever you ask a search engine or chatbot a question. NLP models learn real speech patterns in order to understand and interact with language. Machine learning models such as BERT can interpret and contextualize text in addition to storing and presenting it. Word processing and phone messaging applications now predict text based on what users have typed. Even businesses that don’t run Google Search benefit: small firms use chatbots for first contact with customers. These firms need a chatbot that is clear, fast, and accurate.
Chatbots and other NLP applications must respond in real time, so speed is crucial. NLP models like BERT and RoBERTa, an optimized variant of BERT, perform better on Amazon EC2 M7i instances with 4th Generation Intel Xeon processors. In one benchmark test, Amazon EC2 M7i instances running RoBERTa analyzed 10.65 times more sentences per second than Graviton-based M7g instances with the same vCPU count. BERT testing with the MLPerf suite showed that throughput scaled well as the vCPU count of the Amazon EC2 M7i instances increased, with the 192-vCPU instance attaining almost 4 times the throughput of the 32-vCPU instance.
The Intel AMX accelerator in the 4th Gen Intel Xeon Scalable CPUs helps explain the strong performance of the Amazon EC2 M7i instances. Intel also gives customers what they need to improve NLP workloads: publicly available models pre-optimized for Intel processors and tuning instructions for specific models such as BERT. As with RetinaNet, the higher performance translates into better value: Amazon EC2 M7i instances delivered 8.62 times the performance per dollar of M7g instances.
For AI, ML, and DL workloads, cloud decision-makers should consider Amazon EC2 M7i instances with 4th Generation Intel Xeon Scalable CPUs. These instances offer Intel AMX acceleration and are backed by tuning guides and optimized models for many common ML applications. In the tests described here, they delivered up to 10 times the throughput of Graviton-based M7g instances. Watch for further articles on how the newest Amazon EC2 M7i and M7i-flex instances can serve other workloads.