More than 1,000 small and medium-sized game production companies use MetaApp’s game creation and distribution services, making it the top interactive entertainment platform in China. Its services are powered by Alibaba Cloud, China’s leading cloud service provider (CSP) with a 34 percent market share.
MetaApp offers tools for game development as well as an AI-powered recommendation system. The recommendation system helps drive more end users to games, which improves their monetisation potential. The business worked with Alibaba Cloud and Intel to develop a better recommendation system, which is essential to its expansion plan.
MetaApp developed the system on an Alibaba Cloud Elastic Compute Service (ECS) c8i instance. The business harnessed the power of the underlying CPUs, 4th Gen Intel Xeon Scalable processors, including the integrated Intel Advanced Vector Extensions 512 (Intel AVX-512) accelerator, by using DeepRec, an open source deep learning (DL) framework optimised with the Intel oneAPI Deep Neural Network Library (Intel oneDNN).
Challenges
In addition to requiring a large amount of memory to support embedding subsystems, AI recommendation systems require a great deal of compute power for DL networks. The first step of the recommendation process reduces the millions of items in the catalogue to only hundreds; this shortlist is known as the candidates. Next, each candidate is scored and ranked using DL-based scoring models and insights about entity relationships stored in the embedding subsystems. Lastly, the system uses DL network insights to refine the ranking, which enables it to take complicated constraints into account.
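The three-stage process described above can be sketched as follows. This is a minimal illustration of the pattern, not MetaApp’s actual code; the function names, the similarity-based candidate scoring, and the per-category constraint are assumptions made for the example.

```python
import numpy as np

def generate_candidates(all_item_ids, user_vec, item_matrix, k=200):
    """Stage 1: narrow millions of items down to a few hundred candidates
    using a cheap similarity score against the user embedding."""
    scores = item_matrix @ user_vec           # one dot product per item
    top = np.argsort(scores)[::-1][:k]        # keep the k highest-scoring
    return [all_item_ids[i] for i in top]

def rank(candidates, score_fn):
    """Stage 2: score each candidate with a (more expensive) DL model,
    represented here by an arbitrary callable."""
    return sorted(candidates, key=score_fn, reverse=True)

def rerank(ranked, max_per_category, category_of):
    """Stage 3: refine the ranking under business constraints, e.g.
    no more than N items from the same category."""
    counts, result = {}, []
    for item in ranked:
        cat = category_of(item)
        if counts.get(cat, 0) < max_per_category:
            result.append(item)
            counts[cat] = counts.get(cat, 0) + 1
    return result
```

The overall latency budget is the sum of these three stages, which is why the cheap filter runs first and the expensive models only see a few hundred items.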
A recommendation system’s overall latency is the sum of the times for candidate generation, ranking, and any necessary re-ranking. Most recommendation systems strive for sub-second to low-second latencies across the full recommendation process.
These two requirements, large memory and high compute power, drive cost and can affect profitability. Although GPUs can partially solve the compute-power issue, the memory requirements mean that most of the data must still be handled on the CPU side.
Taking a fresh look, MetaApp saw multiple opportunities to improve its recommendation system. The company wanted to increase resource utilisation and elasticity, and to simplify the design. Finally, to lower the system’s total cost of ownership (TCO), it looked for ways to reduce or eliminate the need for GPUs, which are often more expensive in cloud instances.
Solution
With assistance from Intel and Alibaba Cloud, MetaApp implemented a new recommendation system on the Alibaba Cloud ECS c8i instance family with 4th Gen Intel Xeon Scalable processors. MetaApp migrated its AI training (fine-tuning) workload to the c8i instance from an instance running 2nd Gen Intel Xeon Scalable processors, and its AI inference workload from an instance running 3rd Gen Intel Xeon Scalable processors.
The integrated accelerators in 4th Gen Intel Xeon Scalable processors help increase performance and workload efficiency, particularly for AI applications. The processors also include technologies that boost performance, such as PCIe Gen 5, which enables faster input/output (I/O), and DDR5 memory, which offers 1.5 times the bandwidth of the DDR4 memory in earlier processor generations.
MetaApp adopted DeepRec, a high-performance, TensorFlow-based DL framework for recommendation systems. The business also made use of oneDNN, an open source, cross-platform performance library that is integrated into DeepRec. Using oneDNN, MetaApp was able to leverage Intel AVX-512, a built-in accelerator in 4th Gen Intel Xeon Scalable processors, to boost computational performance.
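In stock TensorFlow, oneDNN-optimised kernels are controlled by an environment variable (and are enabled by default on supported CPUs in recent releases); DeepRec ships with oneDNN already integrated. The fragment below is therefore only a sketch of the general mechanism, not MetaApp’s configuration.

```python
import os

# Assumption: stock TensorFlow reads this variable at import time to decide
# whether to dispatch to oneDNN-optimised kernels on supported CPUs.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

# Set this before `import tensorflow as tf`; flipping it afterwards has
# no effect, because the kernel registry is built during import.
```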
Results
MetaApp’s new recommendation system enabled the organisation to meet its goals for flexibility, cost, and speed.
DeepRec Provides Dynamic Resourcing
By combining the DeepRec framework with oneDNN, MetaApp’s developers were able to build AI applications quickly and in a platform-independent way from well-optimised building blocks, saving development time and money.
MetaApp collaborated with Alibaba Cloud and Intel to optimise DeepRec for 4th Gen Intel Xeon Scalable processors at multiple levels of the recommendation system. Through these optimisations, MetaApp is now able to meet all of its requirements, including online inference and offline training.
By using DeepRec with Intel optimisations, MetaApp has been able to eliminate its dependence on GPUs. The business can now achieve flexible scalability and dynamic resource adjustment in Alibaba Cloud.
Lower Cost, Fewer Cores
Based on Alibaba Cloud ECS c8i instances with 4th Gen Intel Xeon Scalable processors, MetaApp’s recommendation system was able to perform AI inference at 22 percent lower cost and with 25 percent fewer cores than its previous system at the same queries per second (QPS).
Improved Performance During AI Training
Using Alibaba Cloud ECS c8i instances, MetaApp achieved 64 percent greater training (fine-tuning) performance compared with instances based on 2nd Gen Intel Xeon Scalable processors. Improved training performance enables the recommendation system to learn user preferences more quickly.
Compared with instances running 2nd Gen Intel Xeon Scalable processors, MetaApp also benefited from 160 percent higher training (fine-tuning) performance per price. Higher performance per price contributes to lower TCO.
Energy Efficiency
Beyond lowering cloud TCO for customers such as MetaApp, Alibaba Cloud gains additional advantages from its c8i instances. Intel innovations enable a 2.9x gain in MetaApp’s average performance-per-watt efficiency compared with earlier processor generations. This improved energy efficiency helps Alibaba Cloud achieve both cost savings and sustainability objectives.
AI Acceleration
Intel has long offered hardware-based AI acceleration. The 2nd Gen Intel Xeon Scalable processors introduced Intel Deep Learning Boost (Intel DL Boost), a collection of acceleration features that speed up AI training and inference.
One of these features is Vector Neural Network Instructions (VNNI), which builds on Intel AVX-512 to fuse multiple instructions into one, reducing the number of instructions needed per operation. Additionally, 2nd Gen Intel Xeon Scalable processors natively support the INT8 data format, which can increase inference speed and efficiency.
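Numerically, a VNNI fused multiply-accumulate computes an INT8 dot product with the products widened into a 32-bit accumulator so they cannot overflow. The snippet below is a plain NumPy illustration of that arithmetic, not actual vectorised code:

```python
import numpy as np

def int8_dot_vnni_style(a_u8, b_s8):
    """Numerically what one AVX-512 VNNI multiply-accumulate step does:
    multiply unsigned 8-bit values by signed 8-bit values and accumulate
    the products into a wider 32-bit integer, avoiding overflow."""
    a = a_u8.astype(np.int32)   # widen before multiplying,
    b = b_s8.astype(np.int32)   # as the hardware accumulator does
    return int(np.sum(a * b))   # 32-bit accumulation

a = np.array([100, 200, 50, 255], dtype=np.uint8)
b = np.array([-3, 7, 1, -1], dtype=np.int8)
acc = int8_dot_vnni_style(a, b)  # 100*-3 + 200*7 + 50*1 + 255*-1 = 895
```

Quantising model weights and activations to INT8 lets four times as many values fit in one 512-bit register compared with FP32, which is where the inference speedup comes from.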
3rd Gen Intel Xeon Scalable processors added support for bfloat16 (BF16), which can improve AI training performance. With integrated accelerators such as Intel Advanced Matrix Extensions (Intel AMX), 4th Gen Intel Xeon Scalable processors outperform their predecessors. Intel AMX enhances the performance of DL training and inference. Instead of shifting the work to separate accelerators, it lets you run AI inference on the CPU, which can result in a noticeable speed increase. The Intel AMX architecture also supports the BF16 and INT8 data types.
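On Linux, one way to check whether a host actually exposes these features is to look at the flags line of /proc/cpuinfo; amx_tile, amx_bf16, amx_int8, and avx512_vnni are the kernel’s flag names for the accelerators discussed above. The helper below is a small sketch:

```python
def cpu_supports(flag, cpuinfo_text):
    """Return True if a CPU feature flag (e.g. 'amx_tile', 'avx512_vnni')
    appears in the 'flags' line of /proc/cpuinfo text (Linux only)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.split()
    return False

# Typical usage on a Linux host:
# with open("/proc/cpuinfo") as f:
#     has_amx = cpu_supports("amx_tile", f.read())
```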
AI Success
To enable its game-developer clients to earn more from their games, MetaApp collaborated with Alibaba Cloud and Intel to create a brand-new AI-based recommendation system. The system costs less and runs faster. For AI training (fine-tuning), it outperforms the prior system by 64 percent and delivers 160 percent more training performance per price. AI inference uses 25 percent fewer cores at 22 percent lower cost than the previous system. Furthermore, software optimisations for Intel technology make dynamic scheduling and flexible scalability possible without the need for GPUs.
MetaApp’s path is not unique. From social media platforms to e-commerce, today’s digitally native businesses depend on personalised services to grow and prosper, and AI-driven recommendation engines are essential to delivering them. Alibaba Cloud and Intel provide technologies and services to help clients like MetaApp build flexible, fast, efficient, and affordable recommendation systems.