The new IBM Telum II Processor and IBM Spyre Accelerator are designed to bring enterprise-scale AI capabilities, including large language models and generative AI, to IBM Z systems.
Advanced I/O technology enables a scalable, easier-to-use I/O subsystem designed to reduce energy consumption and data center footprint.
At Hot Chips 2024, IBM disclosed the architecture of the forthcoming IBM Telum II Processor and IBM Spyre Accelerator. The new technologies are designed to dramatically scale processing capacity across the next generation of IBM Z mainframe systems, accelerating the use of traditional AI models and large language AI models in tandem through a novel ensemble approach to AI.
Large language models (LLMs) are being used in a growing number of generative AI projects, and as these projects move from proof of concept to production, the need for scalable, secure, and power-efficient solutions grows with them. According to an August study from Morgan Stanley, generative AI's power consumption is expected to rise roughly 75 percent annually over the next few years, putting it on track to consume as much energy in 2026 as Spain did in 2022. Many IBM clients have indicated that architectural decisions supporting appropriately sized foundation models and hybrid-by-design approaches for AI workloads are critical.
Among the major breakthroughs revealed today are:
- IBM Telum II Processor: The new IBM chip, designed to power next-generation IBM Z systems, features increased frequency, 40 percent more cache capacity, an integrated AI accelerator core, and a coherently attached Data Processing Unit (DPU) compared with the first-generation Telum chip. The new processor is expected to support enterprise compute solutions for LLMs, addressing the industry's complex transaction needs.
- I/O acceleration unit: The Telum II processor chip incorporates a new Data Processing Unit (DPU) designed to accelerate complex I/O protocols for networking and storage on the mainframe. The DPU can improve the performance of key components and simplify system operations.
- IBM Spyre Accelerator: The IBM Spyre Accelerator provides additional AI compute capability to complement the Telum II processor. Working together, the Telum II and Spyre chips form a scalable architecture that supports ensemble modeling, the practice of combining multiple machine learning or deep learning AI models with encoder LLMs. By leveraging the strengths of each model architecture, ensemble AI may deliver more accurate and robust results than individual models. The IBM Spyre Accelerator chip, previewed at Hot Chips 2024, will be offered as an add-on option. Each accelerator chip is attached via a 75-watt PCIe adapter and is based on technology developed in partnership with IBM Research. Like other PCIe cards, the Spyre Accelerator is scalable to fit client needs.
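To make the ensemble idea concrete, the sketch below combines the scores of two stand-in models: a conventional model over numeric features and an encoder-LLM-style model over claim text. All model logic, weights, and names here are hypothetical placeholders for illustration, not IBM's implementation:

```python
# Illustrative ensemble-scoring sketch. Both "models" are trivial stand-ins
# (a weighted sum and a keyword heuristic); a real deployment would call
# trained models instead.

def traditional_model_score(features):
    """Stand-in for a conventional neural network score in [0, 1]."""
    weights = [0.5, 0.3, 0.2]  # hypothetical feature weights
    raw = sum(w * f for w, f in zip(weights, features))
    return min(max(raw, 0.0), 1.0)

def llm_encoder_score(claim_text):
    """Stand-in for an encoder-LLM score over a claim narrative, in [0, 1]."""
    suspicious = {"urgent", "cash", "untraceable"}  # hypothetical vocabulary
    hits = sum(1 for w in claim_text.lower().split() if w in suspicious)
    return min(hits / 3.0, 1.0)

def ensemble_score(features, claim_text, w_traditional=0.6, w_llm=0.4):
    """Weighted average of both model scores; weights are illustrative."""
    return (w_traditional * traditional_model_score(features)
            + w_llm * llm_encoder_score(claim_text))

score = ensemble_score([0.9, 0.8, 0.1], "urgent cash settlement requested")
print(round(score, 3))  # → 0.693
```

The point of the ensemble is that each model covers the other's blind spots: the traditional model handles structured features cheaply, while the encoder model reads unstructured text.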
The Telum II Processor and Spyre Accelerator are designed to deliver high-performance, secure, and more power-efficient enterprise computing solutions. These innovations, years in development, will be featured in IBM's next-generation IBM Z platform, enabling clients to leverage LLMs and generative AI at scale.
Samsung Foundry, IBM's longstanding fabrication partner, will manufacture the Telum II processor and the IBM Spyre Accelerator on its high-performance, low-power 5nm process node. Together, they will support a range of advanced AI-driven use cases designed to unlock business value and create new competitive advantages. With ensemble AI methods, clients can achieve faster, more accurate predictions. The combined processing power announced today will help pave the way for generative AI use cases, including:
- Insurance claims fraud detection: Improved fraud detection in home insurance claims through ensemble AI, which combines LLMs with traditional neural networks tuned for better performance and accuracy.
- Advanced anti-money laundering: Improved detection of suspicious financial activities, stronger risk mitigation for financial crimes, and support for regulatory compliance.
- AI assistants: Accelerating the application lifecycle, transferring knowledge and expertise, transforming and explaining code, and more.
Details and Performance Measures:
Telum II processor: The Telum II processor features eight high-performance cores running at 5.5GHz, with 36MB of L2 cache per core and a 40 percent increase in on-chip cache capacity to 360MB. The virtual level-4 cache of 2.88GB per processor drawer likewise represents a 40 percent increase over the previous generation. The integrated AI accelerator delivers four times the compute capacity per chip of the previous generation and enables low-latency, high-throughput in-transaction AI inferencing, for example to enhance fraud detection during financial transactions.
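The phrase "in-transaction inferencing" means the model is scored synchronously inside the transaction path, under a strict latency budget, rather than asynchronously after the fact. The sketch below illustrates that pattern in generic Python; the function names, the budget value, and the fallback rule are all hypothetical, not IBM's API:

```python
# Illustrative in-transaction inferencing pattern (hypothetical, not IBM's
# API): score the transaction synchronously, and fall back to a simple rule
# if inference ever exceeds the latency budget.
import time

LATENCY_BUDGET_MS = 5  # hypothetical per-transaction inference budget

def infer_fraud_score(txn):
    """Stand-in for a call to an on-chip AI accelerator."""
    return 0.9 if txn["amount"] > 10_000 else 0.1

def process_transaction(txn):
    start = time.perf_counter()
    score = infer_fraud_score(txn)
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        # Don't delay the transaction: degrade to a rule-based screen.
        score = 1.0 if txn["amount"] > 50_000 else 0.0
    return "flagged" if score > 0.5 else "approved"

print(process_transaction({"amount": 12_000}))  # flagged
print(process_transaction({"amount": 200}))     # approved
```

The design choice this illustrates is that low inference latency is what makes synchronous scoring viable at all; if the model were slow, every transaction would pay that cost.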
The Telum II chip incorporates the new I/O acceleration unit DPU, designed to improve data handling with a 50 percent increase in I/O density. This advancement improves the overall efficiency and scalability of IBM Z, making it better suited to the large-scale AI workloads and data-intensive applications that modern enterprises require.
Spyre Accelerator: The Spyre Accelerator, now being previewed, is an enterprise-grade accelerator purpose-built for complex AI models and generative AI use cases. It offers up to 1TB of memory, designed to work in concert across the eight cards of a regular I/O drawer to support AI model workloads on the mainframe, while consuming no more than 75W per card. Each chip has 32 compute cores supporting int4, int8, fp8, and fp16 datatypes for both low-latency and high-throughput AI applications.
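Reduced-precision datatypes like int8 shrink memory traffic and raise throughput by representing weights with fewer bits. The sketch below shows symmetric per-tensor int8 quantization, a standard technique in general, not Spyre-specific code; the values and function names are illustrative:

```python
# Illustrative symmetric int8 quantization (a generic technique, not an
# IBM/Spyre API): floats are mapped to 8-bit integers via a shared scale,
# trading a small precision loss for 4x less memory than fp32.

def quantize_int8(values):
    """Map floats to int8 using a per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [x * scale for x in q]

weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
print(q)  # → [52, -127, 0, 90], all within the int8 range [-128, 127]
approx = dequantize_int8(q, scale)
# Each recovered value differs from the original by at most half a
# quantization step (scale / 2).
```

The same idea extends to the other listed datatypes: int4 halves storage again at more precision loss, while fp8 and fp16 keep a floating-point exponent for wider dynamic range.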
Availability
The Telum II processor will power the next generation of IBM Z and IBM LinuxONE platforms and is expected to be available to IBM Z and LinuxONE clients in 2025. The IBM Spyre Accelerator, currently in tech preview, is also expected to be available in 2025.
Any statements about IBM’s intentions or future course are only goals and objectives, and they are subject to modification or retraction at any time.