Intel AI Solutions Accelerate Microsoft Phi-4 Small Language Models
Intel is committed to investing in the AI ecosystem to ensure its platforms are ready for the newest AI models and software, as part of its ongoing effort to make AI ubiquitous. Microsoft Phi-4, the newest family of small, open-source AI models developed by Microsoft, is now supported by Intel AI solutions across AI PCs, edge devices, and data center platforms.
With today’s release, the Phi family has expanded. Phi-4-mini is a lightweight, open model with 3.8B parameters, built as a dense, decoder-only transformer. Its new architectural features relative to Phi-3.5-mini include grouped-query attention, shared input and output embeddings, and a 200K-token vocabulary. Phi-4-multimodal is a lightweight, open multimodal model with 5.6B parameters that takes text, image, and audio input and produces text output. It combines the Phi-4-mini language model with vision, speech, and audio encoders and adapters.
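As a quick illustration of these details, the hedged sketch below reads the architectural features straight from the published model configuration. The Hugging Face repo id and config field names are assumptions based on Phi-3-style configs rather than details from this announcement.

```python
# Hedged sketch: inspect the Phi-4-mini config for the features named above.
# Repo id and field names are assumptions (Phi-3-style config).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
print("vocab size:", cfg.vocab_size)               # expected ~200K tokens
print("attention heads:", cfg.num_attention_heads)
print("key/value heads:", cfg.num_key_value_heads) # fewer K/V heads than attention
                                                   # heads implies grouped-query attention
print("tied embeddings:", cfg.tie_word_embeddings) # shared input/output embedding
```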
AI PCs and edge devices are at the forefront of delivering AI experiences that assist users while preserving privacy and enabling customization. Intel enables AI language models to run locally on AI PCs built on Intel Core Ultra processors, which integrate a neural processing unit (NPU) and an Intel Arc GPU, as well as on Intel Arc discrete GPUs with Intel X Matrix Extensions (Intel XMX) acceleration. Thanks to their small size, Microsoft Phi-4 models are well suited to on-device inference and make lightweight model customization or fine-tuning practical on AI PCs.
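The small parameter count is what makes on-device customization realistic. Below is a minimal, hypothetical sketch of attaching a LoRA adapter with the peft library so that only a small set of adapter weights is trained locally; the repo id and target module names are assumptions, not details from this article.

```python
# Hypothetical sketch: LoRA fine-tuning setup for Phi-4-mini on an AI PC.
# Repo id and target_modules are assumptions based on Phi-3-style layers.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct", torch_dtype=torch.bfloat16
)
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["qkv_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter trains
```

Training only the adapter keeps memory and compute demands within reach of an AI PC.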
Intel benchmarked the inference latency of the Microsoft Phi-4-mini model using the OpenVINO toolkit for performance optimization, on an AI PC equipped with an Intel Core Ultra 9 288V processor with an integrated Intel Arc GPU, and on an Intel Arc B580 discrete GPU. OpenVINO accelerates AI inference, enabling higher throughput while preserving accuracy.
Latency of Microsoft Phi-4-mini-instruct on an Intel Core Ultra 9 288V processor with an integrated Intel Arc 140V GPU
Throughput of Microsoft Phi-4-mini-instruct on an Intel Arc B580 discrete GPU
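For readers who want to try a comparable local setup, here is a minimal sketch using the OpenVINO GenAI API. It assumes the model has already been exported to OpenVINO IR format into a local directory (the path below is a placeholder) and does not reproduce the benchmark settings above.

```python
# Minimal sketch: local text generation with OpenVINO GenAI.
# "phi-4-mini-ov" is a placeholder path to an exported OpenVINO IR model.
import openvino_genai

# "GPU" targets the integrated or discrete Intel Arc GPU; "NPU" and "CPU"
# are also valid devices on Intel Core Ultra systems.
pipe = openvino_genai.LLMPipeline("phi-4-mini-ov", "GPU")
print(pipe.generate("Explain grouped-query attention in one sentence.",
                    max_new_tokens=128))
```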
The Phi family has a long history of text generation, but Phi-4-multimodal, a multi-modality AI model, adds intriguing new capabilities for handling text, images, and audio. The demonstration below shows Microsoft Phi-4-multimodal running smoothly on an Intel Arc B580 discrete GPU.
Example 1: Phi-4-multimodal running on Intel Arc B580 discrete GPU
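The demo code itself is not part of this article, but a rough sketch of image-plus-text inference with Hugging Face transformers might look like the following. The repo id, chat tags, and image placeholder token are assumptions modeled on earlier Phi releases; consult the model card for the exact prompt format.

```python
# Hedged sketch: image + text inference with Phi-4-multimodal.
# Repo id, prompt tags, and <|image_1|> placeholder are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

repo = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
)

prompt = "<|user|><|image_1|>Describe this image.<|end|><|assistant|>"
image = Image.open("example.jpg")  # placeholder input image
inputs = processor(text=prompt, images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```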
Intel’s data center AI solutions, such as Intel Xeon processors and Intel Gaudi AI accelerators, also fully support the latest Microsoft Phi-4 models. The newly announced Intel Xeon 6 processor, which Intel has dubbed the world’s best CPU for AI, performs exceptionally well as a host CPU for GPU-accelerated workloads, smaller generative AI models, and traditional machine learning.
Early benchmarking of Phi-4-mini and Phi-4-multimodal with PyTorch and Intel Extension for PyTorch on Intel Xeon 6 with MRDIMMs demonstrates that commercially available Xeon processors are a practical and efficient option for inference deployment of small language models (SLMs) like Phi-4.
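As a sketch of this software stack, the snippet below runs BF16 generation with PyTorch and Intel Extension for PyTorch (IPEX) on a Xeon CPU. It is illustrative only: the repo id is an assumption, and the benchmark’s batching and MRDIMM settings are not reproduced.

```python
# Sketch: BF16 CPU inference with PyTorch + Intel Extension for PyTorch.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "microsoft/Phi-4-mini-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# ipex.llm.optimize applies LLM-specific operator fusions for Xeon CPUs.
model = ipex.llm.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("What is an MRDIMM?", return_tensors="pt")
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```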
Benchmarked on a two-socket (2S) Intel Xeon 6 system with Performance-cores and MRDIMMs, with 1K input/1K output tokens and BF16 precision, Microsoft Phi-4-mini achieves a throughput of 1,955 tokens/s. Assuming the standard input formats listed on the model card (one prompt plus one image, one prompt plus one audio clip, and one image plus one audio clip) and output length set to 1,000 tokens, the same Intel Xeon 6 system produces 120 tokens/s for Phi-4-multimodal while meeting a 50 ms next-token latency SLA.
Phi-4-multimodal throughput on a two-socket Intel Xeon 6 system with Performance-cores and MRDIMMs
Both models are now supported in Open Platform for Enterprise AI (OPEA) updates, which make it easier for end users to build complete end-to-end applications on Gaudi and Xeon from a range of microservices.
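As a hypothetical illustration of that microservice pattern, a client might call a deployed OPEA ChatQnA-style megaservice endpoint as follows. The host, port, route, and payload shape are assumptions based on OPEA’s published examples, not details from this article.

```python
# Hypothetical sketch: querying an OPEA-style megaservice with a Phi-4
# backend deployed on Xeon or Gaudi. Endpoint and payload are assumptions.
import requests

resp = requests.post(
    "http://localhost:8888/v1/chatqna",  # placeholder endpoint
    json={"messages": "Summarize the Phi-4 model family."},
    timeout=60,
)
print(resp.text)
```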
Intel and Microsoft have a long history of collaborating on AI software for data centers and client devices. Today, Microsoft Phi-4 models are supported across Intel AI PCs, discrete GPUs, Xeon processors, and Gaudi AI accelerators, and Intel will continue to enhance the AI functionality and experience of all its products.
Product and Performance Information
Intel Core Ultra
The Intel Core Ultra 9 288V platform was measured on an ASUS Zenbook S 14 with 32GB of RAM at 8533 MHz, Intel graphics driver 101.6559, openvino-genai 2025.1.0.dev20250225, and Windows 11 Pro 24H2 version 26100.2894, using the balanced power policy and best performance power mode with core isolation disabled. Tested by Intel on February 25, 2025. The model repositories are phi-4-mini and phi-4-multimodal.
Intel Arc B-Series
The Intel Arc B580 12GB graphics card was measured on a system with an Intel Core i9-14900K processor, an ASUS ROG MAXIMUS Z790 HERO motherboard, 32GB (2x 16GB) DDR5 at 5600 MHz, and a Samsung 990 Pro 2TB NVMe SSD. The software configuration: Windows 11 Pro 24H2 version 26100.2894, Intel graphics driver 101.6559, openvino-genai 2025.1.0.dev20250225, best performance power policy, and core isolation disabled. Tested by Intel on February 25, 2025. The model repositories are phi-4-mini and phi-4-multimodal.
Intel Gaudi 3 AI Accelerator
Testing was conducted on a single Intel Gaudi 3 AI accelerator hosted by a two-socket Intel Xeon Platinum 8480+ CPU at 2.00 GHz with 1TB of system memory, running Ubuntu 22.04 with Intel Gaudi software version 1.19.2 (the latest release is available from vault.habana.ai) and the Docker image gaudi-docker/1.19.2/ubuntu22.04/habanalabs/pytorch-installer-2.5.1, together with the Optimum for Intel Gaudi (optimum-habana) extension and transformers v4.48. Tested by Intel on February 25, 2025. The repository is Dockerfile.