Sunday, March 30, 2025

oneDNN Documentation: Optimized Deep Learning Primitives

Intel oneAPI Deep Neural Network Library (oneDNN): optimize deep learning frameworks' CPU and GPU performance.


Create Deep Learning Frameworks and Applications More Quickly

The Intel oneAPI Deep Neural Network Library (oneDNN) offers highly optimized implementations of deep learning building blocks. This open-source, cross-platform library abstracts away instruction sets and other performance-optimization concerns, so developers can use the same API for CPUs, GPUs, or both in deep learning applications and frameworks.


With this library, you are able to:

  • Improve the performance of frameworks you already use, such as PyTorch, TensorFlow, Intel AI Tools, and the OpenVINO toolkit.
  • Build deep learning frameworks and applications faster with optimized building blocks.
  • Deploy applications tuned for Intel CPUs and GPUs without writing code for each specific target.

Download the Stand-Alone Version

oneDNN can be downloaded as a stand-alone library. Get binaries from Intel or choose your preferred repository.

Aid in the Evolution of oneDNN

oneDNN implements the oneAPI specification from the Unified Acceleration (UXL) Foundation, and you are welcome to take part.

Features

Automated Optimization

  • Keep using your existing deep learning frameworks; the optimizations apply automatically.
  • Develop and deploy platform-independent deep learning applications with automatic instruction set architecture (ISA) detection and ISA-specific optimization.

Network Optimization

  • Identify performance bottlenecks with Intel VTune Profiler.
  • Automatically select and propagate memory formats based on the hardware and convolution parameters.
  • Fuse primitives with operations applied to their output, such as Conv+ReLU (see the sketch after this list).
  • Quantize primitives from FP32 to FP16, bf16, or int8 with Intel Neural Compressor.
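
As an illustration of primitive fusion, here is a minimal C++ sketch that attaches a ReLU post-op to a convolution so both run as a single fused primitive. It assumes oneDNN 3.x, and the tensor sizes are illustrative, not taken from the article.

```cpp
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0); // target device
    stream strm(eng);                 // execution queue on that device

    // Illustrative NCHW shapes: 1x3x224x224 input, 64 filters of 3x3.
    memory::dims src_dims = {1, 3, 224, 224};
    memory::dims wei_dims = {64, 3, 3, 3};
    memory::dims dst_dims = {1, 64, 224, 224};
    memory::dims strides = {1, 1}, padding = {1, 1};

    // format_tag::any lets oneDNN pick the optimal memory layout.
    memory::desc src_md(src_dims, memory::data_type::f32, memory::format_tag::any);
    memory::desc wei_md(wei_dims, memory::data_type::f32, memory::format_tag::any);
    memory::desc dst_md(dst_dims, memory::data_type::f32, memory::format_tag::any);

    // Attach ReLU to the convolution's output as a post-op.
    post_ops ops;
    ops.append_eltwise(algorithm::eltwise_relu, 0.f, 0.f);
    primitive_attr attr;
    attr.set_post_ops(ops);

    // One fused primitive now computes ReLU(conv(src, weights)).
    auto pd = convolution_forward::primitive_desc(
            eng, prop_kind::forward_inference, algorithm::convolution_direct,
            src_md, wei_md, dst_md, strides, padding, padding, attr);
    convolution_forward conv(pd);

    // Allocate tensors in the layouts the primitive selected.
    memory src_mem(pd.src_desc(), eng);
    memory wei_mem(pd.weights_desc(), eng);
    memory dst_mem(pd.dst_desc(), eng);

    conv.execute(strm, {{DNNL_ARG_SRC, src_mem},
                        {DNNL_ARG_WEIGHTS, wei_mem},
                        {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```

Because the ReLU runs inside the convolution kernel, the intermediate result never takes an extra round trip through memory, which is where fusion saves time and bandwidth.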

Optimized Implementations of Key Building Blocks

  • Convolution
  • Matrix multiplication (see the matmul sketch after this list)
  • Pooling
  • Batch normalization
  • Activation functions
  • Recurrent neural network (RNN) cells
  • Long short-term memory (LSTM) cells
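
As one example of using a building block directly, here is a minimal C++ sketch of the matmul primitive, again assuming oneDNN 3.x; the matrix sizes are illustrative.

```cpp
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // C[M,N] = A[M,K] x B[K,N] in float32, row-major (format_tag::ab).
    memory::dims a_dims = {128, 256}, b_dims = {256, 512}, c_dims = {128, 512};
    memory::desc a_md(a_dims, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md(b_dims, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md(c_dims, memory::data_type::f32, memory::format_tag::ab);

    auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md);
    matmul mm(pd);

    // Bind user buffers to oneDNN memory objects.
    std::vector<float> a(128 * 256, 1.f), b(256 * 512, 1.f), c(128 * 512);
    memory a_mem(a_md, eng, a.data());
    memory b_mem(b_md, eng, b.data());
    memory c_mem(c_md, eng, c.data());

    mm.execute(strm, {{DNNL_ARG_SRC, a_mem},
                      {DNNL_ARG_WEIGHTS, b_mem},
                      {DNNL_ARG_DST, c_mem}});
    strm.wait(); // every element of c is now 256.0f
    return 0;
}
```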

Abstract Programming Model

  • Primitive: Any low-level operation from which more complex operations are built, such as convolution or a data format reorder.
  • Memory: A handle to memory allocated for a specific engine, with given tensor dimensions, data type, and memory format.
  • Engine: A hardware processing unit, such as a CPU or GPU.
  • Stream: A queue of primitive operations submitted to an engine (the sketch after this list shows how the four abstractions fit together).
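
Here is a minimal C++ sketch (assuming oneDNN 3.x) that composes all four abstractions, using a forward ReLU as the primitive:

```cpp
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    // Engine: a hardware processing unit (here, the first CPU).
    engine eng(engine::kind::cpu, 0);

    // Stream: a queue of primitive operations on that engine.
    stream strm(eng);

    // Memory: tensor dimensions, data type, and format bound to the engine.
    memory::dims dims = {2, 16};
    memory::desc md(dims, memory::data_type::f32, memory::format_tag::ab);
    std::vector<float> data(2 * 16, -1.f);
    memory src(md, eng, data.data());
    memory dst(md, eng); // library-allocated buffer

    // Primitive: a low-level operation, here a forward ReLU.
    auto pd = eltwise_forward::primitive_desc(
            eng, prop_kind::forward_inference, algorithm::eltwise_relu,
            md, md, /*alpha=*/0.f, /*beta=*/0.f);
    eltwise_forward relu(pd);

    relu.execute(strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait(); // dst now holds max(0, x) for each element of src
    return 0;
}
```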

Case Studies

Netflix Doubles the Performance of Video Encoding

Netflix researchers used oneDNN to optimize cloud-based video encoding, making full use of the Intel Advanced Vector Extensions 512 (Intel AVX-512) instruction set.

IBM Research Increases Watson’s NLP Capabilities by 165%

On third-generation Intel Xeon Scalable processors, oneDNN optimizations improved performance on text and sentiment classification tasks by up to 35%. Switching to fourth-generation Intel Xeon processors with oneDNN delivered a 165% gain.

Demonstrations

Use oneDNN Graph to Speed Up Inference on x86-64 Machines

PyTorch 2.0 and later support oneDNN Graph, which can speed up inference on x86-64 CPUs with the float32 and bfloat16 data types.

Accelerate Transformer Model Inference on Intel Processors

Fusing TensorFlow operations into optimized oneDNN operations lowers inference time and memory consumption.

News

Intel’s 2024 Tools Are Now Available with the UXL Foundation

The oneDNN 2024.1 release includes new features that increase development productivity, optimize performance on Intel Xeon processors, and streamline storage efficiency.

CPU and GPU Performance Enhancements Added in oneDNN 3.4

The performance enhancements target new and upcoming devices, including improved matmul performance for large language models and transformer-style models on Intel CPUs and GPUs.

Specifications

Processors
  • Intel Atom processors with Intel Streaming SIMD Extensions
  • Intel Core processors
  • Intel Xeon processors

GPUs
  • Intel Processor Graphics Gen9 and above
  • Iris graphics
  • Intel Data Center GPUs
  • Intel Arc A-series graphics

Host & Target Operating Systems
  • Linux
  • Windows
  • macOS

Languages
  • SYCL (requires Intel oneAPI Base Toolkit)
  • C and C++

Compilers
  • Intel oneAPI DPC++/C++ Compiler
  • Clang
  • GNU C++ Compiler
  • Microsoft Visual Studio
  • LLVM for Apple

Threading Runtimes
  • Intel oneAPI Threading Building Blocks
  • OpenMP
  • SYCL