Intel oneDNN
Boost the Performance of Deep Learning Frameworks on Intel Processors
Overview
Deep learning, a branch of machine learning, processes unstructured data such as text, images, and video. Deep learning frameworks make it easier for data scientists and developers to collect, analyze, and interpret large volumes of data.
Intel provides the Intel oneAPI Deep Neural Network Library (oneDNN) to enhance the performance of deep learning frameworks and accelerate the development of applications across many hardware architectures.
Advantages
oneDNN is a performance library that provides highly optimized implementations of the building blocks for deep learning frameworks and applications. The cross-platform, open-source library lets data scientists and developers use the same API for CPUs, GPUs, or both. The benefits include:
- Improve the performance of the frameworks you already use, such as PyTorch, TensorFlow, Intel AI Tools, and the OpenVINO toolkit.
- Develop deep learning frameworks and applications faster using optimized building blocks.
- Deploy AI applications optimized for multiple hardware architectures, including Intel CPUs and GPUs, without writing any target-specific code.
Create Deep Learning Frameworks and Applications More Quickly
The Intel oneAPI Deep Neural Network Library (oneDNN) provides highly optimized implementations of deep learning building blocks. With this cross-platform, open-source library, developers of deep learning applications and frameworks can use the same API for CPUs, GPUs, or both, which simplifies performance optimization.
oneDNN TensorFlow
- Improve the performance of the frameworks you already use, such as the OpenVINO toolkit, PyTorch, TensorFlow, and Intel AI Tools.
- Develop deep learning applications and frameworks faster using optimized building blocks.
- Deploy applications optimized for Intel CPUs and GPUs without writing platform-specific code.
Download as Part of the Toolkit: oneDNN is included in the Intel oneAPI Base Toolkit, a core set of tools and libraries for developing high-performance, data-centric applications across a variety of architectures.
Develop in the Intel Tiber Developer Cloud: Build and optimize oneAPI multiarchitecture applications with the latest Intel-optimized oneAPI and AI tools, and test your workloads on Intel CPUs and GPUs. No hardware setup, software downloads, or configuration are required.
Download the Stand-Alone Version: oneDNN is also available as a stand-alone download. You can download binaries directly from Intel or use your preferred repository.
Help oneDNN Evolve: oneDNN is part of the Unified Acceleration (UXL) Foundation and is an implementation of the oneAPI specification.

Features
Automatic Optimization
Use existing deep learning frameworks.
Develop and deploy platform-independent deep learning applications with automatic instruction set architecture (ISA) detection and ISA-specific optimization.
Network Optimization
- Identify performance bottlenecks with Intel VTune Profiler.
- Use automatic memory format selection and propagation based on the hardware and the convolution parameters.
- Fuse primitives with operations applied to the primitive's result, for example, Conv+ReLU.
- Quantize primitives from FP32 to FP16, bf16, or int8 using Intel Neural Compressor.
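To make the FP32-to-int8 step more concrete, here is a minimal, library-free sketch of symmetric per-tensor quantization. It is an illustration only, not Intel Neural Compressor or oneDNN code: the `QuantizedTensor` struct and the `quantize_int8` and `dequantize` helpers are hypothetical names invented for this sketch, and the max-abs scaling rule is an assumption. Real tools may calibrate scales differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch (not a oneDNN or Neural Compressor type).
struct QuantizedTensor {
    std::vector<std::int8_t> values;
    float scale;  // real value is approximately scale * values[i]
};

// Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127].
QuantizedTensor quantize_int8(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    // Fall back to a scale of 1 for an all-zero tensor to avoid dividing by zero.
    const float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;

    QuantizedTensor q;
    q.scale = scale;
    q.values.reserve(x.size());
    for (float v : x) {
        long r = std::lround(v / scale);          // round to the nearest integer
        r = std::max(-127L, std::min(127L, r));   // saturate to the int8 range used here
        q.values.push_back(static_cast<std::int8_t>(r));
    }
    return q;
}

// Recover approximate FP32 values from the int8 representation.
std::vector<float> dequantize(const QuantizedTensor& q) {
    assert(q.scale > 0.0f);
    std::vector<float> out;
    out.reserve(q.values.size());
    for (std::int8_t v : q.values) out.push_back(q.scale * static_cast<float>(v));
    return out;
}
```

In this scheme each dequantized value differs from the original by at most about half a scale step, which is the accuracy cost traded for the smaller data type.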
Optimized Implementations of Key Building Blocks
- Convolution
- Matrix multiplication
- Pooling
- Batch normalization
- Activation functions
- Recurrent neural network (RNN) cells
- Long short-term memory (LSTM) cells
Abstract Programming Model
Primitive: Any low-level operation from which more complex operations are built, such as convolution or data format reordering.
Memory: A handle to memory allocated on a specific engine, together with its tensor dimensions, data type, and memory format.
Engine: A hardware processing unit, such as a CPU or GPU.
Stream: A queue of primitive operations on an engine.
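The idea of a memory format, and of reordering data between formats, can be sketched without the library. The example below copies a 4-D tensor from NCHW layout (channels outermost within an image) to NHWC layout (channels innermost). The `Dims4` struct and the function names are hypothetical and exist only for this sketch. This is not oneDNN's reorder primitive, which also handles blocked, hardware-specific layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape struct for this sketch (not a oneDNN type).
struct Dims4 {
    std::size_t n, c, h, w;
};

// Linear offset of logical element (n, c, h, w) when stored as NCHW.
std::size_t offset_nchw(const Dims4& d, std::size_t n, std::size_t c,
                        std::size_t h, std::size_t w) {
    return ((n * d.c + c) * d.h + h) * d.w + w;
}

// Linear offset of the same logical element when stored as NHWC.
std::size_t offset_nhwc(const Dims4& d, std::size_t n, std::size_t c,
                        std::size_t h, std::size_t w) {
    return ((n * d.h + h) * d.w + w) * d.c + c;
}

// Copy every logical element from its NCHW position to its NHWC position.
std::vector<float> reorder_nchw_to_nhwc(const std::vector<float>& src,
                                        const Dims4& d) {
    assert(src.size() == d.n * d.c * d.h * d.w);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < d.n; ++n)
        for (std::size_t c = 0; c < d.c; ++c)
            for (std::size_t h = 0; h < d.h; ++h)
                for (std::size_t w = 0; w < d.w; ++w)
                    dst[offset_nhwc(d, n, c, h, w)] =
                        src[offset_nchw(d, n, c, h, w)];
    return dst;
}
```

The logical tensor is the same in both layouts. Only the order of elements in memory changes, which is why a library can pick the layout that suits the hardware and insert reorders where needed.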
oneDNN PyTorch
Accelerate Inference on x86-64 Machines with oneDNN Graph
Supported in PyTorch 2.0 and later, oneDNN Graph can help accelerate inference on x86-64 CPUs with the float32 and bfloat16 data types.
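For readers unfamiliar with bfloat16: it keeps float32's sign bit and 8-bit exponent but only 7 mantissa bits, so it is effectively the upper 16 bits of a float32. The sketch below converts between the two with round-to-nearest-even. The function names are hypothetical, and the code illustrates the format only; it is not how PyTorch or oneDNN implement the conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert a float32 value to bfloat16 bits using round-to-nearest-even.
std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));  // reinterpret the float's bit pattern

    // NaN: keep it a NaN by forcing a mantissa bit in the upper half.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Add a rounding bias so that ties round to the even upper-half value.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bfloat16 bits back to float32 by zero-filling the low 16 bits.
float bf16_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Values such as 1.0 and -2.0 survive the round trip exactly, while 3.14159 comes back as 3.140625. The dynamic range matches float32, but precision drops to roughly two to three decimal digits.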
Optimize Transformer Model Inference on Intel Processors
Using fused TensorFlow operations together with optimized oneDNN operations reduces memory use and inference latency.
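Fusion typically saves memory and time because the intermediate result between two operations is never written out and read back. The library-free sketch below shows the idea with a 1-D convolution followed by ReLU. It is not TensorFlow or oneDNN code: the function names are hypothetical, and a 1-D "valid" convolution (computed as cross-correlation, as deep learning frameworks usually do) is chosen only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused step 1: convolution writes a full intermediate buffer.
std::vector<float> conv1d(const std::vector<float>& x,
                          const std::vector<float>& k) {
    assert(!k.empty() && x.size() >= k.size());
    std::vector<float> y(x.size() - k.size() + 1, 0.0f);
    for (std::size_t i = 0; i < y.size(); ++i)
        for (std::size_t j = 0; j < k.size(); ++j)
            y[i] += x[i + j] * k[j];
    return y;
}

// Unfused step 2: ReLU makes a second pass over that buffer.
std::vector<float> relu(std::vector<float> y) {
    for (float& v : y) v = std::max(v, 0.0f);
    return y;
}

// Fused: ReLU is applied to each accumulator before it is stored,
// so the pre-activation values never reach memory.
std::vector<float> conv1d_relu_fused(const std::vector<float>& x,
                                     const std::vector<float>& k) {
    assert(!k.empty() && x.size() >= k.size());
    std::vector<float> y(x.size() - k.size() + 1);
    for (std::size_t i = 0; i < y.size(); ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < k.size(); ++j)
            acc += x[i + j] * k[j];
        y[i] = std::max(acc, 0.0f);
    }
    return y;
}
```

Both paths produce the same output; the fused version does it in one pass over the data with one fewer intermediate tensor.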
Updates
Intel’s 2024 Tools Are Now Available through UXL Foundation
oneDNN 2024.1 adds features that improve development productivity, optimize performance on Intel Xeon processors, and improve storage efficiency.
oneDNN 3.4 Introduces CPU and GPU Performance Enhancements
Performance improvements target new and upcoming device support. They include better matmul performance on Intel CPUs and GPUs for large language models and transformer-style models.
Setting Up
Binary distributions of oneDNN can be installed in the following ways:
- As part of the Intel oneAPI Base Toolkit
- From Anaconda
- As a stand-alone version
If the configuration you need is not available, you can build the oneDNN library from source. The library is optimized for Intel architecture processors and Intel Processor Graphics, and it is designed to improve the performance of deep learning frameworks such as PyTorch and TensorFlow. For more information about CPU and GPU runtimes, see the system requirements and build options page.
Example Code
This C++ code sample demonstrates the basics of the oneDNN programming model:
- Creating oneDNN memory objects and primitives.
- Executing the primitives.
Example C++ Code
The first step is to create a getting_started_tutorial() function that contains all the steps of the oneDNN programming model. The main() function then calls this function.