Sunday, March 30, 2025

oneDNN Documentation: Optimized Deep Learning Primitives

Intel oneAPI Deep Neural Network Library (oneDNN): optimize deep learning frameworks' CPU and GPU performance.


Create Deep Learning Frameworks and Applications More Quickly

The Intel oneAPI Deep Neural Network Library (oneDNN) offers highly optimized implementations of deep learning building blocks. This open-source, cross-platform library abstracts away instruction sets and other performance-optimization concerns, so developers can use the same API for CPUs, GPUs, or both in deep learning applications and frameworks.


With this library, you are able to:

  • Improve the performance of frameworks you already use, such as PyTorch, TensorFlow, Intel AI Tools, and the OpenVINO toolkit.
  • Build deep learning frameworks and applications faster with optimized building blocks.
  • Deploy applications tuned for Intel CPUs and GPUs without writing code for each specific target.

Download the Stand-Alone Version

oneDNN can be downloaded as a stand-alone library. Get binaries from Intel or choose your preferred repository.

Aid in the Evolution of oneDNN

oneDNN implements the oneAPI specification from the Unified Acceleration (UXL) Foundation, and you are welcome to take part.

Features

Automated Optimization

  • Keep using your existing deep learning frameworks; the optimizations apply automatically.
  • Develop and deploy platform-independent deep learning applications with automatic instruction set architecture (ISA) detection and ISA-specific optimization.

Network Optimization

  • Identify performance bottlenecks with Intel VTune Profiler.
  • Automatically select and propagate memory formats based on the hardware and convolution parameters.
  • Fuse primitives with operations applied to their output, such as Conv+ReLU (see the sketch after this list).
  • Quantize primitives from FP32 to FP16, bf16, or int8 with Intel Neural Compressor.
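
As an illustration of primitive fusion, here is a minimal C++ sketch that attaches a ReLU post-op to a convolution so both run as a single fused primitive. It assumes oneDNN 3.x, and the tensor sizes are illustrative, not taken from the article.

```cpp
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0); // target device
    stream strm(eng);                 // execution queue on that device

    // Illustrative NCHW shapes: 1x3x224x224 input, 64 filters of 3x3.
    memory::dims src_dims = {1, 3, 224, 224};
    memory::dims wei_dims = {64, 3, 3, 3};
    memory::dims dst_dims = {1, 64, 224, 224};
    memory::dims strides = {1, 1}, padding = {1, 1};

    // format_tag::any lets oneDNN pick the optimal memory layout.
    memory::desc src_md(src_dims, memory::data_type::f32, memory::format_tag::any);
    memory::desc wei_md(wei_dims, memory::data_type::f32, memory::format_tag::any);
    memory::desc dst_md(dst_dims, memory::data_type::f32, memory::format_tag::any);

    // Attach ReLU to the convolution's output as a post-op.
    post_ops ops;
    ops.append_eltwise(algorithm::eltwise_relu, 0.f, 0.f);
    primitive_attr attr;
    attr.set_post_ops(ops);

    // One fused primitive now computes ReLU(conv(src, weights)).
    auto pd = convolution_forward::primitive_desc(
            eng, prop_kind::forward_inference, algorithm::convolution_direct,
            src_md, wei_md, dst_md, strides, padding, padding, attr);
    convolution_forward conv(pd);

    // Allocate tensors in the layouts the primitive selected.
    memory src_mem(pd.src_desc(), eng);
    memory wei_mem(pd.weights_desc(), eng);
    memory dst_mem(pd.dst_desc(), eng);

    conv.execute(strm, {{DNNL_ARG_SRC, src_mem},
                        {DNNL_ARG_WEIGHTS, wei_mem},
                        {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```

Because the ReLU runs inside the convolution kernel, the intermediate result never takes an extra round trip through memory, which is where fusion saves time and bandwidth.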

Optimized Implementations of Key Building Blocks

  • Convolution
  • Matrix multiplication (see the matmul sketch after this list)
  • Pooling
  • Batch normalization
  • Activation functions
  • Recurrent neural network (RNN) cells
  • Long short-term memory (LSTM) cells
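
As one example of using a building block directly, here is a minimal C++ sketch of the matmul primitive, again assuming oneDNN 3.x; the matrix sizes are illustrative.

```cpp
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // C[M,N] = A[M,K] x B[K,N] in float32, row-major (format_tag::ab).
    memory::dims a_dims = {128, 256}, b_dims = {256, 512}, c_dims = {128, 512};
    memory::desc a_md(a_dims, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md(b_dims, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md(c_dims, memory::data_type::f32, memory::format_tag::ab);

    auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md);
    matmul mm(pd);

    // Bind user buffers to oneDNN memory objects.
    std::vector<float> a(128 * 256, 1.f), b(256 * 512, 1.f), c(128 * 512);
    memory a_mem(a_md, eng, a.data());
    memory b_mem(b_md, eng, b.data());
    memory c_mem(c_md, eng, c.data());

    mm.execute(strm, {{DNNL_ARG_SRC, a_mem},
                      {DNNL_ARG_WEIGHTS, b_mem},
                      {DNNL_ARG_DST, c_mem}});
    strm.wait(); // every element of c is now 256.0f
    return 0;
}
```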

Abstract Programming Model

  • Primitive: Any low-level operation from which more complex operations are built, such as convolution or a data format reorder.
  • Memory: A handle to memory allocated for a specific engine, with given tensor dimensions, data type, and memory format.
  • Engine: A hardware processing unit, such as a CPU or GPU.
  • Stream: A queue of primitive operations submitted to an engine (the sketch after this list shows how the four abstractions fit together).
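
Here is a minimal C++ sketch (assuming oneDNN 3.x) that composes all four abstractions, using a forward ReLU as the primitive:

```cpp
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    // Engine: a hardware processing unit (here, the first CPU).
    engine eng(engine::kind::cpu, 0);

    // Stream: a queue of primitive operations on that engine.
    stream strm(eng);

    // Memory: tensor dimensions, data type, and format bound to the engine.
    memory::dims dims = {2, 16};
    memory::desc md(dims, memory::data_type::f32, memory::format_tag::ab);
    std::vector<float> data(2 * 16, -1.f);
    memory src(md, eng, data.data());
    memory dst(md, eng); // library-allocated buffer

    // Primitive: a low-level operation, here a forward ReLU.
    auto pd = eltwise_forward::primitive_desc(
            eng, prop_kind::forward_inference, algorithm::eltwise_relu,
            md, md, /*alpha=*/0.f, /*beta=*/0.f);
    eltwise_forward relu(pd);

    relu.execute(strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait(); // dst now holds max(0, x) for each element of src
    return 0;
}
```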

Case Studies

Netflix Doubles the Performance of Video Encoding

Netflix researchers used oneDNN to optimize cloud-based video encoding, making full use of the Intel Advanced Vector Extensions 512 (Intel AVX-512) instruction set.

IBM Research Increases Watson’s NLP Capabilities by 165%

On third-generation Intel Xeon Scalable processors, oneDNN optimizations improved performance on text and sentiment classification tasks by up to 35%. Switching to fourth-generation Intel Xeon processors with oneDNN delivered a 165% gain.

Demonstrations

Use oneDNN Graph to Speed Up Inference on x86-64 Machines

PyTorch 2.0 and later support oneDNN Graph, which can speed up inference on x86-64 CPUs with the float32 and bfloat16 data types.

Accelerate Transformer Model Inference on Intel Processors

Fusing TensorFlow operations into optimized oneDNN operations lowers inference time and memory consumption.

News

Intel’s 2024 Tools Are Now Available with the UXL Foundation

The oneDNN 2024.1 release includes new features that increase development productivity, optimize performance on Intel Xeon processors, and streamline storage efficiency.

CPU and GPU Performance Enhancements Added in oneDNN 3.4

The performance enhancements target new and upcoming devices, including improved matmul performance for large language models and transformer-style models on Intel CPUs and GPUs.

Specifications

Processors
  • Intel Atom processors with Intel Streaming SIMD Extensions
  • Intel Core processors
  • Intel Xeon processors

GPUs
  • Intel Processor Graphics Gen9 and above
  • Iris graphics
  • Intel Data Center GPUs
  • Intel Arc A-series graphics

Host & Target Operating Systems
  • Linux
  • Windows
  • macOS

Languages
  • SYCL (requires Intel oneAPI Base Toolkit)
  • C and C++

Compilers
  • Intel oneAPI DPC++/C++ Compiler
  • Clang
  • GNU C++ Compiler
  • Microsoft Visual Studio
  • LLVM for Apple

Threading Runtimes
  • Intel oneAPI Threading Building Blocks
  • OpenMP
  • SYCL