Intel oneAPI Deep Neural Network Library (oneDNN): Optimize Deep Learning Framework Performance on CPUs and GPUs
oneDNN documentation
Create Deep Learning Frameworks and Applications More Quickly
The Intel oneAPI Deep Neural Network Library (oneDNN) provides highly optimized implementations of deep learning building blocks. This open-source, cross-platform library abstracts away instruction sets and other performance-tuning concerns, so developers can use the same API for CPUs, GPUs, or both in deep learning applications and frameworks.
With this library, you can:
- Boost the performance of the frameworks you already use, such as PyTorch, TensorFlow, Intel AI Tools, and the OpenVINO toolkit.
- Build deep learning frameworks and applications faster with optimized building blocks.
- Deploy applications tuned for Intel CPUs and GPUs without writing target-specific code.
Download the Stand-Alone Version
oneDNN is available as a stand-alone download. You can get binaries from Intel or choose your preferred repository.
Contribute to the Evolution of oneDNN
oneDNN implements the oneAPI specification from the Unified Acceleration (UXL) Foundation. You are welcome to take part.
Features
Automated Optimization
- Use the deep learning frameworks you already have in place.
- Develop and deploy platform-independent deep learning applications with automatic instruction set architecture (ISA) detection and ISA-specific optimization.
Network Optimization
- Identify performance bottlenecks with Intel VTune Profiler.
- Automatically select and propagate memory formats based on hardware and convolution parameters.
- Fuse primitives with operations applied to their output, such as Conv+ReLU.
- Quantize primitives from FP32 to FP16, bf16, or int8 using Intel Neural Compressor.
Optimized Implementations of Key Building Blocks
- Convolution
- Matrix multiplication
- Pooling
- Batch normalization
- Activation functions
- Recurrent neural network (RNN) cells
- Long short-term memory (LSTM) cells
Abstract Programming Model
- Primitive: Any low-level operation, such as convolution, data format reordering, or memory, that serves as a building block for more complex operations.
- Memory: A handle to memory allocated for a specific engine, with given tensor dimensions, data type, and memory format.
- Engine: A hardware processing unit, such as a CPU or GPU.
- Stream: A queue of primitive operations on an engine.
Case Studies
Netflix Doubles the Performance of Video Encoding
Netflix researchers used oneDNN to optimize cloud-based video encoding, taking full advantage of the Intel Advanced Vector Extensions 512 (Intel AVX-512) instruction set.
IBM Research Boosts Watson NLP Performance by 165%
On 3rd Gen Intel Xeon Scalable processors, oneDNN optimizations improved performance on text and sentiment classification tasks by up to 35%. Moving to 4th Gen Intel Xeon processors with oneDNN delivered a 165% gain.
Demonstrations
Accelerate Inference on x86-64 Machines with oneDNN Graph
PyTorch 2.0 and later support oneDNN Graph, which can accelerate inference on x86-64 CPUs with the float32 and bfloat16 data types.
Improve Transformer Model Inference on Intel Processors
Fusing TensorFlow operations into optimized oneDNN operations reduces inference time and memory consumption.
News
Intel’s 2024 Tools Are Now Available with UXL Foundation
The oneDNN 2024.1 release adds features that increase developer productivity, optimize performance on Intel Xeon processors, and improve storage efficiency.
CPU and GPU Performance Enhancements Added in oneDNN 3.4
Performance enhancements targeting new and upcoming devices include improved matmul performance for large language models and transformer-style models on Intel CPUs and GPUs.
Specifications
Category | Specifications
---|---
Processors | Intel Atom processors with Intel Streaming SIMD Extensions, Intel Core processors, Intel Xeon processors
GPUs | Intel Processor Graphics Gen9 and above, Iris graphics, Intel Data Center GPUs, Intel Arc A-series graphics
Host & Target Operating Systems | Linux, Windows, macOS
Languages | SYCL (requires Intel oneAPI Base Toolkit), C and C++
Compilers | Intel oneAPI DPC++/C++ Compiler, Clang, GNU C++ Compiler, Microsoft Visual Studio, LLVM for Apple
Threading Runtimes | Intel oneAPI Threading Building Blocks, OpenMP, SYCL