Intel oneAPI Collective Communications Library (oneCCL): Scalable and Efficient Distributed Training for Deep Neural Networks
Implement Multi-Node Communication Patterns
The Intel oneAPI Collective Communications Library (oneCCL) helps researchers and developers train newer and deeper models faster by distributing model training across multiple nodes using optimized communication patterns.
The library is designed to be easily integrated into deep learning frameworks, whether you are developing them from scratch or customizing existing ones.
- Built on lower-level communication middleware: the Message Passing Interface (MPI) and libfabric, which transparently support many interconnects, including Ethernet, InfiniBand, and Cornelis Networks.
- Designed with Intel CPUs and GPUs in mind for optimal performance.
- Enables communication patterns to scale by allowing computation to be traded off for communication performance.
- Allows for the efficient use of collectives that are widely used in neural network training, such as allgather, allreduce, and reduce-scatter (a minimal sketch follows this list).
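To make the collective calls above concrete, here is a minimal host-side allreduce sketch using the oneCCL C++ API (oneapi/ccl.hpp), with MPI used only to bootstrap the key-value store. It follows the pattern of the public Getting Started samples, but treat the exact signatures as assumptions to verify against your installed oneCCL version.

```cpp
// Minimal host-side allreduce sketch with the oneCCL C++ API.
// MPI is used only to exchange the key-value store address between ranks.
#include <iostream>
#include <vector>
#include <mpi.h>
#include "oneapi/ccl.hpp"

int main() {
    const size_t count = 4096;

    ccl::init();
    MPI_Init(nullptr, nullptr);
    int size = 0, rank = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Rank 0 creates the main key-value store and broadcasts its address
    // so that all other ranks can attach to it.
    ccl::shared_ptr_class<ccl::kvs> kvs;
    ccl::kvs::address_type main_addr;
    if (rank == 0) {
        kvs = ccl::create_main_kvs();
        main_addr = kvs->get_address();
        MPI_Bcast((void*)main_addr.data(), (int)main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
    } else {
        MPI_Bcast((void*)main_addr.data(), (int)main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
        kvs = ccl::create_kvs(main_addr);
    }

    // One communicator spanning all ranks.
    auto comm = ccl::create_communicator(size, rank, kvs);

    // Each rank contributes its rank id; after allreduce every element
    // holds the sum 0 + 1 + ... + (size - 1).
    std::vector<int> send_buf(count, rank);
    std::vector<int> recv_buf(count, 0);
    ccl::allreduce(send_buf.data(), recv_buf.data(), count,
                   ccl::reduction::sum, comm).wait();

    if (rank == 0)
        std::cout << "allreduce result element: " << recv_buf[0] << std::endl;

    MPI_Finalize();
    return 0;
}
```

A program like this is launched with an MPI launcher, for example mpirun -n 2 ./allreduce_cpu (a hypothetical binary name), so that each process becomes one oneCCL rank.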
Features
Common APIs to Support Deep Learning Frameworks
oneCCL exposes a collective API that supports:
- Collective operations commonly used in machine learning and deep learning workloads
- Interoperability with SYCL from the Khronos Group
Deep Learning Optimizations
The runtime implementation enables several optimizations, including:
- Asynchronous progress for compute-communication overlap
- Dedication of one or more cores to ensure optimal network utilization (see the sketch after this list)
- Collectives in low-precision data types
- Out-of-order execution, persistence, and message prioritization
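As an illustration of the dedicated-core point above, the sketch below shows the worker-thread controls that oneCCL exposes as environment variables (CCL_WORKER_COUNT and CCL_WORKER_AFFINITY). In practice these are exported in the launch shell; setting them programmatically, and the specific core values chosen here, are assumptions made purely for illustration.

```cpp
// Illustrative configuration of oneCCL progress workers.
// CCL_WORKER_COUNT and CCL_WORKER_AFFINITY are oneCCL environment variables
// normally exported in the shell before launching the job; setenv() is used
// here only to keep the example self-contained, and the values are assumptions.
#include <cstdlib>
#include "oneapi/ccl.hpp"

int main() {
    // Two oneCCL worker (progress) threads per rank ...
    setenv("CCL_WORKER_COUNT", "2", /*overwrite=*/1);
    // ... pinned to cores 0 and 1 so that communication progress does not
    // compete with compute threads for the remaining cores.
    setenv("CCL_WORKER_AFFINITY", "0,1", /*overwrite=*/1);

    ccl::init(); // oneCCL picks up these controls from the environment at initialization
    // ... create the key-value store, communicator, and collectives as usual ...
    return 0;
}
```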
Intel oneAPI Collective Communications Library (oneCCL)
The Intel oneAPI Collective Communications Library (oneCCL) is a scalable, high-performance communication library for Deep Learning (DL) and Machine Learning (ML) workloads. It builds upon the concepts first introduced in the Intel Machine Learning Scaling Library and extends the architecture and API to include new functionality and use cases.
- Built on lower-level communication middleware: MPI and libfabric.
- Designed to allow computation to be traded off for communication performance, thereby promoting the scalability of communication patterns.
- Permits a number of DL-specific optimizations, including out-of-order execution, persistent operations, and prioritization.
- A DPC++-aware API that can target a variety of hardware, including CPUs and GPUs.
- Works with a variety of interconnects, including Ethernet, InfiniBand, and Intel Omni-Path Architecture (Intel OPA).
oneCCL Usage
The Intel oneAPI Collective Communications Library System Requirements list the hardware and software dependencies, which include MPI and the Intel oneAPI DPC++/C++ Compiler, among others.
oneCCL Code Sample
oneCCL offers an optional SYCL-aware API. When creating the oneCCL stream object, you can choose between the CPU and SYCL back ends (a device-side sketch using the current C++ API follows the list below).
- For the CPU backend, pass ccl_stream_host as the first argument.
- For the SYCL backend, pass either ccl_stream_cpu or ccl_stream_gpu, depending on the device type.
- For collective operations that use the SYCL stream:
  - The C API expects communication buffers to be sycl::buffer objects cast to void*.
  - The C++ API expects communication buffers to be passed by reference.
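The bullets above describe the stream-type constants of the older oneCCL C API. As a hedged sketch of the same idea with the current C++ API (oneapi/ccl.hpp), the example below lets the sycl::queue passed to ccl::create_stream select the backend and runs an allreduce on USM device memory; the selector, names, and signatures are assumptions to check against your installed oneCCL and compiler versions.

```cpp
// Device-side allreduce sketch: the backend is selected by the sycl::queue
// handed to ccl::create_stream rather than by an explicit stream-type constant.
#include <iostream>
#include <mpi.h>
#include <sycl/sycl.hpp>
#include "oneapi/ccl.hpp"

int main() {
    const size_t count = 1024;

    ccl::init();
    MPI_Init(nullptr, nullptr);
    int size = 0, rank = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Bootstrap the key-value store over MPI (same pattern as the host example).
    ccl::shared_ptr_class<ccl::kvs> kvs;
    ccl::kvs::address_type main_addr;
    if (rank == 0) {
        kvs = ccl::create_main_kvs();
        main_addr = kvs->get_address();
        MPI_Bcast((void*)main_addr.data(), (int)main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
    } else {
        MPI_Bcast((void*)main_addr.data(), (int)main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
        kvs = ccl::create_kvs(main_addr);
    }

    // A SYCL queue on the default device (CPU or GPU, whichever the runtime selects).
    sycl::queue q(sycl::default_selector_v);

    // Device, context, communicator, and stream tied to that queue.
    auto dev    = ccl::create_device(q.get_device());
    auto ctx    = ccl::create_context(q.get_context());
    auto comm   = ccl::create_communicator(size, rank, dev, ctx, kvs);
    auto stream = ccl::create_stream(q);

    // USM device buffers; each rank contributes its rank id.
    int* send_buf = sycl::malloc_device<int>(count, q);
    int* recv_buf = sycl::malloc_device<int>(count, q);
    q.fill(send_buf, rank, count).wait();

    // Allreduce on device memory, progressed through the SYCL stream.
    ccl::allreduce(send_buf, recv_buf, count, ccl::reduction::sum, comm, stream).wait();

    // Copy one element back to check the sum 0 + 1 + ... + (size - 1).
    int result = 0;
    q.memcpy(&result, recv_buf, sizeof(int)).wait();
    if (rank == 0)
        std::cout << "allreduce result element: " << result << std::endl;

    sycl::free(send_buf, q);
    sycl::free(recv_buf, q);
    MPI_Finalize();
    return 0;
}
```

Such a program is typically launched with an MPI launcher (for example, mpirun -n 2 ./sample) so that each rank is paired with its own queue and device.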
Sample code for oneCCL is available on GitHub; the same repository contains a Getting Started sample with instructions on how to build and run the code.
oneCCL Getting Started Samples
The oneCCL sample codes are implemented in C++ and C with SYCL extensions for CPU and GPU. The allreduce collective operation samples show users how to compile Intel oneAPI Collective Communications Library (oneCCL) code with different oneCCL configurations in the Intel oneAPI environment.
| Optimized for | Description |
|---|---|
| OS | Linux Ubuntu 18.04 |
| Hardware | Kaby Lake with GEN9 or newer |
| Software | Intel oneAPI Collective Communications Library (oneCCL), Intel oneAPI DPC++/C++ Compiler, GNU Compiler |
| What you will learn | Basic oneCCL programming model for both Intel CPU and GPU |
| Time to complete | 15 minutes |
Specifications
| Category | Specifications |
|---|---|
| Processors | Intel Core processor family, Intel Xeon processor family, Intel Xeon Scalable processor family |
| GPU | Intel Data Center GPU Max Series |
| Operating System | Linux |
| Languages | SYCL, C, C++ |
| Compilers | GNU Compiler Collection (GCC), Intel oneAPI DPC++/C++ Compiler |
| Distributed Environments | MPI, OFI |
Download as a Toolkit Component
oneCCL is included in the Intel oneAPI Base Toolkit, a core collection of tools and libraries for creating high-performance, data-centric applications across multiple platforms.