Tuesday, December 3, 2024

SYCL 2020’s Five New Features For Modern C++ Programmers

- Advertisement -

SYCL

For accelerator-using C++ programmers, SYCL 2020 is interesting. People enjoyed contributing to the SYCL standard, a book, and the DPC++ open source effort to integrate SYCL into LLVM. The SYCL 2020 standard included some of the favorite new features. These are Intel engineers’ views, not Khronos’.

Khronos allows heterogeneous C++ programming with SYCL. After SYCL 2020 was finalized in late 2020, compiler support increased.

- Advertisement -

SYCL is argued in several places, including Considering a Heterogeneous Future for C++ and other materials on sycl.tech. How will can allow heterogeneous C++ programming with portability across vendors and architectures? SYCL answers that question.

SYCL 2020 offers interesting new capabilities to be firmly multivendor and multiarchitecture with to community involvement.

The Best Five

A fundamental purpose of SYCL 2020 is to harmonize with ISO C++, which offers two advantages. First, it makes SYCL natural for C++ programmers. Second, it lets SYCL test multivendor, multiarchitecture heterogeneous programming solutions that may influence other C++ libraries and ISO C++.

Changing the base language from C++11 to C++17 allows developers to use class template argument deduction (CTAD) and deduction guides, which necessitated several syntactic changes in SYCL 2020.

- Advertisement -
  • Backends allow SYCL to target more hardware by supporting languages/frameworks other than OpenCL.
  • USM is a pointer-based alternative to SYCL 1.2.1’s buffer/accessor concept.
  • A “built-in” library in SYCL 2020 accelerates reductions, a frequent programming style.
  • The group library abstracts cooperative work items, improving application speed and programmer efficiency by aligning with hardware capabilities (independent of vendor).
  • Atomic references aligned with C++20 std::atomic_ref expand heterogeneous device memory models.

These enhancements make the SYCL ecosystem open, multivendor, and multiarchitecture, allowing C++ writers to fully leverage heterogeneous computing today and in the future.

Backends

With backends, SYCL 2020 allows implementations in languages/frameworks other than OpenCL. Thus, the namespace has been reduced to sycl::, and the SYCL header file has been relocated from to .

These modifications affect SYCL deeply. Although implementations are still free to build atop OpenCL (and many do), generic backends have made SYCL a programming approach that can target more diverse APIs and hardware. SYCL can now “glue” C++ programs to vendor-specific libraries, enabling developers to target several platforms without changing their code.

SYCL 2020 has true openness, cross-architecture, and cross-vendor.

This flexibility allows the open-source DPC++ compiler effort to support NVIDIA, AMD, and Intel GPUs by implementing SYCL 2020 in LLVM (clang). SYCL 2020 has true openness, cross-architecture, and cross-vendor.

Unified shared memory

Some devices provide CPU-host memory unified views. This unified shared memory (USM) from SYCL 2020 allows a pointer-based access paradigm instead of the buffer/accessor model from SYCL 1.2.1.

Programming with USM provides two benefits. First, USM provides a single address space across host and device; pointers to USM allocations are consistent and may be provided to kernels as arguments. Porting pointer-based C++ and CUDA programs to SYCL is substantially simplified. Second, USM allows shared allocations to migrate seamlessly between devices, enhancing programmer efficiency and compatibility with C++ containers (e.g., std::vector) and algorithms.

Three USM allocations provide programmers as much or as little data movement control as they want. Device allocations allow programmers full control over application data migration. Host allocations are beneficial when data is seldom utilized and transporting it is not worth the expense or when data exceeds device capacity. Shared allocations are a good compromise that immediately migrate to use, improving performance and efficiency.

Reductions

Other C++ reduction solutions, such as P0075 and the Kokkos and RAJA libraries, influenced SYCL 2020.

The reducer class and reduction function simplify SYCL kernel variable expression using reduction semantics. It also lets implementations use compile-time reduction method specialization for good performance on various manufacturers’ devices.

The famous BabelStream benchmark, published by the University of Bristol, shows how SYCL 2020 reductions increase performance. BabelStream’s basic dot product kernel computes a floating-point total of all kernel work items. The 43-line SYCL 1.2.1 version employs a tree reduction in work-group local memory and asks the user to choose the optimal device work-group size. SYCL 2020 is shorter (20 lines) and more performance portable by leaving algorithm and work-group size to implementation.

Group Library

The work-group abstraction from SYCL 1.2.1 is expanded by a sub-group abstraction and a library of group-based algorithms in SYCL 2020.

Sub_group describes a kernel’s cooperative work pieces running “together,” providing a portable abstraction for various hardware providers. Sub-groups in the DPC++ compiler always map to a key hardware concept SIMD vectorization on Intel architectures, “warps” on NVIDIA architectures, and “wavefronts” on AMD architectures enabling low-level performance optimization for SYCL applications.

In another tight agreement with ISO C++, SYCL 2020 includes group-based algorithms based on C++17: all_of, any_of, none_of, reduce, exclusive_scan, and inclusive_scan. SYCL implementations may use work-group and/or sub-group parallelism to produce finely tailored, cooperative versions of these functions since each algorithm is supported at various scopes.

Atomic references

Atomics improved in C++20 with the ability to encapsulate types in atomic references (std::atomic_ref). This design (sycl::atomic_ref) is extended to enable address spaces and memory scopes in SYCL 2020, creating an atomic reference implementation ready for heterogeneous computing.

SYCL follows ISO C++, and memory scopes were necessary for portable programming without losing speed. Don’t disregard heterogeneous systems’ complicated memory topologies.

Memory models and atomics are complicated, hence SYCL does not need all devices to use the entire C++ memory model to support as many devices as feasible. SYCL offers a wide range of device capabilities, another example of being accessible to all vendors.

Beyond SYCL 2020: Vendor Extensions

SYCL 2020’s capability for multiple backends and hardware has spurred vendor extensions. These extensions allow innovation that provides practical solutions for devices that require it and guides future SYCL standards. Extensions are crucial to standardization, and the DPC++ compiler project’s extensions inspired various elements in this article.

Two new DPC++ compiler features are SYCL 2020 vendor extensions.

Group-local Memory at Kernel Scope

Local accessors in SYCL 1.2.1 allow for group-local memory, which must be specified outside of the kernel and sent as a kernel parameter. This might seem weird for programmers from OpenCL or CUDA, thus has created an extension to specify group-local memory in a kernel function. This improvement makes kernels more self-contained and informs compiler optimizations (where local memory is known at compile-time).

FPGA-Specific Extensions

The DPC++ compiler project supports Intel FPGAs. It seems that the modifications, or something similar, can work with any FPGA suppliers. FPGAs are a significant accelerator sector, and nous believe it pioneering work will shape future SYCL standards along with other vendor extension initiatives.

Have introduced FPGA choices to make buying FPGA hardware or emulation devices easier. The latter allows quick prototyping, which FPGA software writers must consider. FPGA LSU controls allow us to tune load/store operations and request a specific global memory access configuration. Also implemented data placement controls for external memory banks (e.g., DDR channel) to tune FPGA designs via FPGA memory channel. FPGA registers allow major tuning controls for FPGA high-performance pipelining.

Summary

Heterogeneity endures. Many new hardware alternatives focus on performance and performance-per-watt. This trend will need open, multivendor, multiarchitecture programming paradigms like SYCL.

The five new SYCL 2020 features assist provide portability and performance mobility. C++ programmers may maximize heterogeneous computing with SYCL 2020.

- Advertisement -
Drakshi
Drakshi
Since June 2023, Drakshi has been writing articles of Artificial Intelligence for govindhtech. She was a postgraduate in business administration. She was an enthusiast of Artificial Intelligence.
RELATED ARTICLES

Recent Posts

Popular Post

Govindhtech.com Would you like to receive notifications on latest updates? No Yes