Waveye, a member of the Intel Liftoff program, built a centralized safety system for industrial settings using high-resolution imaging radars. The main goal of this effort was training perception models to classify various object classes from high-density radar point clouds.
This article presents the results of performance-acceleration tests on Waveye's radar processing and learning pipeline using Intel oneAPI libraries. These experiments showed notable performance gains of 20–50x over the raw CPU implementation, enabling faster AI-lifecycle iteration for key radar perception algorithms.
The Project's History
Waveye is creating a centralized, radar-based safety system for worker safety in automated industries. The project's objective is to recognize and track people, automated guided vehicles (AGVs), and forklifts working together in the same space, while preserving privacy.
The system does not require tracked objects to carry identification tags, can cover large industrial areas, and is resilient to changing lighting conditions.
The goal was to create a perception stack that could detect humans with >99% accuracy in less than a second and detect robots and forklifts with comparable robustness.
"As part of our partnership with Intel, we wanted to confirm how well we could use hardware abstraction tools like the Intel oneAPI Toolkit on Intel Tiber AI Cloud to move our edge processing pipeline to the x86 architecture," said Dr. Gor Hakobyan, Waveye's CTO.
The radar perception model recognizes, classifies, and tracks objects over time. When there is a risk of collision, it can alert robots to slow down or to select a collision-free path.
The system comprises an end-to-end radar processing and learning pipeline that was initially implemented using:
- CUDA acceleration for processing at the edge
- Cloud environments using CPU or CUDA processing
The main objective of this experiment was to assess how well Intel oneAPI libraries accelerated the pipeline in Intel cloud environments, with particular emphasis on the acceleration capabilities of the Intel oneAPI Math Kernel Library (oneMKL).
Some steps in the processing pipeline could be mapped directly onto oneMKL routines; the remainder of the pipeline, however, relied on custom CUDA kernels.
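The details of Waveye's pipeline are not public, so as an illustration of how radar-cube steps map onto vendor FFT routines, here is a minimal range-Doppler sketch in which NumPy's FFT stands in for oneMKL's DFTI interface. The cube dimensions and the use of non-coherent integration are assumptions made for the example, not Waveye's actual configuration.

```python
import numpy as np

def range_doppler_map(radar_cube: np.ndarray) -> np.ndarray:
    """Compute a range-Doppler power map from a raw radar cube.

    radar_cube: complex samples shaped (antennas, chirps, samples).
    Each axis-wise FFT below is a batched 1-D transform, the kind of
    call a library such as oneMKL (DFTI) can execute in one shot.
    """
    # Range FFT: resolve target distance along the fast-time (sample) axis.
    range_fft = np.fft.fft(radar_cube, axis=2)
    # Doppler FFT: resolve radial velocity along the slow-time (chirp) axis.
    doppler_fft = np.fft.fft(range_fft, axis=1)
    # Non-coherent integration across antennas yields the power map.
    return np.abs(doppler_fft).sum(axis=0)

# Synthetic cube: 4 antennas, 64 chirps, 256 samples per chirp.
rng = np.random.default_rng(0)
cube = rng.standard_normal((4, 64, 256)) + 1j * rng.standard_normal((4, 64, 256))
rd_map = range_doppler_map(cube)
print(rd_map.shape)  # (64, 256)
```

Because both FFTs are independent along their batch axes, they vectorize naturally, which is why this stage benefits so directly from an optimized FFT library.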
Because Halide's philosophy aligns well with radar-cube processing procedures and Halide is simple to incorporate into existing C++ code, the Waveye team decided to generalize these custom kernels in an embeddable way using Halide. MKL-accelerated Halide offered a significant speedup over the naive CPU implementation, with minimal manual optimization needed.
"It was easy to work with oneMKL and the Intel oneAPI suite on the Intel Tiber AI Cloud. There is a certain charm in decomposing the radar processing pipeline into linear algebra operations and letting decades of progress in numerical methods maximize hardware performance," said Levon Budagyan, Waveye's CEO.
Testing Methodology
The team compared the following key computational elements of the radar processing pipeline:
- Raw CPU implementation (baseline)
- MKL-accelerated implementation
To guarantee consistent measurements, each component was run several times with representative workloads, and timing was recorded with microsecond accuracy.
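The article does not include Waveye's actual benchmark harness. A sketch of the approach it describes, repeated runs of representative workloads timed with microsecond precision, might look like the following; the warm-up count and repeat count are illustrative choices.

```python
import time
import statistics

def benchmark(fn, *args, repeats=20, warmup=3):
    """Run fn repeatedly and report the median wall time in microseconds.

    Warm-up iterations are discarded so caches and one-time
    initialization do not skew the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution monotonic clock
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e6)
    # The median is robust against occasional scheduling outliers.
    return statistics.median(samples)

baseline_us = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median runtime: {baseline_us:.1f} us")
```

Reporting the median rather than the mean keeps a single preempted run from distorting the comparison between implementations.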
Results
Overall Performance
Compared to the baseline CPU implementation, the Intel oneAPI libraries, in particular oneMKL, delivered acceleration factors ranging from 20x to 50x across several computational kernels, and over 100x for GEMM.
Detailed Performance Improvements
| Operation | CPU Runtime (ms) | MKL Runtime (ms) | Acceleration Factor |
|---|---|---|---|
| FFT | 1,700 | 50 | 34x |
| GEMM | 50,000 | 450 | 111x |
| Specialized kernel 1 | 500 | 12 | 42x |
| Specialized kernel 2 | 250 | 5 | 50x |
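The GEMM row shows the largest gap. To make the nature of that gap concrete, here is a hedged sketch comparing a textbook triple-loop multiply against a BLAS-backed one (NumPy's `@`, which dispatches to whatever BLAS the interpreter is linked with, e.g. oneMKL or OpenBLAS). The 64x64 size is chosen only to keep the example fast; absolute speedups depend entirely on hardware, matrix size, and BLAS build, and are not Waveye's numbers.

```python
import time
import numpy as np

def naive_gemm(a, b):
    """Textbook triple-loop matrix multiply (the 'raw CPU' baseline)."""
    n, k = a.shape
    _, m = b.shape
    c = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            c[i, j] = s
    return c

rng = np.random.default_rng(1)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))

t0 = time.perf_counter()
c_naive = naive_gemm(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b  # dispatches to the linked BLAS dgemm
t_blas = time.perf_counter() - t0

# Both paths compute the same product; only the runtime differs.
assert np.allclose(c_naive, c_blas)
print(f"naive: {t_naive*1e3:.1f} ms, BLAS: {t_blas*1e3:.3f} ms")
```

The same arithmetic is performed in both cases; the optimized library wins through cache blocking, vectorization, and threading rather than a different algorithm.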
Effect on Workflow
This performance improvement brings several important advantages:
- Edge Simulation: Edge processing can be simulated effectively in a cloud environment.
- Development Acceleration: Key radar perception algorithms iterate through the AI lifecycle much more quickly.
- Resource Optimization: Reduced demand for, and cost of, computing resources.
- Iteration Speed: Faster experimentation cycles for developing and testing algorithms.
Conclusion: Accelerating Radar Processing with Intel oneAPI
Waveye's radar processing pipeline has seen notable performance improvements from integrating Intel oneAPI libraries, particularly the Intel oneAPI Math Kernel Library (oneMKL). With a 20–50x speedup over the baseline CPU implementation, this improvement reduces computing-resource demands while enabling more effective development and testing cycles.
These results indicate that Intel oneAPI is a reliable and scalable acceleration platform, especially well suited to compute-intensive radar processing workloads deployed in Intel cloud environments.
Next Steps: Advancing Optimization and Performance
To build on these encouraging outcomes, Waveye intends to undertake the following projects:
- Further Specialized Kernel Optimization: Continue refining specific computational kernels to maximize performance gains.
- Extended Experiments Across Operational Scenarios: Validate performance gains under varied workloads and real-world conditions.
- Evaluation of Additional Intel oneAPI Toolkit Components: Explore further libraries and tools within the oneAPI ecosystem to uncover more acceleration opportunities.
- Exploration of Hybrid Acceleration Strategies: Investigate combining Intel oneMKL with other specialized libraries to build hybrid solutions that maximize performance at each processing stage.
These steps are intended to deepen the integration of Waveye's radar processing applications with Intel technology and further increase their efficiency.