Improve Intel Data Direct I/O (DDIO) Workload Performance with Intel VTune Profiler
Profile uncore hardware performance events in Intel Xeon processors with oneAPI
Intel Data Direct I/O (DDIO) technology is a hardware feature of Intel Xeon CPUs. It improves I/O performance by making the CPU cache, rather than main memory, the primary point of entry and exit for I/O data going to and from Intel Ethernet controllers and adapters.
To evaluate how effectively a workload uses DDIO and Intel Virtualization Technology (Intel VT) for Directed I/O (Intel VT-d), which lets several operating systems and applications run independently on one platform, you need to monitor uncore events, that is, events that occur outside the CPU cores. Intel VTune Profiler, a performance analysis and debugging tool powered by oneAPI, lets you analyze these uncore hardware events and use the results to improve the performance of DDIO workloads.
In this blog, we’ll discuss how to use VTune Profiler to evaluate and improve directed I/O performance. Before going into the profiling approach, let’s take a quick look at Intel Data Direct I/O technology.
Overview of the Intel Data Direct I/O (DDIO) Technology
Intel DDIO, a part of Intel Integrated I/O technology, was launched in 2012 for the Intel Xeon processor E5 and E7 v2 families. It aims to increase system-level I/O performance by employing a new processor-to-I/O data flow.
Before Data Direct I/O technology, I/O operations were slow and processor cache was a scarce resource. Any data arriving from an Ethernet controller or adapter had to be written to the host processor’s main memory, and any outgoing data had to be read from it. The data then had to be moved from main memory into the cache before the processor could work with it.
This led to a large number of memory read and write operations. In some older designs, it also caused additional, speculative read operations from the I/O hub. Excessive memory accesses often increase system power consumption and degrade I/O performance.
Because the processor cache is no longer a scarce resource, Intel DDIO technology rearranges the flow of I/O data so that the processor cache, instead of main memory, is the primary source and destination of I/O data.
Depending on the type of workload running on the workstation or server, DDIO offers benefits such as:
- Higher transaction rates, lower power consumption, lower latency, and higher bandwidth.
- No industry enablement is needed to use Data Direct I/O technology.
- It has no additional hardware dependencies, and it requires no changes to your operating system, drivers, or software.
Boost DDIO Performance Using Intel VTune Profiler
An uncore event is an event generated by functionality in the uncore section of a CPU, outside the processor cores themselves, that still affects overall processor performance. For instance, such events may relate to the Intel Ultra Path Interconnect (UPI) block, the memory controller, or I/O stack activity.
A new recipe in the VTune Profiler Cookbook explains how to count these uncore hardware events using the tool’s Input and Output analysis. You can use the data to better understand Peripheral Component Interconnect Express (PCIe) traffic and behavior, and from that assess Data Direct I/O and VT-d efficiency.
The recipe explains how to run the Input and Output analysis, interpret the results, and categorize the resulting I/O metrics. It requires VTune Profiler v2023.2 or later and a first-generation or later Intel Xeon Scalable processor. The I/O metrics and events covered in the recipe are based on the third-generation Intel Xeon Scalable processor, but the approach also applies to the most recent Intel Xeon processors.
Perform I/O Analysis with VTune Profiler
Start by running the Input and Output analysis on your application with VTune Profiler. The analysis uses a variety of platform-level metrics to show how the CPU, bus, and I/O subsystems are being used. Turning on the PCIe traffic analysis option gives you data on how efficiently Intel Data Direct I/O (DDIO) is being used.
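For readers who prefer the command line, the collection step can be scripted. The following is a minimal Python sketch, not the recipe's own method: it only assembles a `vtune -collect io` command line and does not run it. The helper name `build_io_collect_command`, the target `./my_io_app`, and the result directory `r000io` are hypothetical. Knob names for options such as PCIe traffic analysis differ between VTune Profiler versions, so none are hard-coded here; `vtune -help collect io` lists the ones your installation supports.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def build_io_collect_command(
    target: Sequence[str],
    result_dir: str,
    knobs: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble (but do not run) a `vtune -collect io` command line."""
    if not target:
        raise ValueError("target command must not be empty")
    cmd = ["vtune", "-collect", "io", "-result-dir", result_dir]
    # Knob names are passed in by the caller because they vary by version.
    # Sorting keeps the generated command line deterministic.
    for name, value in sorted((knobs or {}).items()):
        cmd += ["-knob", f"{name}={value}"]
    # Everything after "--" is the profiled application and its arguments.
    cmd += ["--", *target]
    return cmd


if __name__ == "__main__":
    # "./my_io_app" and "r000io" are placeholder names for illustration.
    command = build_io_collect_command(["./my_io_app", "--port", "0"], "r000io")
    print(shlex.join(command))
```

Once the printed command looks right for your system, it can be handed to `subprocess.run(command, check=True)` on a machine where VTune Profiler is installed.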
Analyze the I/O Metrics
You can examine the report produced by the Input and Output analysis in either VTune Profiler Web Server or the VTune Profiler GUI. Using the VTune Profiler Web Server interface, the recipe illustrates how to examine several I/O performance metrics, including:
- Platform diagram, which shows utilization of the physical cores, DRAM, PCIe, and Intel UPI links.
- PCIe Traffic Summary, which includes metrics for both outbound (initiated by the CPU) and inbound (initiated by I/O devices) PCIe traffic. These metrics help you assess PCIe bandwidth and how efficiently it is used, latency of inbound read/write requests, CPU/IO conflicts, and other factors.
- Metrics to assess how efficiently the workload uses Intel VT-d technology to remap the memory addresses of incoming I/O device requests to different host addresses.
- DRAM and UPI bandwidth usage.
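To make the relationship between raw counts and the reported metrics more concrete, here is a small Python sketch of two derived values: average bandwidth, and the share of inbound PCIe requests that miss the last-level cache, which is one common way to reason about DDIO efficiency. The function names and the idea of feeding in hit and miss counts by hand are assumptions for illustration only. VTune Profiler computes its own metrics from uncore events, and its exact formulas may differ.

```python
def bandwidth_mb_per_s(total_bytes: int, elapsed_s: float) -> float:
    """Average bandwidth in MB/s (1 MB = 10**6 bytes) over an interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return total_bytes / elapsed_s / 1e6


def l3_miss_ratio(hits: int, misses: int) -> float:
    """Fraction of inbound requests that missed the L3 cache.

    A lower ratio suggests that more I/O data is served from the cache
    instead of DRAM. Returns 0.0 when there were no requests at all.
    """
    total = hits + misses
    return misses / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from a real system.
    print(bandwidth_mb_per_s(total_bytes=2_000_000, elapsed_s=2.0))
    print(l3_miss_ratio(hits=75, misses=25))
```

A rising miss ratio for inbound writes under load is the kind of signal that would prompt a closer look at the PCIe Traffic Summary for the affected device.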