Using Watsonx.data Presto C++ with the Intel Sapphire Rapid Processor on AWS to speed up query performance
Over the past 25 years, there have been notable improvements in database speed due to IBM and Intel’s long-standing cooperation. The most recent generation of Intel Xeon Scalable processors, when paired with Intel software, can potentially improve IBM Watsonx.data performance, according to internal research conducted by IBM.
A hybrid, managed data lake house, IBM Watsonx.data is tailored for workloads including data, analytics, and artificial intelligence. Using engines like Presto and Spark to drive corporate analytics is one of the highlights. Watsonx.data also offers a single view of your data across hybrid cloud environments and a customizable approach.
Presto C++
The next edition of Presto, called Presto C++, was released by IBM in June. It was created by open-source community members from Meta, IBM, Uber, and other companies. The Velox, an open-source C++ native acceleration library made to be compatible with various compute engines, was used in the development of this query engine in partnership with Intel. In order to further improve query performance through efficient query rewrite, IBM also accompanied the release of the Presto C++ engine with a query optimizer built on decades of experience.
Summary
A C++ drop-in replacement for Presto workers built on the Velox library, Presto C++ is also known as the development name Prestissimo. It uses the Proxygen C++ HTTP framework to implement the same RESTful endpoints as Java workers. Presto C++ does not use JNI and does not require a JVM on worker nodes because it exclusively uses REST endpoints for communication with the Java coordinator and amongst workers.
Inspiration and Goals
Presto wants to be the best data lake system available. The native Java-based version of the Presto evaluation engine is being replaced by a new C++ implementation using Velox in order to accomplish this goal.
In order to allow the Presto community to concentrate on more features and improved connectivity with table formats and other data warehousing systems, the evaluation engine has been moved to a library.
Accepted Use Cases
The Presto C++ evaluation engine supports just certain connectors.
- Reads and writes via the Hive connection, including CTAS, are supported.
- Only reads are supported for iceberg tables.
- Both V1 and V2 tables, including tables with delete files, are supported by the Iceberg connector.
- TPCH.naming=standard catalog property for the TPCH connector.
Features of Presto C++
- Task management: Users can monitor and manage tasks using the HTTP endpoints included in Presto C++. This tool facilitates tracking ongoing procedures and improves operational oversight.
- Data processing across a network of nodes can be made more effective by enabling the execution of functions on distant nodes, which improves scalability and distributed processing capabilities.
- For secure internal communication between nodes, authentication makes use of JSON Web Tokens (JWT), guaranteeing that data is safe and impenetrable while being transmitted.
- Asynchronous data caching with prefetching capabilities is implemented. By anticipating data demands and caching it beforehand, this maximizes processing speed and data retrieval.
- Performance Tuning: Provides a range of session parameters, such as compression and spill threshold adjustments, for performance tuning. This guarantees optimal performance of data processing operations by enabling users to adjust performance parameters in accordance with their unique requirements.
Limitations of Presto C++
There are some drawbacks to the C++ evaluation engine:
- Not every built-in function is available in C++. A query failure occurs when an attempt is made to use unimplemented functions. See Function Coverage for a list of supported functions.
- C++ does not implement all built-in types. A query failure will occur if unimplemented types are attempted to be used.
- With the exception of CHAR, TIME, and TIME WITH TIMEZONE, all basic and structured types in Data Types are supported. VARCHAR, TIMESTAMP, and TIMESTAMP WITH TIMEZONE are subsumptions of these.
- The length n in varchar[n] is not honored by Presto C++; it only supports the limitless length VARCHAR.
- IPADDRESS, IPPREFIX, UUID, KHYPERLOGLOG, P4HYPERLOGLOG, QDIGEST, TDIGEST, GEOMETRY, and BINGTILE are among the types that are not supported.
- The C++ evaluation engine does not use all of the plugin SPI. Specifically, several plugin types are either fully or partially unsupported, and C++ workers will not load any plugins from the plugins directory.
- The C++ evaluation engine does not support PageSourceProvider, RecordSetProvider, or PageSinkProvider.
- Block encodings, parametric types, functions, and types specified by the user are not supported.
- At the split level, the event listener plugin is not functional.
- See Remote Function Execution for information on how user-defined functions differ from one another.
- The C++ evaluation engine has a distinct memory management system. Specifically:
- There is no support for the OOM killer.
- There is no support for the reserved pool.
- Generally speaking, queries may utilize more memory than memory arbitration permits. Refer to Memory Management.
Functions
reduce_agg
Reduce_agg is not allowed to return null in the inputFunction or the combineFunction of C++-based Presto. This is acceptable but ill-defined behavior in Presto (Java). See reduce_agg for more details about reduce_agg in Presto.
Amazon Elastic Compute Cloud (EC2) R7iz instances are high-performance CPU instances that are designed for memory. With a sustained all-core turbo frequency of 3.9 GHz, they are the fastest 4th Generation Intel Xeon Scalable-based (Sapphire Rapids) instances available in the cloud. R7iz instances can lower the total cost of ownership (TCO) and provide performance improvements of up to 20% over Z1d instances of the preceding generation. They come with integrated accelerators such as Intel Advanced Matrix Extensions (Intel AMX), which provide a much-needed substitute for clients with increasing demands for AI workloads.
R7iz instances are well-suited for front-end Electronic Design Automation (EDA), relational database workloads with high per-core licensing prices, and workloads including financial, actuarial, and data analytics simulations due to their high CPU performance and large memory footprint.
IBM and Intel have collaborated extensively to offer open-source software optimizations to Watsonx.data, Presto, and Presto C++. In addition to the hardware enhancements, Intel 4th Gen Xeon has produced positive Watsonx.data outcomes.
Based on publicly available 100TB TPC-DS Query benchmarks, IBM Watsonx.data with Presto C++ v0.286 and query optimizer on AWS ROSA, running on Intel processors (4th generation), demonstrated superior price performance over Databrick’s Photon engine, with better query runtime at comparable cost.