Trade Surveillance Regulations
Every bank must meet regulatory standards. Financial regulation is broad, but for an investment bank like Deutsche Bank one obligation stands out: detecting and preventing market manipulation and abuse. This control function is called trade surveillance.
Deutsche Bank’s Compliance Technology division implements this control function technically. The team monitors transactions across all of the bank’s business lines by retrieving data from front-office operational systems and running scenario computations over it. Any suspicious pattern triggers an internal alert, which a compliance officer examines and resolves.
Many systems provide input data, but market, trade, and reference data matter most. To source this data from front-office systems for compliance applications, the team previously had to duplicate it between, and sometimes within, several analytical systems, which caused data quality and lineage problems and added architectural complexity. Trade surveillance scenarios also demand a platform that can store and process massive amounts of data with distributed computation frameworks such as Apache Spark.
Innovating the architecture
With its comprehensive ecosystem of data analytics products and services, Google Cloud can help large organizations tackle difficult data processing and sharing challenges. BigQuery, Google Cloud’s serverless data warehouse, and Dataproc, its managed Apache Spark service, enable data-heavy enterprise use cases such as trade surveillance.
The Compliance Technology team built their new trade surveillance architecture on these managed services. In the new design, operational front-office systems write their data to BigQuery tables, and BigQuery serves the trade, market, and reference data to downstream consumers such as trade surveillance. Because the Compliance Technology team doesn’t need all of the front-office data, they can define views that expose only the input data the trade surveillance scenarios require.
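For illustration, such a view can be created with the BigQuery Python client. The sketch below is hypothetical: the project, dataset, table, and column names are placeholders, not Deutsche Bank’s actual schema.

```python
from google.cloud import bigquery

client = bigquery.Client(project="compliance-tech-project")

# Define a view over a front-office table that exposes only the columns
# the surveillance scenarios need (all names are illustrative).
view = bigquery.Table("compliance-tech-project.surveillance.equity_trades_v")
view.view_query = """
    SELECT trade_id, instrument_id, trade_timestamp, price, quantity, trader_id
    FROM `front-office-project.trading.equity_trades`
"""
client.create_table(view, exists_ok=True)
```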
The trade surveillance business logic runs as data transformations in BigQuery SQL, in Spark on Dataproc, and in other technologies. This logic detects abnormal trade patterns that indicate market abuse or manipulation. Suspicious cases are written to output BigQuery tables and processed through research and investigation workflows by compliance officers, who weed out false positives and file a Suspicious Activity Report with the regulator if a case indicates a compliance violation.
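As a minimal sketch of such a transformation, the query below flags trades executed far from a daily reference price and appends them to an alerts table. The scenario is deliberately simplistic, and every table, column, and threshold is an assumption for illustration only; real surveillance scenarios are far more sophisticated.

```python
from google.cloud import bigquery

client = bigquery.Client(project="compliance-tech-project")

# Append suspicious cases to an output table for the investigation workflow.
job_config = bigquery.QueryJobConfig(
    destination="compliance-tech-project.surveillance.alerts_off_market",
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Illustrative "off-market price" check: flag trades deviating more than
# 5% from the day's reference price for the instrument.
scenario_sql = """
    SELECT t.trade_id, t.trader_id, t.price, m.reference_price,
           ABS(t.price - m.reference_price) / m.reference_price AS deviation
    FROM `compliance-tech-project.surveillance.equity_trades_v` AS t
    JOIN `compliance-tech-project.surveillance.market_reference_prices` AS m
      ON t.instrument_id = m.instrument_id
     AND DATE(t.trade_timestamp) = m.price_date
    WHERE ABS(t.price - m.reference_price) / m.reference_price > 0.05
"""
client.query(scenario_sql, job_config=job_config).result()
```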
Surveillance alerts are retained in order to measure detection effectiveness and reduce false positives. These calculations run as SQL in BigQuery and as Spark jobs on Dataproc; they execute regularly, and their results feed back into trade surveillance scenario execution to improve the monitoring. Cloud Composer, a managed Apache Airflow service for workflow orchestration, orchestrates both the ETL pipelines for the trade surveillance scenarios and the effectiveness calibrations.
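A Cloud Composer pipeline for this could look roughly like the following Airflow DAG, which chains a BigQuery scenario run with a Spark calibration job on Dataproc. The DAG is a sketch: the project, region, cluster, stored procedure, and file names are all assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

with DAG(
    dag_id="trade_surveillance_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run a SQL-based surveillance scenario in BigQuery.
    run_scenario = BigQueryInsertJobOperator(
        task_id="run_off_market_scenario",
        configuration={
            "query": {
                "query": "CALL `compliance-tech-project.surveillance.run_off_market_scenario`()",
                "useLegacySql": False,
            }
        },
    )

    # Run a Spark-based effectiveness calibration on Dataproc.
    calibrate = DataprocSubmitJobOperator(
        task_id="calibrate_effectiveness",
        project_id="compliance-tech-project",
        region="europe-west3",
        job={
            "placement": {"cluster_name": "surveillance-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://surveillance-jobs/calibrate.py"},
        },
    )

    run_scenario >> calibrate
```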
The advantages of a serverless data architecture
As the architecture above shows, trade surveillance needs several data sources. Sourcing this data through BigQuery lets Deutsche Bank’s data consumers use it without copying it. Fewer data hops mean a simpler design, better data quality, and lower cost.
Because BigQuery has no instances or clusters, there is no need to duplicate data. Data consumers with the necessary permissions can query any table directly by its URI (that is, the Google Cloud project ID, dataset name, and table name). Users can therefore access the data from their own Google Cloud projects without copying or storing it.
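Concretely, a consumer project can query a producer’s table in place, with nothing replicated; the project and table names below are again placeholders.

```python
from google.cloud import bigquery

# Runs from the consumer's own project: the query is billed there, while
# the data stays in the producer's project. No copy of the table is made.
client = bigquery.Client(project="trade-surveillance-project")

rows = client.query("""
    SELECT instrument_id, COUNT(*) AS trade_count
    FROM `front-office-project.trading.equity_trades`
    WHERE DATE(trade_timestamp) = CURRENT_DATE()
    GROUP BY instrument_id
""")
for row in rows:
    print(row.instrument_id, row.trade_count)
```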
To run the trade surveillance scenarios, the Compliance Technology team only needs to query the BigQuery views with input data and the tables with data derived by compliance-specific ETLs. This avoids data duplication, making the data more trustworthy and the architecture more robust thanks to fewer data hops. Above all, this zero-copy approach lets data consumers in bank teams beyond trade surveillance use the market, trade, and reference data in BigQuery.
BigQuery has another benefit: it is integrated with Google Cloud services such as Dataproc and Cloud Composer, so ETL orchestration is straightforward with Apache Airflow’s BigQuery operators. Nor is any copying needed to process BigQuery data with Spark. Instead, an out-of-the-box connector reads the data through the BigQuery Storage API, which streams large volumes directly to the Dataproc workers in parallel for fast processing.
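On Dataproc, using the spark-bigquery connector is a one-liner on each side. The sketch below reads retained alerts, computes an illustrative calibration metric, and writes the result back; the table and bucket names are assumptions, and the write path uses the connector’s indirect method via a temporary GCS bucket.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("effectiveness-calibration").getOrCreate()

# Read historical alerts straight from BigQuery; the connector streams
# the rows in parallel through the BigQuery Storage API.
alerts = (
    spark.read.format("bigquery")
    .option("table", "compliance-tech-project.surveillance.alerts_history")
    .load()
)

# Illustrative calibration metric: false-positive counts per scenario.
false_positives = (
    alerts.filter(alerts.resolution == "FALSE_POSITIVE")
    .groupBy("scenario_id")
    .count()
)

# Write the results back to BigQuery via the connector; the indirect
# write method stages data in a temporary GCS bucket.
(
    false_positives.write.format("bigquery")
    .option("table", "compliance-tech-project.surveillance.calibration_metrics")
    .option("temporaryGcsBucket", "surveillance-temp-bucket")
    .mode("overwrite")
    .save()
)
```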
Finally, BigQuery lets data producers use Google Cloud’s built-in data quality tooling, such as Dataplex automated data quality. This service lets you define rules for dimensions like freshness, accuracy, uniqueness, and completeness, and apply them to BigQuery data. Everything runs serverless and automated, with no infrastructure to operate for rule execution and data quality enforcement. The Compliance Technology team can thus verify that front-office data meets its quality criteria, which adds further value to the new architecture.
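As a sketch, assuming scans are defined programmatically through the Dataplex Python client rather than the console, a data quality scan over a front-office table might look like this; the resource paths, rule choices, and IDs are all illustrative.

```python
from google.cloud import dataplex_v1

client = dataplex_v1.DataScanServiceClient()

# Define a scan with two illustrative rules: trade_id must be non-null
# (completeness) and unique (uniqueness).
scan = dataplex_v1.DataScan(
    data=dataplex_v1.DataSource(
        resource="//bigquery.googleapis.com/projects/front-office-project"
                 "/datasets/trading/tables/equity_trades"
    ),
    data_quality_spec=dataplex_v1.DataQualitySpec(
        rules=[
            dataplex_v1.DataQualityRule(
                column="trade_id",
                dimension="COMPLETENESS",
                non_null_expectation=dataplex_v1.DataQualityRule.NonNullExpectation(),
            ),
            dataplex_v1.DataQualityRule(
                column="trade_id",
                dimension="UNIQUENESS",
                uniqueness_expectation=dataplex_v1.DataQualityRule.UniquenessExpectation(),
            ),
        ]
    ),
)

operation = client.create_data_scan(
    parent="projects/front-office-project/locations/europe-west3",
    data_scan=scan,
    data_scan_id="equity-trades-dq",
)
operation.result()  # wait for the scan resource to be created
```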
The new design relies on Google Cloud’s integrated, serverless data analytics tools and managed services, which lets the Compliance Technology team concentrate on the business logic of the trade surveillance applications. Unlike a large, on-premises Hadoop cluster, BigQuery requires no maintenance windows, version upgrades, upfront sizing, or hardware replacements.
The final benefit of the new architecture is cost-effectiveness. Its pay-as-you-go services let team members focus on business-relevant features instead of infrastructure, and compute power is consumed only for batch workloads such as compliance-specific ETLs, trade surveillance scenarios, and effectiveness calibration, rather than for machines running 24/7. This reduces cost well below an always-on, on-premises alternative.