Saturday, July 6, 2024

Delivery Hero’s GitHub-Vertex AI Voucher Fraud Detection

Delivery Hero’s GitHub and Vertex AI Story

Effective model maintenance is essential in the fast-moving world of machine learning. Fraud detection models in particular must be retrained and redeployed frequently to keep up with human adversaries who may reverse-engineer the fraud engine’s logic and adapt their methods.

At Delivery Hero, a prominent local delivery platform, the Incentive Fraud team builds rule-based services with integrated ML models to detect and prevent incentive voucher fraud. These vouchers are given to newly registered users to encourage them to try the food delivery platform, so the service must be able to distinguish genuinely new customers from fraudsters who create a new profile for every order. Delivery Hero operates in more than 70 countries, each with its own data privacy laws and local constraints, which makes this task challenging.

Technical setup

Model Serving Overview

The team serves its decisions through a REST API service that combines rule-based logic with integrated ML models. Because the API is called on every food order, latency requirements are strict. The service and models run on local Kubernetes clusters with horizontal pod autoscaling and other high-availability strategies to meet those requirements.
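As a rough illustration of this serving pattern, the sketch below fronts a pickled model and a simple rule check with a FastAPI endpoint; the route, feature names, rule threshold, and model path are hypothetical rather than taken from the article.

```python
# Minimal sketch of a rules-plus-ML scoring endpoint, assuming FastAPI and a
# pickled scikit-learn-style model baked into the serving image.
# All names (paths, features, thresholds) are illustrative, not Delivery Hero's.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("models/fraud_model.pkl", "rb") as f:  # hypothetical path inside the image
    model = pickle.load(f)


class OrderFeatures(BaseModel):
    account_age_days: int
    orders_from_device: int
    voucher_value: float


@app.post("/score")  # hypothetical route
def score(features: OrderFeatures) -> dict:
    # Rule-based short-circuit: obviously abusive patterns skip the model.
    if features.orders_from_device > 10 and features.account_age_days < 1:
        return {"fraud": True, "reason": "rule:device_reuse"}

    # Otherwise fall back to the ML model's probability estimate.
    proba = model.predict_proba(
        [[features.account_age_days, features.orders_from_device, features.voucher_value]]
    )[0][1]
    return {"fraud": proba > 0.5, "score": float(proba)}
```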

Vertex AI was chosen as the ML model development environment for its scalability and tight integration with BigQuery, Delivery Hero’s primary data warehouse, as well as with other Google Cloud resources. Vertex AI Pipelines train the models and store their metadata in the Vertex AI Model Registry. Cloud Build then packages trained and evaluated models into FastAPI Docker images.
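For reference, submitting such a training pipeline with the google-cloud-aiplatform SDK might look roughly like the sketch below; the project, region, bucket, template path, and parameter names are assumptions rather than details from the article.

```python
# Sketch of submitting a compiled training pipeline to Vertex AI Pipelines.
# Project, region, bucket, and parameter names are placeholders.
from google.cloud import aiplatform

aiplatform.init(
    project="my-gcp-project",           # hypothetical project
    location="europe-west4",            # hypothetical region
    staging_bucket="gs://my-ml-bucket",
)

job = aiplatform.PipelineJob(
    display_name="incentive-fraud-training",
    template_path="pipelines/training_pipeline.json",  # compiled KFP definition
    parameter_values={
        "dataset_version": "2024-07-01",  # snapshot suffix, see the VC bullet below
        "country": "de",
    },
)
job.submit()  # returns immediately; the CI job can poll for completion
```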

CI/CD for ML

All workflows are tightly integrated with GitHub Actions CI/CD to enable fast model development iterations, allowing users to train models and build serving images while following software development and MLOps best practices:

  • Version Control (VC) – tracks model and data changes (Vertex Pipelines consume a snapshot of a BigQuery table saved to GCS as parquet files with a `DATASET_VERSION` suffix; see the sketch after this list).
  • Continuous Integration (CI) – reliable trunk-based development in a GitHub repository that submits workflows to shared GCP Vertex AI environments instead of triggering pipelines locally. Users (data scientists) can run experiments from Pull Requests (PRs) with precise experiment tracking.
  • Continuous Deployment (CD) – lets users safely push new models to production without engineering assistance.
  • Continuous Testing (CT) – surfacing model quality metrics and artifact lineage in CI/CD improves communication between data scientists, ML engineers, stakeholders, and decision-makers.
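A versioned snapshot like the one described in the VC bullet might be read inside a pipeline step roughly as follows; the bucket, table prefix, and version value are hypothetical.

```python
# Sketch of loading a versioned dataset snapshot from GCS inside a pipeline step.
# Bucket, prefix, and the DATASET_VERSION value are illustrative only.
import pandas as pd

DATASET_VERSION = "2024-07-01"  # pinned in the pipeline parameters by CI

# pandas reads GCS paths directly when gcsfs and pyarrow are installed.
df = pd.read_parquet(f"gs://my-ml-bucket/snapshots/orders_{DATASET_VERSION}/")
```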

CT was introduced to make it practical to build and maintain many models trained on different data subsets from the same code base. A typical Delivery Hero setup keeps tens of models, one per country, deployed to regional clusters. Each model must therefore be selected for release independently, while development, evaluation, and deployment iterations are shared.

Implementation details

The MLOps workflows for the Incentive Fraud use case are powered by the team’s internal Python library, ml-utils, which uses the GCP Vertex AI Model Registry as its backend. The package links the entities of Kubeflow pipelines (used internally by GCP Vertex AI Pipelines) – Pipelines (or Pipeline Runs), Experiments (groupings of pipelines), Models, and Datasets – behind a single CLI and Python API. ml-utils loads the large JSON definitions of Kubeflow Pipeline Runs, locates the required artifacts, and retrieves them in a predefined format. More importantly, it provides an abstraction layer over Models that enforces naming conventions and queries Vertex ML Metadata to find models by wildcards.
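The article does not show ml-utils itself, but its described responsibilities, naming conventions plus wildcard lookups against the Model Registry, could be approximated with the public SDK roughly as below; the naming scheme and the client-side prefix matching are assumptions, not the team’s implementation.

```python
# Rough approximation of the wildcard model lookup that ml-utils is described as
# providing, built on the public google-cloud-aiplatform SDK. The naming scheme
# "incentive-fraud-<country>" is an assumption, not the team's actual convention.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west4")


def find_models(name_pattern: str) -> list:
    # A trailing wildcard is emulated here by listing registry models and
    # matching the display-name prefix client-side.
    prefix = name_pattern.rstrip("*")
    return [
        m for m in aiplatform.Model.list()
        if m.display_name.startswith(prefix)
    ]


# e.g. all per-country incentive-fraud models
for m in find_models("incentive-fraud-*"):
    print(m.display_name, m.version_id, m.version_aliases)
```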

The Vertex AI training pipeline runs the following steps (a structural sketch follows the list):

  1. load the dataset snapshot from GCS,
  2. split the data into training/test/validation sets,
  3. compute dataset properties (statistics, visualizations, data drift),
  4. train the new model,
  5. load the current production model using ml-utils,
  6. load the “champion model” (the best model ever trained, by various metrics) using ml-utils,
  7. evaluate all three models against the same test split and save the evaluation metrics to the Vertex AI Model Registry as Experiment metadata that ml-utils can later retrieve,
  8. upload the new model to the Vertex AI Model Registry and update its aliases:
  • an alias containing the git commit hash, such as `pr123-a1b2c3d` for PR commits or `main-a1b2c3d` for main-branch commits,
  • the `champ` alias, transferred to the new model if it outperforms the current champion.
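A structural sketch of such a pipeline in KFP v2 (the SDK used by Vertex AI Pipelines) is shown below; component bodies, names, and paths are placeholders, and the real ml-utils calls and metrics are omitted.

```python
# Structural sketch of the training pipeline in KFP v2 (used by Vertex AI
# Pipelines). Component bodies are placeholders; the real feature engineering,
# ml-utils calls, and metric names are not shown in the article.
from kfp import dsl


@dsl.component(base_image="python:3.10")
def load_snapshot(dataset_version: str) -> str:
    # Placeholder: would return the GCS path of the parquet snapshot.
    return f"gs://my-ml-bucket/snapshots/orders_{dataset_version}/"


@dsl.component(base_image="python:3.10")
def train_model(data_path: str) -> str:
    # Placeholder: would split the data, train, and serialize the new model.
    return f"{data_path}model.pkl"


@dsl.component(base_image="python:3.10")
def evaluate_models(candidate: str, data_path: str) -> float:
    # Placeholder: would also load the production and champion models via
    # ml-utils and score all three on the same test split.
    return 0.0


@dsl.pipeline(name="incentive-fraud-training")  # hypothetical name
def training_pipeline(dataset_version: str):
    data = load_snapshot(dataset_version=dataset_version)
    model = train_model(data_path=data.output)
    evaluate_models(candidate=model.output, data_path=data.output)
```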

Data splits, models, and other pipeline artifacts are stored automatically in GCS. Once all Vertex AI pipelines succeed, the GitHub Actions job that triggered them queries the Vertex AI Model Registry for the evaluation metrics and prints them as markdown on the job Summary page. Each git commit is therefore linked to a set of Vertex pipelines and a model quality report, which data scientists and managers use to make decisions based on model quality.
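Publishing such a report could look roughly like the snippet below, which appends a markdown table to the GITHUB_STEP_SUMMARY file that GitHub Actions exposes to each job; the model filter and the columns shown (versions and aliases rather than the team’s actual evaluation metrics) are assumptions.

```python
# Sketch of writing a markdown report to the GitHub Actions job Summary.
# The model filter and table columns are illustrative only.
import os

from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west4")

rows = ["| model | version | aliases |", "|---|---|---|"]
for m in aiplatform.Model.list(filter='display_name="incentive-fraud-de"'):
    rows.append(
        f"| {m.display_name} | {m.version_id} | {', '.join(m.version_aliases or [])} |"
    )

# GITHUB_STEP_SUMMARY points at a file; anything appended renders as markdown.
with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as summary:
    summary.write("\n".join(rows) + "\n")
```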

When the team is ready to redeploy certain models, it opens a PR that edits the serving image config, which defines the subset of models to be pushed to production. This PR triggers another GitHub Actions workflow, which submits a Cloud Build job that pulls the model pickles, builds the FastAPI server image with the models baked in, runs integration tests, and updates the model aliases.
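The final steps, pulling a model version by alias and promoting it, might be expressed with the ModelRegistry helper of the google-cloud-aiplatform SDK roughly as sketched below; the model ID, version alias, and serving alias name are placeholders.

```python
# Sketch of resolving a model version by alias and promoting it after tests pass,
# assuming the ModelRegistry helper in google-cloud-aiplatform. IDs are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west4")

# Resolve the version tagged for this commit (e.g. main-a1b2c3d) and read its
# artifact location, which Cloud Build can copy into the serving image.
model = aiplatform.Model(model_name="1234567890@main-a1b2c3d")  # model ID + alias
print("artifact URI:", model.uri)

# After integration tests pass, move a serving alias onto that version.
registry = aiplatform.models.ModelRegistry(model="1234567890")
registry.add_version_aliases(new_aliases=["live"], version=model.version_id)
```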

Infrastructure overview

The team uses two environments (GCP projects) for the Incentive Fraud work:

  • Dev – a permissive environment used for experimentation and end-to-end tests of the model training pipelines.
  • Prod – a secure environment where model release candidates are produced.

Five high-level rules help the team manage the two projects (a sketch of how CI might apply rules 1–3 follows the list):

  1. GitHub Actions workflows that run GCP workloads behave identically in Dev and Prod.
  2. Every commit to a PR triggers the GCP workflows in Dev.
  3. Every commit to the main branch triggers the GCP workflows in Prod.
  4. The Dev and Prod pipelines share the same code and dataset snapshots.
  5. Each PR is squashed into a single commit before being merged into the main branch.
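A hedged sketch of how a CI entrypoint might apply rules 1–3 is shown below: the same script runs in both environments, and only the target GCP project changes with the git ref (the project IDs and the use of GITHUB_REF_NAME are assumptions).

```python
# Sketch of selecting the GCP project from the git ref inside a CI job.
# Project IDs and the GITHUB_REF_NAME convention are illustrative assumptions.
import os

from google.cloud import aiplatform

# GitHub Actions exposes the branch (or PR ref) via GITHUB_REF_NAME.
branch = os.environ.get("GITHUB_REF_NAME", "")

# Rules 2 and 3: PR commits go to Dev, main-branch commits go to Prod.
project = "fraud-ml-prod" if branch == "main" else "fraud-ml-dev"

# Rule 1: everything after this point is identical for both environments.
aiplatform.init(project=project, location="europe-west4")
```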

These rules give the team a clean, linear history on the main branch, where every commit that changes the model code, dataset version, or configuration produces per-country release-candidate models with known quality metrics.
