Intel Extension For Scikit-learn: Time Series PCA & DBSCAN

April 8, 2025


Intel demonstrates how to cluster time series data using principal component analysis (PCA) for dimensionality reduction and density-based spatial clustering of applications with noise (DBSCAN) for clustering. The approach finds patterns in time series data, such as city traffic flow, without requiring labeled data, and the Intel Extension for Scikit-learn is used to improve performance. Time series data often contains repeating patterns caused by machinery, human behavior, or other measurable factors. Identifying these patterns manually can be difficult. Unsupervised learning techniques such as PCA and DBSCAN make it possible to discover them.

Data Generation

To mimic time series patterns, the example creates synthetic waveform data. The data consists of three distinct waveforms, each with added noise to simulate real-world variability. The data generation is taken from Gaël Varoquaux’s scikit-learn agglomerative clustering example, which is available under the BSD-3-Clause or CC0 license.
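A minimal sketch of this kind of waveform generator is shown here. It is modeled on the scikit-learn agglomerative clustering example rather than copied from Intel's article, so the phase and amplitude pairs, the noise scales, and the fixed seed should all be read as assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for repeatable output

n_features = 2000               # points per waveform
samples_per_waveform = 30       # 3 waveforms x 30 = 90 samples
t = np.pi * np.linspace(0, 1, n_features)


def square_wave(x):
    """Square wave taken from the sign of a cosine."""
    return np.sign(np.cos(x))


X, y = [], []
# Hypothetical (phase, amplitude) pair for each of the three waveforms
for label, (phase, amplitude) in enumerate([(0.5, 0.15), (0.5, 0.6), (0.3, 0.2)]):
    for _ in range(samples_per_waveform):
        phase_noise = 0.01 * rng.normal()
        amplitude_noise = 0.04 * rng.normal()
        # Sparse spikes: uniform noise in [-1, 1] with all but the
        # largest ~0.3% of values zeroed out
        spikes = 1 - 2 * rng.random(n_features)
        spikes[np.abs(spikes) < 0.997] = 0
        wave = (amplitude + amplitude_noise) * square_wave(6 * (t + phase + phase_noise))
        X.append(12 * (wave + spikes))
        y.append(label)

X = np.array(X)  # one row per sample
y = np.array(y)  # ground-truth waveform index, kept only for later comparison
print(X.shape)   # (90, 2000)
```

The labels in `y` are not used for clustering. They only serve as ground truth when the result is checked afterwards.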

Code and plot based on Gaël Varoquaux's scikit-learn agglomerative clustering example. Image credit to Intel

Accelerate PCA and DBSCAN with Intel Extension for Scikit-learn

Both PCA and DBSCAN can be accelerated through the patching mechanism of the Intel Extension for Scikit-learn. Scikit-learn is a Python module for machine learning. The Intel Extension for Scikit-learn is one of the AI Tools and seamlessly accelerates scikit-learn applications on Intel CPUs and GPUs in single- and multi-node configurations. The extension dynamically patches scikit-learn estimators to speed up machine learning training and inference by up to 100x with equivalent mathematical accuracy.

GitHub repository for Intel Extension for Scikit-learn — Image credit to Intel

The Intel Extension for Scikit-learn uses the scikit-learn API. It can be enabled from the command line or by adding two lines to your Python program before importing scikit-learn:

from sklearnex import patch_sklearn
patch_sklearn()

Dimensionality Reduction with PCA

Before clustering the 90 samples, each of which has 2,000 features, PCA is used to reduce dimensionality while retaining 99% of the variance in the dataset.
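One way to express a variance target with scikit-learn's PCA is sketched here. The stand-in data, the seed, and the choice to pass 0.99 as `n_components` are assumptions. Only the name `XPC` is taken from the article's plotting code.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in data (assumption): 90 samples x 2,000 features built from three
# sinusoidal basis signals plus a small amount of noise.
t = np.linspace(0, 2 * np.pi, 2000)
basis = np.stack([np.sin(t), np.sin(2 * t), np.sin(3 * t)])
weights = rng.normal(size=(90, 3)) * np.array([10.0, 5.0, 2.0])
X = weights @ basis + 0.01 * rng.normal(size=(90, 2000))

# A float n_components in (0, 1) with the full SVD solver keeps the smallest
# number of components whose cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.99, svd_solver="full")
XPC = pca.fit_transform(X)

retained = float(pca.explained_variance_ratio_.sum())
print(XPC.shape, pca.n_components_, round(retained, 4))
```

Passing an integer instead, for example `PCA(n_components=4)`, fixes the number of components. That is the option to use when later code expects a specific number of columns in `XPC`.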

A pairplot of the reduced data helps reveal visible clusters:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# XPC holds the PCA-transformed samples, one column per principal component
df = pd.DataFrame(XPC, columns=['PC1', 'PC2', 'PC3', 'PC4'])
sns.pairplot(df)
plt.show()

Looking for clusters in the data after dimensionality reduction — Image credit to Intel

Cluster with DBSCAN

Based on the pairplot, PC1 and PC2 appear to separate the clusters well, so these two components are used for DBSCAN clustering. The pairplot also gives an estimate for the DBSCAN eps parameter. The plot of these two components suggests that 50 is a reasonable separation distance for the observed clusters, so that value is used.
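The clustering step can be sketched as follows. The three well-separated 2-D blobs stand in for the PC1 and PC2 scores and are an assumption, as are the seed and the default `min_samples`. Only `eps=50` comes from the text.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Stand-in for the first two principal components (assumption):
# three tight blobs of 30 points each, far more than eps apart.
centers = np.array([[-200.0, 0.0], [0.0, 150.0], [200.0, 0.0]])
points = np.vstack([c + rng.normal(scale=5.0, size=(30, 2)) for c in centers])

# eps is the neighborhood radius; min_samples keeps scikit-learn's default.
dbscan = DBSCAN(eps=50)
labels = dbscan.fit_predict(points)

# DBSCAN marks noise points with the label -1, so exclude it from the count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

With real PCA output, the same call would be applied to the first two columns of the transformed array, for example `XPC[:, :2]`.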

Plotting the clustered data shows how well DBSCAN has detected the clusters.

Plot of clustered data generated using the previous code example — Image credit to Intel

Compare to Ground Truth

The figure shows that DBSCAN identifies plausible clusters (shown in color) that compare well with the original ground truth data. In this case, the clustering successfully recovered the underlying patterns that generated the data. Combining PCA for dimensionality reduction with DBSCAN for clustering makes it possible to find and categorize patterns in time series data efficiently. The approach needs no labeled samples and reveals underlying structure in the data.
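The article makes this comparison visually. As a complementary and hypothetical check, the adjusted Rand index from scikit-learn can score the agreement numerically. The small label arrays here are made up for illustration.

```python
from sklearn.metrics import adjusted_rand_score

# Made-up ground truth and cluster labels (illustration only).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels_renamed = [2, 2, 2, 0, 0, 0, 1, 1, 1]    # same grouping, different IDs
labels_one_error = [2, 2, 0, 0, 0, 0, 1, 1, 1]  # one sample in the wrong group

# The score depends only on which samples are grouped together,
# not on the numeric IDs assigned to each cluster.
score_renamed = adjusted_rand_score(y_true, labels_renamed)
score_one_error = adjusted_rand_score(y_true, labels_one_error)
print(score_renamed)                    # 1.0
print(score_one_error < score_renamed)  # True
```

This matters for DBSCAN because cluster IDs are assigned in discovery order and need not match the ground-truth numbering. Noise points labeled -1 are treated by this metric as one more group.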

Intel Extension for Scikit-learn

Accelerate scikit-learn for Data Analytics & Machine Learning

Scikit-learn, also known as sklearn, is a Python module for machine learning. The Intel Extension for Scikit-learn seamlessly accelerates your scikit-learn applications on Intel CPUs and GPUs in both single- and multi-node configurations. The extension package dynamically patches scikit-learn estimators to improve the performance of your machine learning algorithms.

The extension is part of AI Tools, which let you combine machine learning tools with your existing AI packages.

With this extension, you can:

- Speed up training and inference by up to 100x with equivalent mathematical accuracy.
- Continue to use the open source scikit-learn API.
- Enable and disable the extension with a couple of lines of code or from the command line.

Both scikit-learn and the Intel Extension for Scikit-learn are part of Intel's end-to-end suite of AI and machine learning development tools and resources.

Features

- Speed up scikit-learn (sklearn) algorithms by replacing existing estimators with mathematically equivalent accelerated versions.
- Run on any x86-compatible CPU or Intel GPU, because the accelerations are powered by the Intel oneAPI Data Analytics Library (oneDAL).
- Choose how to apply the accelerations:
  - Patch all compatible algorithms from the command line with no code changes.
  - Add two lines of code to patch all compatible algorithms in your Python script.
  - Specify in your script that only selected algorithms are patched.
  - Globally patch and unpatch your environment for all scikit-learn applications.
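The in-script options can be sketched as follows. This assumes the `sklearnex` package may or may not be installed, so the import is guarded and the script falls back to stock scikit-learn. Choosing PCA and DBSCAN as the algorithms to patch is an assumption made to match this article.

```python
# Command-line alternative with no code changes (my_application.py is a
# hypothetical script name):
#   python -m sklearnex my_application.py

try:
    from sklearnex import patch_sklearn, unpatch_sklearn
    HAVE_SKLEARNEX = True
except ImportError:
    # Extension not installed: continue with unmodified scikit-learn.
    HAVE_SKLEARNEX = False

if HAVE_SKLEARNEX:
    # Patch only the named algorithms. Calling patch_sklearn() with no
    # arguments patches every supported algorithm instead.
    patch_sklearn(["PCA", "DBSCAN"])

# Import estimators after patching so the accelerated versions are picked up.
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# The module path shows whether an accelerated implementation is in use.
print(PCA.__module__)
print(DBSCAN.__module__)

if HAVE_SKLEARNEX:
    # Undo the patching; estimators need to be re-imported afterwards.
    unpatch_sklearn()
```

Because the patched estimators keep the scikit-learn API, the rest of a script stays the same whether or not the extension is present.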

Intel Extension for Scikit-learn Spec

Processors	All CPUs with x86 architecture; all integrated and discrete GPUs from Intel
Operating systems	Linux, Windows and Windows Server
Language	Python
