Intel demonstrates how to cluster time series data using principal component analysis (PCA) for dimensionality reduction and density-based spatial clustering of applications with noise (DBSCAN) for clustering. This approach finds patterns in time series data, such as city traffic flow, without requiring labeled data, and the Intel Extension for Scikit-learn can be used to improve performance. Repetitive patterns in time series data are often driven by machinery, human behavior, or other measurable factors, and identifying them manually can be difficult. Unsupervised learning techniques such as PCA and DBSCAN allow us to find these patterns.
Data Generation
To replicate time series patterns, the example creates synthetic waveform data. The data consists of three distinct waveforms, each with added noise to mimic the variability found in real-world data. The data generation builds on Gaël Varoquaux’s scikit-learn agglomerative clustering example, which is available under the CC0 or BSD-3-Clause license.
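As a rough illustration of this kind of data generation, the sketch below builds 90 noisy series of 2,000 points from three base waveforms. The specific waveforms (sine, square, sawtooth), the amplitude, the noise levels, and the random seed are assumptions made for illustration; they are not taken from the original example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) for reproducibility

n_features = 2000                # points per time series
n_per_class = 30                 # 3 waveforms x 30 series = 90 samples
t = np.linspace(0.0, 1.0, n_features)

# Three hypothetical base waveforms, each with three periods over t.
base_waveforms = [
    np.sin(2 * np.pi * 3 * t),             # sine
    np.sign(np.sin(2 * np.pi * 3 * t)),    # square
    2 * ((3 * t) % 1.0) - 1,               # sawtooth
]

X_rows, y_rows = [], []
for label, wave in enumerate(base_waveforms):
    for _ in range(n_per_class):
        amplitude = 10.0 * (1.0 + 0.01 * rng.normal())  # small amplitude jitter
        noise = 0.5 * rng.normal(size=n_features)       # additive noise
        X_rows.append(amplitude * wave + noise)
        y_rows.append(label)

X = np.asarray(X_rows)   # shape (90, 2000)
y = np.asarray(y_rows)   # ground-truth waveform index, used only for evaluation
print(X.shape, np.bincount(y))
```

The labels in `y` are kept only so the clustering result can be checked later; the unsupervised steps never see them.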

Accelerate PCA and DBSCAN with Intel Extension for Scikit-learn
Both PCA and DBSCAN can be accelerated through the patching mechanism of the Intel Extension for Scikit-learn. Scikit-learn is a Python module for machine learning. The Intel Extension for Scikit-learn is one of the AI Tools, and it seamlessly speeds up scikit-learn applications on Intel CPUs and GPUs in single- and multi-node configurations. The extension dynamically patches scikit-learn estimators to speed up machine learning training and inference by up to 100x with equivalent mathematical accuracy.

The Intel Extension for Scikit-learn uses the scikit-learn API and can be enabled from the command line or by adding two lines to your Python program before importing scikit-learn:
from sklearnex import patch_sklearn
patch_sklearn()
Dimensionality Reduction with PCA
Before clustering the 90 samples, each of which has 2,000 features, the example uses PCA to reduce dimensionality while keeping 99% of the dataset’s variance.
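A minimal sketch of this step is shown below, assuming scikit-learn's `PCA` with a fractional `n_components`, which keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The stand-in data is hypothetical, so the number of retained components will generally differ from the four components plotted in the article.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 90 noisy series (30 per waveform), 2,000 points each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000)
waves = np.stack([
    np.sin(6 * np.pi * t),
    np.sign(np.sin(6 * np.pi * t)),
    2 * ((3 * t) % 1.0) - 1,
])
X = 10.0 * np.repeat(waves, 30, axis=0) + 0.5 * rng.normal(size=(90, 2000))

# A float in (0, 1) asks PCA to keep enough components to explain
# more than that fraction of the total variance.
pca = PCA(n_components=0.99)
XPC = pca.fit_transform(X)

print("reduced shape:", XPC.shape)
print("cumulative explained variance:", pca.explained_variance_ratio_.cumsum())
```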
A pairplot of the reduced data helps reveal discernible clusters:
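import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# XPC holds the PCA-transformed data from the previous step (four components here)
df = pd.DataFrame(XPC, columns=['PC1', 'PC2', 'PC3', 'PC4'])
sns.pairplot(df)  # scatter plot for every pair of principal components
plt.show()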

Cluster with DBSCAN
Based on the pairplot, PC1 and PC2 appear to separate the clusters effectively, so the example uses these two components for DBSCAN clustering. The pairplot also provides an estimate for the DBSCAN EPS parameter: the PC1 versus PC2 panel suggests that 50 is a suitable separation distance for the observed clusters, so that value is used.
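The clustering step can be sketched as follows, assuming scikit-learn's `DBSCAN` with `eps=50` and every other parameter left at its default. The two-dimensional points are hypothetical stand-ins for the PC1 and PC2 columns; their centers and spread are chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical stand-in for the first two principal components:
# three groups of 30 points around well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[-200.0, 0.0], [0.0, 150.0], [200.0, 0.0]])
XPC = np.vstack([c + 5.0 * rng.normal(size=(30, 2)) for c in centers])

# eps is the neighborhood radius; min_samples stays at the scikit-learn
# default because the text only specifies eps.
dbscan = DBSCAN(eps=50)
labels = dbscan.fit_predict(XPC[:, :2])

# DBSCAN labels noise points as -1, so exclude that label from the count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

Because `eps` is a distance in the units of the principal components, it has to be re-estimated whenever the data or its scaling changes.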
To see how well DBSCAN has detected the clusters, we can plot the clustered data.

Compare to Ground Truth
The figure shows that the colored clusters identified by DBSCAN are plausible and compare well with the original ground truth data. In this instance, the clustering successfully recovered the underlying patterns that generated the data. By combining PCA for dimensionality reduction with DBSCAN for clustering, we can efficiently find and categorize patterns in time series data. This approach requires no labeled samples and reveals underlying structure in the data.
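Besides a visual comparison, agreement with the ground truth can be quantified. The sketch below is one possible check, not part of the original example: it uses scikit-learn's `adjusted_rand_score`, which ignores how cluster IDs are numbered, together with a contingency table. The helper name `compare_to_ground_truth` and the demo label arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix


def compare_to_ground_truth(y_true, labels):
    """Return the adjusted Rand index and a (true class x cluster) count table."""
    ari = adjusted_rand_score(y_true, labels)
    table = contingency_matrix(y_true, labels)
    return ari, table


# Hypothetical labels: the clusters match the true classes exactly,
# but DBSCAN happened to number them differently.
y_true = np.repeat([0, 1, 2], 30)
labels = np.repeat([2, 0, 1], 30)

ari, table = compare_to_ground_truth(y_true, labels)
print("adjusted Rand index:", ari)
print(table)
```

An adjusted Rand index of 1 means the two partitions are identical up to renaming, while values near 0 indicate agreement no better than chance.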
Intel Extension for Scikit-learn
Accelerate scikit-learn for Data Analytics & Machine Learning
Scikit-learn, also known as sklearn, is a Python module for machine learning. The Intel Extension for Scikit-learn seamlessly accelerates your scikit-learn applications on Intel CPUs and GPUs in both single- and multi-node configurations. The extension package improves the performance of your machine learning algorithms by dynamically patching scikit-learn estimators.
The extension is part of the AI Tools, which let you combine machine learning tools with your existing AI packages.
With this extension for scikit-learn, you can:
- Speed up training and inference by up to 100x with equivalent mathematical accuracy.
- Continue to use the open source scikit-learn API.
- Use a few lines of code or the command line to enable and disable the extension.
Both scikit-learn and the Intel Extension for Scikit-learn are part of the complete suite of Intel AI and machine learning development tools and resources.
Features
- Speed up scikit-learn (sklearn) algorithms by replacing existing estimators with mathematically equivalent accelerated versions.
- Run on any x86-compatible CPU or Intel GPU, because the accelerations are powered by the Intel oneAPI Data Analytics Library (oneDAL).
- Decide how the accelerations will be applied:
  - Without altering any code, patch every compatible algorithm from the command line.
  - To patch all compatible algorithms in your Python script, add two lines of code.
  - In your script, specify that just specific algorithms should be patched.
  - Patch and unpatch your environment globally for all scikit-learn applications.
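The options above can be sketched roughly as follows. The Python calls `patch_sklearn` and `unpatch_sklearn` come from the `sklearnex` package; the command-line forms in the comments and the script name `my_application.py` are given from memory of the extension's documentation and should be checked against the installed version. The `try`/`except` fallback is an addition for illustration, so the sketch also runs where the extension is not installed.

```python
# Command-line patching of every compatible algorithm (assumed form):
#   python -m sklearnex my_application.py
# Global patching and unpatching for all scikit-learn applications (assumed form):
#   python -m sklearnex.glob patch_sklearn
#   python -m sklearnex.glob unpatch_sklearn

try:
    from sklearnex import patch_sklearn, unpatch_sklearn

    # Patch only the two algorithms used in this walkthrough.
    # Calling patch_sklearn() with no arguments patches all compatible ones.
    patch_sklearn(["PCA", "DBSCAN"])
    patched = True
except ImportError:
    # Extension not installed: fall back to stock scikit-learn.
    patched = False

# Import estimators only after patching so the accelerated versions are picked up.
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

print("patched:", patched)
print("PCA implementation module:", PCA.__module__)

if patched:
    # Restore stock scikit-learn; estimators must be re-imported afterwards.
    unpatch_sklearn()
```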
Intel Extension for Scikit-learn Specifications
Processors | All CPUs with x86 architecture, All integrated and discrete GPUs from Intel |
Operating systems | Linux, Windows and Windows Server |
Language | Python |