What is iEEG
NeuroPace, Inc., based in Mountain View, makes the FDA-approved RNS System, a responsive neurostimulator for people with refractory focal-onset epilepsy. Up to two leads, each with four electrode contacts, can be connected to the neurostimulator. Physicians configure both the detection of abnormal patterns specific to each patient and the delivery of electrical pulses. The device records intracranial electroencephalograms (iEEGs) on four channels, each sampled at 250 Hz, and a typical recording contains roughly 90 seconds of data. To date, almost 5,000 patients have contributed more than 16 million iEEG files.
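As a rough sense of scale, the figures above imply the following number of samples per recording. This is only back-of-the-envelope arithmetic using the stated channel count, sample rate, and approximate duration; real recordings vary in length.

```python
# Figures taken from the text: 4 channels, 250 Hz per channel, roughly 90 s per recording.
CHANNELS = 4
SAMPLE_RATE_HZ = 250
DURATION_S = 90  # approximate; actual recordings vary in length

samples_per_channel = SAMPLE_RATE_HZ * DURATION_S
samples_per_recording = CHANNELS * samples_per_channel

print(samples_per_channel)    # 22500
print(samples_per_recording)  # 90000
```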
Determining effective stimulation patterns for reducing seizures is a primary research objective at NeuroPace. One hypothesis is that treatment settings that have worked for patients with comparable iEEG activity may also work for newly implanted patients or for patients whose current stimulation settings need adjusting, which could improve on the traditional trial-and-error method of choosing stimulation programming settings.
For such data-driven approaches to be feasible, it must be possible to search large-scale iEEG data for similar brain activity patterns, so that physicians can quickly identify similar patient profiles from the iEEG files they select. Previously, locating similar iEEG files across patients required a convoluted processing pipeline: clustering iEEG data within each patient, locating the cluster centres, and then finding approximate nearest neighbours (ANN) from other patients using dimensionality reduction techniques such as PCA and t-SNE. Because the approximate nearest neighbours were computed only once every several months, on a small sample of new patient iEEG recordings, the flexibility and practical utility of this pipeline were limited.
Fortunately, recent developments mean that vector databases can now be queried directly for similar vector embeddings. Without any clustering step beforehand, this capability may allow physicians to pick any iEEG file from a patient and find comparable iEEG data from other patients. As new iEEG files become available, all that is needed is to keep the vector database up to date. This simplification could also improve scalability, making it much easier to search for comparable iEEG files across millions of entries.
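To make the idea of querying embeddings directly more concrete, here is a minimal sketch of an exact (brute-force) similarity search over a handful of toy vectors. The file ids and 3-dimensional vectors are made up for illustration, cosine distance is an assumed metric (the article does not say which one was used), and vectors are assumed to be non-zero.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def brute_force_search(query, database, k):
    """Rank every stored vector by distance to the query and return the k closest ids."""
    ranked = sorted(database, key=lambda file_id: cosine_distance(query, database[file_id]))
    return ranked[:k]

# Toy embeddings keyed by hypothetical file ids.
embeddings = {
    "file_a": [1.0, 0.0, 0.0],
    "file_b": [0.9, 0.1, 0.0],
    "file_c": [0.0, 1.0, 0.0],
    "file_d": [0.0, 0.0, 1.0],
}

nearest = brute_force_search([1.0, 0.05, 0.0], embeddings, k=2)
print(nearest)  # ['file_a', 'file_b']
```

A vector database performs the same kind of ranking, but can use an index to avoid comparing the query against every stored vector.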
The NeuroPace AI team and Google Cloud developers carried out a proof-of-concept study in which embedding models turned iEEG data into vectors. The vectors were then stored in AlloyDB for PostgreSQL, which served as the vector database. AlloyDB is a fully managed, PostgreSQL-compatible database suited to heavy transactional workloads, and it supports vector similarity search through the pgvector extension.
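A similarity query against a pgvector column might look roughly like the sketch below. It only builds the SQL text and the vector literal; it does not connect to a database. The table and column names (ieeg_embeddings, file_id, patient_id, embedding) are hypothetical, the %s placeholders follow the style used by drivers such as psycopg, and the choice of pgvector's cosine distance operator (<=>) is an assumption, since the article does not name the metric.

```python
def to_pgvector_literal(vector):
    """Format a Python sequence as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

def build_similarity_query():
    """Return parameterized SQL for the nearest rows from other patients.

    Expected parameters, in order:
    (query_vector_literal, query_patient_id, query_vector_literal, k)
    """
    # All table and column names here are hypothetical.
    return (
        "SELECT file_id, patient_id, embedding <=> %s::vector AS distance "
        "FROM ieeg_embeddings "
        "WHERE patient_id <> %s "            # cross-patient search: skip the query patient
        "ORDER BY embedding <=> %s::vector "  # <=> is pgvector's cosine distance operator
        "LIMIT %s"
    )

literal = to_pgvector_literal([0.1, 2, 3.5])
sql = build_similarity_query()
params = (literal, "patient_001", literal, 10)  # hypothetical patient id and k
print(literal)  # [0.1,2.0,3.5]
print(sql)
```

With a real connection, the query and parameters would be passed together to the driver's execute call rather than interpolated into the string by hand.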
AlloyDB Omni
AlloyDB Omni, a downloadable edition of the database that can run anywhere, also allows the database to be hosted on-premises, keeping the data inside an on-premises HIPAA-compliant environment. When the rest of the application is hosted on-premises, keeping the database there as well reduces reliance on external network connectivity and lessens the chance of application outages that could arise from hosting the database externally.
In this proof-of-concept project, Google Cloud processed almost 1.2 million iEEG files from 414 clinical trial participants. Data from 394 patients were loaded into the AlloyDB cloud service, and data from the remaining 20 patients were held out for testing. A custom embedding model created by the NeuroPace AI team converted each iEEG file into a spectrogram image and then into a vector. Finally, 50 randomly chosen iEEG files from the test cohort were used to query the AlloyDB vector database that held these vectors (Figure 2).
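The article does not describe how the spectrograms or embeddings were computed, so the following is only a generic illustration of the first step: turning a one-dimensional signal into a magnitude spectrogram with a short-time discrete Fourier transform. The window size, hop size, Hann window, and synthetic 10 Hz test signal are all assumptions chosen for the example, and the naive DFT is written for clarity rather than speed.

```python
import cmath
import math

def magnitude_spectrogram(signal, window_size, hop_size):
    """Return a list of frames; each frame holds DFT magnitudes for bins 0..window_size//2."""
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop_size):
        segment = signal[start:start + window_size]
        # Hann window to reduce spectral leakage at the frame edges.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1)))
            for n, x in enumerate(segment)
        ]
        frame = []
        for k in range(window_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / window_size)
                for n, x in enumerate(windowed)
            )
            frame.append(abs(coeff))
        frames.append(frame)
    return frames

SAMPLE_RATE_HZ = 250  # per-channel sample rate stated in the article
# Synthetic 2-second, 10 Hz sine wave standing in for one iEEG channel.
signal = [math.sin(2 * math.pi * 10 * t / SAMPLE_RATE_HZ) for t in range(2 * SAMPLE_RATE_HZ)]

spec = magnitude_spectrogram(signal, window_size=250, hop_size=125)
# With a 250-sample window at 250 Hz, each bin is 1 Hz wide.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)  # 3 126 10
```

In a pipeline like the one described, an image of this kind would then be passed to an embedding model to produce a fixed-length vector.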
AlloyDB with pgvector offers three distinct index types (Hierarchical Navigable Small World (HNSW), IVFFlat, and IVF) that reduce the latency of similarity searches compared with a brute-force search:
HNSW uses a graph-based technique, building several layers of interconnected nodes that provide efficient search paths even for large datasets.
The IVFFlat index balances speed and accuracy by first grouping vectors into coarse clusters and then conducting a more thorough search inside the most similar clusters.
The new “IVF” index, a recent addition to the AlloyDB AI enhancements, increases the number of dimensions supported per vector and dramatically reduces query time by combining deeper integration with AlloyDB query processing and Google's quantization techniques.
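The cluster-then-search idea behind inverted-file (IVF-style) indexes can be sketched in a few lines. This toy version is not how pgvector or AlloyDB implement their indexes; the centroids are fixed by hand (in practice they are learned, for example with k-means), squared Euclidean distance is an assumed metric, and the names build_ivf, ivf_search, and nprobe are chosen for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: squared_l2(vec, centroids[i]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Search only the nprobe inverted lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda i: squared_l2(query, centroids[i]))
    candidates = [vec_id for i in probe_order[:nprobe] for vec_id in lists[i]]
    candidates.sort(key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return candidates[:k]

vectors = {
    "a": [0.0, 0.1], "b": [0.2, 0.0], "c": [0.1, 0.3],
    "d": [5.0, 5.1], "e": [5.2, 4.9], "f": [4.8, 5.0],
}
centroids = [[0.1, 0.1], [5.0, 5.0]]  # fixed by hand here; normally learned from the data

lists = build_ivf(vectors, centroids)
result = ivf_search([0.0, 0.0], vectors, centroids, lists, k=2, nprobe=1)
print(lists)   # {0: ['a', 'b', 'c'], 1: ['d', 'e', 'f']}
print(result)  # ['a', 'b']
```

Probing fewer lists makes the search faster but risks missing true neighbours that fall in an unprobed cluster, which is why such indexes trade recall against latency.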
In practice, different indexes (and their underlying algorithms) can perform very differently across use cases. Google Cloud therefore benchmarked the IVF and HNSW indexes in detail for the NeuroPace use case of locating similar cross-patient iEEGs. Two metrics were measured: recall, the fraction of the results returned by a brute-force query that the index also returned, and latency, how quickly the query finished.
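Recall in this sense can be computed by comparing the ids returned by an approximate index with the ids returned by an exact brute-force search for the same query. The sketch below uses made-up file ids; the function name recall_at_k is chosen for illustration.

```python
def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact (brute-force) neighbours that the ANN index also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(ann_ids)) / len(exact)

# Hypothetical top-5 results for one query.
exact_top5 = ["f1", "f2", "f3", "f4", "f5"]
ann_top5 = ["f1", "f2", "f4", "f5", "f9"]

print(recall_at_k(ann_top5, exact_top5))  # 0.8
```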
A comparison of these two approximate nearest neighbour (ANN) indexes shows that IVF achieved high recall (~0.9) with a median query latency of roughly 60 ms, while HNSW had somewhat lower recall (~0.8) and was slower, with a median query latency of about 160 ms. Both indexes expose several tunable parameters for balancing speed against recall.
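Median query latency of the kind reported here can be measured by timing each query and taking the median. The sketch below is a generic timing helper, not the benchmark actually used; fake_query is a stand-in for a real database call, and the 50 queries simply mirror the 50 test files mentioned earlier.

```python
import statistics
import time

def median_latency_ms(run_query, queries):
    """Time run_query on each query and return the median latency in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(latencies)

def fake_query(query):
    """Stand-in workload; a real benchmark would run a similarity search here."""
    return sorted(range(1000), key=lambda i: abs(i - query))[:10]

median_ms = median_latency_ms(fake_query, range(50))
print(f"median latency: {median_ms:.3f} ms")
```

The median is less sensitive than the mean to occasional slow queries, which makes it a reasonable summary when latencies are skewed.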
Both indexes were far faster than brute force, which took about 14.7 seconds to find comparable iEEG data.
Figure: Histograms of recall and query latency for the two indexing strategies compared with brute force.
Figure: Output for a single sample iEEG query file from a test patient.
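Using the figures quoted in the text, the approximate speed-up over brute force works out as follows. This is rough arithmetic only: it compares the reported brute-force time with the reported median latencies, which may not have been measured under identical conditions.

```python
# Figures quoted in the article.
BRUTE_FORCE_MS = 14_700  # about 14.7 seconds
IVF_MEDIAN_MS = 60
HNSW_MEDIAN_MS = 160

ivf_speedup = BRUTE_FORCE_MS / IVF_MEDIAN_MS
hnsw_speedup = BRUTE_FORCE_MS / HNSW_MEDIAN_MS

print(round(ivf_speedup))   # 245
print(round(hnsw_speedup))  # 92
```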
NeuroPace is excited by these results, which could advance its research into navigating large volumes of iEEG data efficiently. This capability may make it possible to develop algorithms that help determine the best programming settings for the RNS System. The new ScaNN index from AlloyDB may further improve usability and performance, and Google Cloud is eager to test it out.