Medical Text Processing
Real-world evidence (RWE) has long played a role in the FDA's drug approval process, and in some therapeutic studies it can reduce the need for placebo arms. However, the clinical data needed to support RWE are often locked in unstructured forms, such as doctors' notes, and must be "abstracted" into a clinically organized format. Cloud computing and artificial intelligence can greatly accelerate this process and make it more scalable.
Because it saves time and money, leading drug researchers are beginning to supplement their clinical trials with real-world data in FDA study filings. The enormous volumes of historical, unstructured patient medical data also drive growing storage requirements long after a patient's treatment ends. Unstructured data is a vital component of clinical decision support systems, yet deriving insights from it has traditionally required humans to analyze the material in its raw, unstructured state.
Because unstructured medical data lacks distinct data points from which insights can be rapidly derived, it can lead to more care gaps and care variations. It follows that all of this patient data cannot be abstracted quickly or accurately by unaided human abstraction alone. Applying serverless software components on Google Cloud for natural language processing (NLP) makes it possible to efficiently identify and direct clinical abstractors toward a prioritized list of patient medical documents.
How to Use Google Cloud for Medical Text Processing
Using Vertex AI Workbench Jupyter notebooks, you can build a data pipeline that receives raw clinical text documents, analyzes them with Google Cloud's Healthcare Natural Language API, and loads the structured JSON results into BigQuery. From there, you can create a dashboard that displays attributes of the clinical text, such as the number of labels and relationships. This foundation also lets you train a language model that extracts text and, with human labeling over time, improves even further.
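To make the load step concrete, here is a minimal sketch of flattening an entity-extraction JSON response into row dictionaries ready for a BigQuery load. The field names mirror the Healthcare Natural Language API's `analyzeEntities` response shape, but the exact structure (and the sample values) should be treated as assumptions to verify against the API reference.

```python
import json

# A truncated, hypothetical example of the JSON the entity-extraction
# step returns. Field names follow the analyzeEntities response shape;
# check the API reference for the authoritative schema.
sample_response = json.loads("""
{
  "entityMentions": [
    {"mentionId": "1", "type": "PROBLEM",
     "text": {"content": "hypertension", "beginOffset": 42},
     "confidence": 0.97},
    {"mentionId": "2", "type": "MEDICINE",
     "text": {"content": "lisinopril", "beginOffset": 60},
     "confidence": 0.95}
  ]
}
""")

def to_bigquery_rows(doc_id: str, response: dict) -> list[dict]:
    """Flatten entity mentions into one row per mention for a BigQuery load."""
    return [
        {
            "document_id": doc_id,
            "mention_id": m["mentionId"],
            "entity_type": m["type"],
            "text": m["text"]["content"],
            "begin_offset": m["text"].get("beginOffset", 0),
            "confidence": m.get("confidence"),
        }
        for m in response.get("entityMentions", [])
    ]

rows = to_bigquery_rows("note_001", sample_response)
print(rows[0]["entity_type"])  # PROBLEM
```

Rows in this shape could then be streamed into a table with the BigQuery Python client (for example, `Client.insert_rows_json`) or written to a newline-delimited JSON file for a batch load job.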
Let's walk through the medical text entity extraction process to better understand how the solution addresses these issues:
Prepare Document AI for Data Ingestion: The pipeline begins with a PDF file containing de-identified medical information, which might include handwritten doctor's notes or other unstructured material. Document AI first processes this material with optical character recognition (OCR), digitizing the text and images.
Natural Language Interpretation: The Healthcare Natural Language API offers a collection of pretrained models, including ones for extracting and classifying medical text. The labels produced in this service's output serve as "ground truth" labels for the Vertex AI AutoML service, to which additional, domain-specific custom labels are later added.
Vertex AI AutoML: Vertex AI AutoML provides a machine learning toolkit for automated label classification and human-in-the-loop dataset labeling, even for team members without coding or data science experience. It works by training a Google model on your own data.
BigQuery: BigQuery stores the NLP-processed data for further analysis and visualization.
Looker Dashboard: The Looker dashboard serves as the primary "brain" of the clinical text abstraction process, providing visualizations that help the team identify the most important clinical documents using criteria such as label and concept "density."
Python Jupyter Notebook: Use Vertex AI (paid) or Colab (free) notebooks to examine your text data and call the various NLP and ingestion APIs.
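The API call at the heart of the natural language step can be sketched as follows. This builds, but does not send, a request to the Healthcare Natural Language API's `analyzeEntities` method; the project and location values are placeholders, and the endpoint path and body shape should be checked against the current v1 REST reference.

```python
# Placeholders -- substitute your own project and a supported location.
PROJECT = "my-project"
LOCATION = "us-central1"

def build_analyze_request(text: str) -> tuple[str, dict]:
    """Construct the URL and JSON body for an analyzeEntities call.

    The request is not sent here; in a notebook you would POST it with
    an authorized session (e.g., google-auth credentials + requests).
    """
    service = f"projects/{PROJECT}/locations/{LOCATION}/services/nlp"
    url = f"https://healthcare.googleapis.com/v1/{service}:analyzeEntities"
    body = {"documentContent": text}
    return url, body

url, body = build_analyze_request(
    "Patient reports chest pain; started aspirin 81 mg daily."
)
print(url)
```

The response contains entity mentions, linked vocabulary entities, and relationships, which downstream steps flatten into BigQuery tables.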
The Natural Language API for Healthcare
By concentrating on the following improvements, the Healthcare Natural Language API lets you conduct medical text entity resolution effectively at scale:
- Maximizing data extraction and document OCR by doing the document processing in parallel using scalable Cloud Functions.
- Using fully managed and serverless services to optimize time to market and cost.
- Enabling an adaptable and inclusive workflow with ML-assisted human-in-the-loop abstraction.
- A Looker dashboard that serves as a decision support interface for teams of human clinical abstractors, together with a set of reusable Python scripts that can be run from a Jupyter notebook or from Google Cloud Functions. These scripts drive the stages of an NLP processing pipeline that transforms medical text into structured patient data.
- A collection of Google Cloud Storage buckets that support the different phases of data processing.
- The Looker dashboard’s data model is generated using two BigQuery tables, “Entity” and “Document,” in a dataset named “entity.”
- A Vertex AI dataset that clinical abstractors utilize for human-in-the-loop labeling. For more flexibility and scalability, labeling requests are sent to the Google Vertex AI Labeling Team.
- A Looker dashboard that shows the documents stack-ranked for human abstractors by a custom "density" metric, that is, the number of data elements (labels) found in each document. The dashboard directs human abstractors to focus first on sparsely labeled documents, leaving Google's natural language processing to do the bulk of the work on the rest.
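The ranking logic behind that dashboard can be illustrated with a few lines of Python. The metric definition here (labels per character, sparsest first) is an assumption for illustration, not the dashboard's exact formula.

```python
# Illustrative "density" ranking: documents with the fewest extracted
# labels per character surface first for human abstraction, while
# densely labeled documents are left to the NLP pipeline.

def label_density(num_labels: int, char_count: int) -> float:
    """Labels per character; guard against empty documents."""
    return num_labels / max(char_count, 1)

documents = [
    {"id": "doc_a", "num_labels": 3,  "chars": 5000},
    {"id": "doc_b", "num_labels": 40, "chars": 4000},
    {"id": "doc_c", "num_labels": 1,  "chars": 2000},
]

# Sparsest first: these are the candidates for human review.
ranked = sorted(documents,
                key=lambda d: label_density(d["num_labels"], d["chars"]))
print([d["id"] for d in ranked])  # ['doc_c', 'doc_a', 'doc_b']
```

In the actual solution this computation would live in the BigQuery data model (e.g., a derived column the Looker dashboard sorts on) rather than in notebook code.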
Next Topics and Actions
This sample loads only the entity and document metadata into BigQuery and Looker; the rich relationships returned by the Healthcare Natural Language API were not loaded. Using those relationships, one could construct a biomedical knowledge network, investigate the correlations between cohorts, diseases, and treatments, and contribute to the development of novel hypotheses.
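As a sketch of what loading those relationships could look like, the snippet below joins entity mentions to the relationship records the API returns, producing edges for a knowledge graph. The field names (`subjectId`, `objectId`) mirror the `analyzeEntities` response shape but, like the sample values, should be treated as assumptions.

```python
# Hypothetical slice of an analyzeEntities response: mentions keyed by
# ID, plus relationships that reference those IDs.
mentions = {
    "1": {"text": "lisinopril", "type": "MEDICINE"},
    "2": {"text": "hypertension", "type": "PROBLEM"},
}
relationships = [{"subjectId": "1", "objectId": "2", "confidence": 0.9}]

def to_edges(mentions: dict, relationships: list) -> list[tuple[str, str]]:
    """Resolve mention IDs to surface text, yielding knowledge-graph edges."""
    return [
        (mentions[r["subjectId"]]["text"], mentions[r["objectId"]]["text"])
        for r in relationships
        if r["subjectId"] in mentions and r["objectId"] in mentions
    ]

edges = to_edges(mentions, relationships)
print(edges)  # [('lisinopril', 'hypertension')]
```

Stored in a BigQuery edge table, pairs like these could back cohort-level queries (for example, which treatments co-occur with which problems across a patient population).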