BigQuery Native JSON support integrates with MongoDB Atlas, simplifying pipelines and accelerating data processing.
Google is excited to present a significant improvement to the Google Cloud Dataflow templates for MongoDB Atlas. With native support for JSON data types, users no longer need to perform intricate data transformations when loading their MongoDB Atlas data into BigQuery.
This simplified approach saves time and money, and it lets customers apply modern data analytics and machine learning to get full value from their data.

Limitations without JSON support
In the past, Dataflow pipelines built to work with MongoDB Atlas data frequently required that complex structures be flattened to a single level of nesting or converted into JSON strings before being loaded into BigQuery. This approach works, but it has several disadvantages:
- Increased latency: Multiple data conversions add latency and can considerably increase the pipeline’s overall execution time.
- Higher operational costs: The additional data conversions and storage needs raise operating expenses.
- Decreased query performance: Analyzing nested data is difficult once complex document structures have been flattened or stored as JSON strings.
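To make the older approach concrete, the following minimal Python sketch flattens a nested document to its first level and serializes anything deeper to a JSON string. The sample document and the `flatten_first_level` helper are hypothetical, written only for illustration; this is not the template’s actual code.

```python
import json


def flatten_first_level(doc):
    """Keep top-level scalars; serialize nested dicts and lists to JSON strings."""
    row = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row


# Hypothetical MongoDB-style document.
order = {
    "_id": "o-1001",
    "customer": {"name": "Ada", "address": {"city": "Zurich"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

row = flatten_first_level(order)

# Nested fields are now opaque strings; reading one requires re-parsing.
city = json.loads(row["customer"])["address"]["city"]
print(type(row["customer"]).__name__, city)
```

Every consumer that needs a nested field has to repeat the parsing step, which is the kind of extra conversion the list above describes.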
What is new, then?
By allowing customers to load nested JSON data straight from MongoDB Atlas into BigQuery without the need for any conversions in between, BigQuery’s Native JSON format solves these issues.
There are many advantages to this strategy:
- Decreased operating costs: Removing the extra data transformations cuts spending on infrastructure, storage, and computing resources.
- Improved query performance: BigQuery’s optimized storage and query engine is built to process data in Native JSON format efficiently, which shortens query execution times.
- Increased data flexibility: Users can query and analyze complex data structures, including nested and hierarchical data, without laborious and error-prone flattening or normalization.
A major benefit of this pipeline is that BigQuery’s robust JSON functions can be used directly on the MongoDB data loaded into BigQuery, so a laborious and intricate data transformation procedure is no longer required. You can use standard BQML queries to query and analyze the JSON data in BigQuery.
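As a rough sketch of what such a query might look like, the Python snippet below assembles a GoogleSQL statement that uses `JSON_VALUE` to read a nested field from a JSON column. The table name `my_project.my_dataset.orders`, the column name `source_data`, and the field path are assumptions for illustration; substitute the names from your own pipeline output.

```python
def build_city_query(table, json_column="source_data"):
    """Return GoogleSQL that counts documents per city from a JSON column.

    The table, column, and JSON path are hypothetical placeholders.
    """
    return (
        f"SELECT JSON_VALUE({json_column}, '$.customer.address.city') AS city,\n"
        f"       COUNT(*) AS order_count\n"
        f"FROM `{table}`\n"
        f"GROUP BY city\n"
        f"ORDER BY order_count DESC"
    )


query = build_city_query("my_project.my_dataset.orders")
print(query)
```

The resulting string can be pasted into the BigQuery console or submitted through a client library; no flattening step is needed before the nested field is read.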
The Dataflow pipeline can be deployed either from the Google Cloud console or by running the code from the GitHub repository, depending on whether you prefer a simplified cloud-based method or a more interactive, adaptable option.
Enabling data-driven decision-making
Google’s Dataflow template offers a flexible way to move data between MongoDB and BigQuery. It can either process complete collections or, using MongoDB’s Change Stream feature, capture incremental changes. The output format of the pipeline can be adjusted to your requirements: the userOption parameter selects either a flattened schema with individual fields or a raw JSON representation. User-Defined Functions (UDFs) can also transform data while the template runs.
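The configuration described above could be assembled along these lines. This is a sketch only: the parameter names other than `userOption` (`mongoDbUri`, `database`, `collection`, `outputTableSpec`), the accepted values `FLATTEN` and `NONE`, and all sample values are assumptions that should be checked against the template documentation.

```python
# Assumed accepted values for userOption; verify against the template docs.
ALLOWED_USER_OPTIONS = {"FLATTEN", "NONE"}


def build_template_parameters(mongo_uri, database, collection,
                              output_table, user_option="NONE"):
    """Build a parameter dict for launching the template (names are assumed)."""
    if user_option not in ALLOWED_USER_OPTIONS:
        raise ValueError(
            f"userOption must be one of {sorted(ALLOWED_USER_OPTIONS)}, "
            f"got {user_option!r}"
        )
    return {
        "mongoDbUri": mongo_uri,
        "database": database,
        "collection": collection,
        "outputTableSpec": output_table,
        "userOption": user_option,  # raw JSON vs. flattened schema
    }


# Hypothetical sample values.
params = build_template_parameters(
    mongo_uri="mongodb+srv://user:password@cluster.example.mongodb.net",
    database="shop",
    collection="orders",
    output_table="my_project:my_dataset.orders",
    user_option="NONE",
)
print(params["userOption"])
```

Validating the option before launch is a design choice in this sketch: a typo is caught locally rather than after a job has been submitted.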
Adopting the BigQuery Native JSON format in your Dataflow pipelines can make your data processing workflows considerably more efficient, performant, and economical. This combination helps you make data-driven decisions and extract useful insights from your data.
To get started with the Dataflow templates for MongoDB Atlas and BigQuery, follow the Google documentation.
Benefits of BigQuery Native JSON
Dataflow’s new direct JSON support has a number of significant benefits.
Reduced Operating Costs
By removing the need for further data transformations, users can drastically cut operational costs, such as those for infrastructure, storage, and computing resources.
Enhanced Query Performance
BigQuery’s optimized storage and query engine processes data in Native JSON format efficiently, which leads to noticeably faster query execution and better overall query performance.
Improved Data Flexibility
Users can readily query and analyze complex data structures, such as nested and hierarchical data, without laborious and error-prone flattening or normalization procedures.
BigQuery JSON Functions
A key advantage of this pipeline is the ability to use BigQuery’s robust JSON functions directly on the MongoDB data loaded into BigQuery. Pre-processing transformations for analytical purposes are therefore no longer necessary.
Deployment Flexibility
For a simplified cloud-based method, the Dataflow pipeline can be deployed from the Google Cloud console. For a more interactive, customizable option, the code can be run from the GitHub repository.
Dataflow Template Capabilities
The Dataflow template provides flexibility in data movement: it can process entire collections or, using MongoDB’s Change Stream capability, capture incremental changes. The userOption parameter specifies whether the output format is a flattened schema or raw JSON. Additionally, data can be transformed during template execution through User-Defined Functions (UDFs).
Encouraging Decision-Making Based on Data
Using the BigQuery Native JSON format in Dataflow pipelines improves the efficiency, performance, and cost-effectiveness of data processing workflows, helping users gain insights and make better data-driven decisions.