BigQuery Metastore: A Unified Metadata Service With Apache Iceberg Support

January 23, 2025


Does your company use BigQuery alongside Apache Spark, Apache Flink, Apache Hive, or other data processing engines? If so, a single source of truth for all of your analytics workloads would make life much simpler. BigQuery metastore, a fully managed, unified metadata service that supports Apache Iceberg, is now publicly available. It provides consistent data governance and compatibility across processing engines.

BigQuery metastore is a highly scalable runtime metadata service that supports the open Apache Iceberg table format and works with a variety of engines, including BigQuery, Apache Spark, Apache Hive, and Apache Flink. Analytics engines can therefore query a single copy of the data through a single schema, whether the data lives in BigLake external tables, BigQuery storage tables, or BigQuery tables for Apache Iceberg. For customers moving from traditional data lakes to a modern lakehouse architecture, BigQuery metastore is an essential component. It is deeply integrated with BigQuery's enterprise capabilities and provides built-in security and governance for how users interact with data.

The challenges of metadata management

Data processing engines have historically been tightly coupled to their own metastores and other metadata management systems. If you use several processing engines, you must maintain multiple copies of the data and metadata across different metastores. For instance, if you created a table definition in Hive Metastore so that an open-source engine like Spark could query it, you must replicate that definition before you can query the same data in BigQuery. You must also build pipelines to keep table definitions synchronised across metastores. This fragmentation can lead to stale metadata, a poor user experience, security and access-control challenges, and limited visibility into data lineage.

A metastore for the lakehouse era

A lakehouse architecture lets you manage all of your data, users, and workloads on a single platform, combining the advantages of data lakes and data warehouses without having to manage both. In a lakehouse built on BigQuery metastore, open data formats such as Apache Iceberg are accessible to a range of processing engines, including BigQuery, Spark, Flink, and Hive. Consistent metadata across engines makes data easier to discover and use while preserving governance, enabling self-service BI and ML tools that spur innovation.

BigQuery metastore is also serverless: it requires no setup or configuration and scales automatically with your workloads. This no-ops environment democratises access to your data for data scientists, engineers, and analysts, and lowers total cost of ownership (TCO).

Cross-engine interoperability for analytics (image credit: Google Cloud)

Key benefits of BigQuery metastore include:

Interoperability between different engines

BigQuery metastore provides a single, shared metastore for the lakehouse architecture, with a unified view of the metadata for every data source, so your users can easily find and understand the data they need. It lets you run queries and DML statements on data stored in both proprietary and open formats, across analytics runtimes, object stores, and BigQuery storage.

Support for catalogues and open formats

The metastore supports BigQuery storage tables, BigQuery tables for Apache Iceberg, and external tables.

Integrated governance

BigQuery metastore is integrated with BigQuery's key governance features, including business metadata, data profiling, data quality, fine-grained access controls, data masking, sharing, data lineage, audit logging, automatic cataloguing, and universal search.

Fully managed at BigQuery scale

BigQuery metastore is a serverless, fully managed service that is easy to use and integrates with key engines, including BigQuery, Spark, Hive, and Flink. Because it is built on BigQuery's architecture, it can handle traffic at BigQuery scale and grow with your applications' query-processing volume.

Serverless architecture

BigQuery metastore's serverless architecture eliminates the need to manage clusters or servers. This simplifies deployment, lowers operational costs, and lets the service scale automatically with demand.

Interoperability of engines

BigQuery metastore gives BigQuery direct access to tables, so you can query open-format tables from BigQuery without any additional configuration. For instance, you can create a table with Spark and then query it directly from BigQuery. This reduces the need for complex data movement or ETL processes and streamlines your analytics workflow.
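To make the Spark-then-BigQuery flow concrete, here is a minimal sketch that builds the SQL each engine would run against the same table. It assumes a Spark catalogue named `bqms` (a hypothetical name) that is backed by BigQuery metastore, and that a Spark namespace corresponds to a BigQuery dataset; the project, dataset, table, and column names are placeholders, not values from this article.

```python
"""Hypothetical sketch: one Iceberg table, addressed from Spark and from BigQuery.

Assumptions (not from the article): a Spark catalogue called "bqms" is backed by
BigQuery metastore, and a Spark namespace maps to a BigQuery dataset.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    project: str  # Google Cloud project that owns the dataset (placeholder)
    dataset: str  # BigQuery dataset, used as the Spark namespace (assumption)
    table: str    # table name, identical in both engines


def spark_create_sql(ref: TableRef, catalog: str = "bqms") -> str:
    """Spark SQL that creates an Iceberg table through the metastore-backed catalogue."""
    return (
        f"CREATE TABLE IF NOT EXISTS `{catalog}`.`{ref.dataset}`.`{ref.table}` "
        "(id BIGINT, name STRING) USING iceberg"
    )


def spark_insert_sql(ref: TableRef, catalog: str = "bqms") -> str:
    """Spark SQL that writes a couple of sample rows into the table."""
    return (
        f"INSERT INTO `{catalog}`.`{ref.dataset}`.`{ref.table}` "
        "VALUES (1, 'alpha'), (2, 'beta')"
    )


def bigquery_select_sql(ref: TableRef) -> str:
    """GoogleSQL that reads the same table from BigQuery, with no copy or sync step."""
    return f"SELECT id, name FROM `{ref.project}.{ref.dataset}.{ref.table}` ORDER BY id"
```

In practice the first two strings would go to `spark.sql(...)` in a session configured for the catalogue, and the last to a BigQuery client such as `google.cloud.bigquery.Client().query(...)`. The point of the design is that there is no export or synchronisation step between the two halves.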

Unified experience for users

BigQuery metastore offers a consistent workflow across BigQuery and BigQuery Studio, so you can use Spark directly from both.

BigQuery metastore in action

Let's now look at how BigQuery metastore is used in practice. A typical example is a PySpark script that configures a Spark environment to work with a BigQuery external table, a BigQuery storage table, and a BigQuery table for Apache Iceberg through BigQuery metastore.
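A minimal sketch of the catalogue-configuration step of such a script is shown here. It assumes the Iceberg catalogue plugin for BigQuery metastore is on the Spark classpath, and that the plugin class and property names (`org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog`, `gcp_project`, `gcp_location`, `warehouse`) match current Google Cloud documentation; treat them as assumptions to verify. The catalogue name, project, location, and bucket are placeholders.

```python
"""Minimal sketch: configure Spark to use BigQuery metastore as an Iceberg catalogue.

Assumptions to verify against current Google Cloud docs:
- the Iceberg catalogue plugin for BigQuery metastore is on the Spark classpath;
- the plugin class and property names below are current.
All argument values are placeholders.
"""
from typing import Dict


def bqms_catalog_conf(catalog: str, project: str, location: str, warehouse: str) -> Dict[str, str]:
    """Return the Spark properties that register `catalog` as an Iceberg catalogue."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog",
        f"{prefix}.gcp_project": project,    # Google Cloud project ID (placeholder)
        f"{prefix}.gcp_location": location,  # e.g. a BigQuery region (placeholder)
        f"{prefix}.warehouse": warehouse,    # e.g. a gs:// path for table data (placeholder)
    }


def build_session(conf: Dict[str, str], app_name: str = "bqms-demo"):
    """Create (or reuse) a SparkSession with the given catalogue properties."""
    # Imported lazily so the config helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example usage (requires pyspark, the plugin JAR and Google Cloud credentials):
# spark = build_session(bqms_catalog_conf("bqms", "my-project", "us-central1", "gs://my-bucket/warehouse"))
# spark.sql("SHOW NAMESPACES IN bqms").show()
```

Keeping the properties in a plain dictionary makes it easy to pass the same settings to `spark-submit --conf`, a Dataproc job, or a notebook session instead of hard-coding them in one builder chain.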

Overview of the BigQuery metastore

BigQuery metastore is Google Cloud's fully managed metastore for data analytics products. It provides a single source of truth for managing metadata from multiple sources. Because it is accessible from BigQuery and from open data processing engines, it is a useful tool for data analysts and engineers.

For example, open-source engines such as Apache Spark can use BigQuery metastore as their catalogue, and BigQuery can then query tables created in Spark without any need to synchronise metadata.

Supported integrations

You can work with BigQuery metastore through the Google Cloud console, the gcloud CLI, or the BigQuery REST APIs.
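As one illustration of the REST route, the sketch below lists the tables in a dataset with the BigQuery `tables.list` method. It assumes you already hold an OAuth 2.0 access token (for example from `gcloud auth print-access-token`) and that tables registered through BigQuery metastore appear in the dataset like any other BigQuery table; the project and dataset names are placeholders.

```python
"""Sketch: list a dataset's tables through the BigQuery REST API (tables.list).

Assumes an OAuth 2.0 access token is already available; project and dataset
names are placeholders. Only list_all_table_ids() performs network calls.
"""
import json
import urllib.parse
import urllib.request
from typing import List, Optional

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def tables_list_request(project: str, dataset: str, token: str,
                        page_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for one page of tables.list."""
    url = (f"{API_ROOT}/projects/{urllib.parse.quote(project)}"
           f"/datasets/{urllib.parse.quote(dataset)}/tables")
    if page_token:
        url += "?" + urllib.parse.urlencode({"pageToken": page_token})
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def table_ids(response: dict) -> List[str]:
    """Extract table IDs from one page of a tables.list JSON response."""
    return [t["tableReference"]["tableId"] for t in response.get("tables", [])]


def list_all_table_ids(project: str, dataset: str, token: str) -> List[str]:
    """Follow nextPageToken until every page of results has been read."""
    ids: List[str] = []
    page_token: Optional[str] = None
    while True:
        req = tables_list_request(project, dataset, token, page_token)
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        ids.extend(table_ids(page))
        page_token = page.get("nextPageToken")
        if not page_token:
            return ids
```

Separating request construction and response parsing from the network loop keeps the pieces easy to test; in production code the official client libraries are usually the more convenient choice.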

BigQuery metastore supports the following integrations:

Table formats: Apache Iceberg 1.5.2 or later.
Managed services: Dataproc 2.2 or later.
Data processing engines: Spark 3.3 or later.
Plugins: the Iceberg catalogue plugin for BigQuery metastore.
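Because each requirement is a minimum version, a small pre-flight check can catch an outdated environment before a job fails. The sketch below is a hypothetical helper, not a Google tool; it compares dotted version numbers against the minimums listed above and simply ignores suffixes such as Dataproc image OS tags, which is an assumption of this simple parser.

```python
"""Hypothetical pre-flight check against the minimum versions listed above.

Versions are compared as tuples of integers; anything after the leading
dotted number (e.g. "-debian12") is ignored by this simple parser.
"""
import re
from typing import Dict, List, Tuple

# Minimums taken from the supported-integrations list.
MINIMUMS: Dict[str, Tuple[int, ...]] = {
    "iceberg": (1, 5, 2),
    "dataproc": (2, 2),
    "spark": (3, 3),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "2.2.30-debian12" into (2, 2, 30)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if not match:
        raise ValueError(f"unrecognised version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def unmet_requirements(found: Dict[str, str]) -> List[str]:
    """Return a description of every component that is missing or too old."""
    problems: List[str] = []
    for component, minimum in MINIMUMS.items():
        if component not in found:
            problems.append(f"{component}: version unknown")
            continue
        if parse_version(found[component]) < minimum:
            required = ".".join(str(n) for n in minimum)
            problems.append(f"{component}: {found[component]} < {required}")
    return problems
```

Tuple comparison does the work here: `(3, 3, 0)` is not less than `(3, 3)`, while `(1, 5)` is less than `(1, 5, 2)`, which matches the "or later" wording of the list.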

Differences from BigLake Metastore

BigQuery metastore is the recommended metastore on Google Cloud.

The main differences between BigQuery metastore and BigLake Metastore are as follows:

BigLake Metastore is a standalone metastore, separate from BigQuery, that works only with Iceberg tables. It uses its own three-part resource model, and BigQuery does not automatically discover tables stored in it.
BigQuery metastore is built on the BigQuery catalogue and integrates directly with BigQuery. Multiple open-source engines can modify tables in the metastore, and BigQuery itself can query those same tables, so your metadata has a single source of truth. For instance, Spark can integrate directly with BigQuery metastore. This streamlines the workflow for running jobs and storing metadata, and helps reduce redundancy.
