Google Cloud Introducing the Hive-BigQuery Connector

July 1, 2023

We are excited to announce the public preview release of the Hive-BigQuery Connector, an open-source solution that enables Apache Hive workloads to read from and write to BigQuery and BigLake tables. The connector is aimed at customers who want to migrate their data warehouse from Apache Hive to BigQuery but have faced challenges along the way. Whether you plan to migrate fully or want both systems to coexist, the Hive-BigQuery Connector supports a range of use cases.

What is the Hive-BigQuery Connector?

If you have experience running Hadoop or Spark workloads on Google Cloud, you might already be familiar with the Cloud Storage Connector and the Apache Spark SQL connector for BigQuery. The first lets you store and access data files in Cloud Storage; the second lets you read and write data between BigQuery and Spark DataFrames.

Similarly, the Hive-BigQuery Connector implements the Hive StorageHandler API, facilitating the integration of Hive workloads with BigQuery and BigLake tables. While Hive’s execution engine handles compute operations, such as aggregates and joins, the connector manages interactions with the data layer in BigQuery. This includes supporting data stored in either BigQuery’s native storage or open-source data formats in Cloud Storage buckets through a BigLake connection.
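
To make the StorageHandler integration more concrete, here is a minimal Python sketch that assembles the kind of Hive DDL statement used to map a Hive table onto a BigQuery table. The handler class name and the `bq.table` property are assumptions based on the connector's open-source documentation and should be verified against the release you deploy. The helper `build_create_table_ddl` and the table and column names are hypothetical.

```python
# Assumed storage handler class name; verify against the connector version in use.
STORAGE_HANDLER = "com.google.cloud.hive.bigquery.connector.BigQueryStorageHandler"


def build_create_table_ddl(hive_table, columns, bq_table):
    """Return a Hive CREATE TABLE statement for a BigQuery-backed table.

    hive_table: table name as seen from Hive (hypothetical).
    columns: list of (column_name, hive_type) pairs.
    bq_table: fully qualified BigQuery table, "project.dataset.table".
    """
    column_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {hive_table} (\n  {column_sql}\n)\n"
        f"STORED BY '{STORAGE_HANDLER}'\n"
        # 'bq.table' is the assumed property that points Hive at the BigQuery table.
        f"TBLPROPERTIES ('bq.table'='{bq_table}');"
    )


ddl = build_create_table_ddl(
    "sales",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    "my-project.my_dataset.sales",
)
print(ddl)
```

Once a table is declared this way, Hive queries against `sales` would be planned and executed by Hive, while reads and writes of the underlying rows go through the connector.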

Apache Hive is a popular open-source data warehouse that provides an SQL-like interface for querying data stored in various databases and file systems integrated with Apache Hadoop. Over time, Hive has evolved to utilize cloud storage services, and the new connector simplifies migration by enabling Hive to integrate with native storage solutions like BigQuery.

Benefits of Cloud Data Warehouse Migration

Migrating a data warehouse to the cloud offers numerous benefits, including:

Reduced costs: Pay for the resources you use, optimizing cost efficiency.
Increased scalability: Easily scale up or down to meet changing needs.
Improved reliability: Leverage redundant and highly-available systems.
Enhanced security: Implement encryption for data in transit and at rest, and enforce granular access control.
Expanded capabilities: Integrate with a wide range of Google Cloud native tools and solutions, such as BigQuery’s materialized views and BI Engine for improved performance, Pub/Sub for low-latency data transport, Dataflow for scalable data processing, and Vertex AI for machine learning model development and deployment.

BigQuery Migration Service

To facilitate the migration process, Google Cloud offers the BigQuery Migration Service—a comprehensive solution designed to accelerate the migration from Hive data warehouses to BigQuery. The service includes free-to-use tools that assist with assessment, planning, data transfer, and data validation. Notably, the BigQuery batch SQL translator and interactive SQL translator enable the translation of Hive queries into BigQuery’s ANSI-compliant SQL syntax, allowing queries to be executed natively within BigQuery’s execution engine.

Use Cases for the Hive-BigQuery Connector

The Hive-BigQuery Connector caters to various core use cases, including:

Wholesale migration with continuity of operations: When migrating the entire Hive data warehouse to BigQuery, this use case ensures uninterrupted operations during the migration process. By moving the data to BigQuery first, you can allow original Hive queries to access the migrated data through the Connector while gradually translating them to BigQuery’s SQL dialect. Once the migration is complete, you can exclusively use BigQuery and retire Hive.
Selective usage of BigQuery: If you prefer to continue using Hive for most workloads but want to leverage specific features of BigQuery, this use case allows for a unified environment. The Connector enables Hive to join its own tables with those managed by BigQuery, allowing selective usage of BigQuery for specific workloads that can benefit from its features, such as BI Engine or BigQuery ML.
Full open-source software (OSS) stack: For those who want to maintain a full OSS stack for their data warehouse, the Connector supports the migration of data in its original OSS format (e.g., Avro, Parquet, or ORC) to Cloud Storage. Hive can continue to execute and process queries using its own SQL dialect, while the Connector enhances the OSS stack by utilizing BigLake and BigQuery features, such as metadata caching for query performance, Data Loss Prevention, column-level access control, and dynamic data masking for enhanced security and governance at scale.

Hive-BigQuery Connector Features

The Hive-BigQuery Connector, in its public preview release, offers several features, including:

Support for running queries with MapReduce and Tez execution engines
Creation and deletion of BigQuery tables from Hive
Joining BigQuery and BigLake tables with Hive tables
Fast reads from BigQuery tables using the Storage Read API streams and the Apache Arrow format
Two methods for writing data to BigQuery: direct writes using the BigQuery Storage Write API for low-latency workloads and indirect writes by staging temporary Avro files in Cloud Storage, then loading them into the destination table using the Load Job API for cost-efficient workloads
Access to BigQuery time-partitioned and clustered tables
Column pruning to retrieve only necessary columns from the data layer
Predicate pushdowns to pre-filter data rows at the BigQuery storage layer, improving query performance by reducing network data transfer
Automatic conversion of Hive data types to BigQuery data types
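
Column pruning and predicate pushdown from the list above can be illustrated with a small toy model in Python. This is not the connector's implementation or API: the in-memory `TABLE`, the `scan` function, and the "cells transferred" metric are all hypothetical, chosen only to show why filtering and projecting at the storage layer reduces the data sent to the execution engine.

```python
# Toy "storage layer": a list of rows, each a dict of column -> value.
TABLE = [
    {"order_id": 1, "country": "US", "amount": 10.0, "note": "a"},
    {"order_id": 2, "country": "FR", "amount": 25.0, "note": "b"},
    {"order_id": 3, "country": "US", "amount": 40.0, "note": "c"},
    {"order_id": 4, "country": "DE", "amount": 5.0, "note": "d"},
]


def scan(table, columns=None, predicate=None):
    """Return the rows the storage layer would send to the engine.

    columns: names to keep (column pruning); None means all columns.
    predicate: row filter applied at the source (pushdown); None means no filter.
    """
    result = []
    for row in table:
        if predicate is not None and not predicate(row):
            continue  # filtered at the source, never transferred
        if columns is None:
            result.append(dict(row))
        else:
            result.append({name: row[name] for name in columns})
    return result


def cells_transferred(rows):
    """Count the individual values that crossed the simulated network."""
    return sum(len(row) for row in rows)


# Query: SELECT SUM(amount) FROM orders WHERE country = 'US'

# Without pruning or pushdown: every row and column is shipped, then filtered.
full = scan(TABLE)
full_total = sum(r["amount"] for r in full if r["country"] == "US")

# With pruning and pushdown: only matching rows and the needed column are shipped.
optimized = scan(TABLE, columns=["amount"], predicate=lambda r: r["country"] == "US")
optimized_total = sum(r["amount"] for r in optimized)

print(full_total, cells_transferred(full))            # 50.0 16
print(optimized_total, cells_transferred(optimized))  # 50.0 2
```

Both scans produce the same answer, but the optimized one moves 2 values instead of 16. In the real connector, the filter and the column list are handed to the BigQuery storage layer rather than evaluated in Python.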

The Hive-BigQuery Connector has already proven its value in real-world scenarios, such as Flipkart’s data lake migration to Google Cloud. The flexibility provided by the connector allows queries on BigQuery data from Hive, providing the necessary interoperability while eliminating data duplication or silos across various data stores.

With the Hive-BigQuery Connector, users can integrate Hive workloads with BigQuery and BigLake tables, enabling migration, coexistence, and interaction between the two systems. This open-source connector brings the benefits of cloud data warehousing to existing Hive deployments and extends what Apache Hive can do in the cloud.
