Thursday, November 21, 2024

Advanced JSON Analytics with BigQuery SQL

- Advertisement -

Companies are producing data at an exponential rate, including unstructured data like photos and audio as well as standard structured transactional data, semi-structured data like JSON. In addition to the size of the data, these disparate categories pose processing difficulties for developers and occasionally call for unique processing flows for each. BigQuery’s first support for semi-structured JSON did away with the need for such intricate preparation while also offering flexible schema, simple querying, and the scalability advantages of structured data.

We are thrilled to announce the availability of additional SQL functions for BigQuery JSON today, enhancing the flexibility and power of our foundational JSON support. Complex data processing tasks and the construction of JSON data are both made easier by these services.

- Advertisement -

With these new query functions, you can

• Use the new JSON LAX functions to convert JSON values into primitive types (INT64, FLOAT64, BOOL, and STRING) in an easier and more flexible manner.

• BigQuery’s new JSON mutator methods make it simple to edit and change an existing JSON value.

• Using the new JSON constructor functions in BigQuery, create JSON objects and JSON arrays.

Let’s go over these new features and some usage examples. To start, we’ll make an example table.

- Advertisement -

CREATE TABLE dataset_name.users_sample AS (
SELECT JSON ‘{“name”: “Alice”, “age”: 28, “address”: {“country”: “USA”, “city”: “SF”, “zipcode”: 94102}}’ AS user UNION ALL
SELECT JSON ‘{“name”: “Bob”, “age”: “40”, “address”: {“country”: “Germany”}}’ UNION ALL
SELECT JSON ‘{“name”: “Charlie”, “age”: null, “address”: {“zipcode”: 12356, “country”: null}}’
)

Query:

— Table contents
SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);
Output:
+———————————————————————————–+
| user |
+———————————————————————————–+
| {“address”:{“city”:”SF”,”country”:”USA”,”zipcode”:94102},”age”:28,”name”:”Alice”} |
| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |
| {“address”:{“country”:null,”zipcode”:12356},”age”:null,”name”:”Charlie”} |
+———————————————————————————–+

Great! Say we want a list of all users who are over 30. You can notice that user if you look at the table.The first record of age is a JSON number, the second is a JSON text, and the third is a JSON null. All types are automatically deduced and processed correctly by the new, powerful LAX function LAX_INT64.

The strict conversion function below, for instance, would result in the following error:

Query:

SELECT user.name FROM dataset_name.users_sample
WHERE LAX_INT64(user.age) > 30
Output:
+——-+
| name |
+——-+
| “Bob” |
+——-+

The lax conversion function, however, would produce the intended outcome:

query:

SELECT INT64(JSON ‘”10″‘) AS strict_int64
Output:
Error: The provided JSON input is not an integer

The JSON_REMOVE methods can also be used to swiftly and simply delete a field from the JSON data.

Query:

SELECT LAX_INT64(JSON ‘”10″‘) AS lax_int64
Output:
+———–+
| lax_int64 |
+———–+
| 10 |
+———–+

JSON nulls are removed during compression with JSON_STRIP_NULLS. BigQuery null values can be useful for lowering data size during exports even if they have no effect on performance or storage costs.

Query:

UPDATE dataset_name.users_sample SET user = JSON_STRIP_NULLS(user, remove_empty=>true) WHERE true
After the query above; if you run the query “SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);”, you will receive the following Output:

+——————————————————————-+
| user |
+——————————————————————-+
| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |
| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |
| {“name”:”Charlie”} |
+——————————————————————-+

What happens, therefore, if we want to change or add a field to the JSON data?

With the newly introduced JSON_SET function, the data can now be updated. Additionally, you can combine several JSON functions to produce the outcomes you want.

For instance, the query below expands the table’s “region_code” field. If the value of the “country” field is “USA,” then the value of the field will be “America,” else it will be “Other.”

— Updating/Adding a field is easy to do as well. The structure will be automatically created (see “Charlie” row)

Query:

UPDATE dataset_name.users_sample SET user = JSON_SET(user, “$.address.region_code”, IF(LAX_STRING(user.address.country) = “USA”, “America”, “Other”)) WHERE true
After the query above; if you run the query “SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);”, you will receive the following Output:

+——————————————————————————————-+
| user |
+——————————————————————————————-+
| {“address”:{“city”:”SF”,”country”:”USA”,”region_code”:”America”},”age”:28,”name”:”Alice”} |
| {“address”:{“country”:”Germany”,”region_code”:”Other”},”age”:”40″,”name”:”Bob”} |
| {“address”:{“region_code”:”Other”},”name”:”Charlie”} |
+——————————————————————————————-+

Last but not least, suppose you wish to create a JSON object from a table of property/value pairs. The new JSON object can be easily created using the new JSON_OBJECT constructor method.

Query:

WITH Fruits AS (
SELECT 0 AS id, ‘color’ AS k, ‘Red’ AS v UNION ALL
SELECT 0, ‘fruit’, ‘apple’ UNION ALL
SELECT 1, ‘fruit’,’banana’ UNION ALL
SELECT 1, ‘ripe’, ‘true’
)
SELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_data
FROM Fruits
GROUP BY id
Output:
+———————————-+
| json_data |
+———————————-+
| {“color”:”Red”,”fruit”:”apple”} |
| {“fruit”:”banana”,”ripe”:”true”} |
+———————————-+

Full list of operations

The following are the LAX conversion functions

The following are the JSON constructor functions

The following are theJSON mutator functions

Give it a go!

To make data analysis more simple and effective, Google BigQuery is constantly adding new functionality. We invite you to take a look at them and give us your comments as we continue to build more features and functionalities to make working with JSON faster and simpler.

- Advertisement -
agarapuramesh
agarapurameshhttps://govindhtech.com
Agarapu Ramesh was founder of the Govindhtech and Computer Hardware enthusiast. He interested in writing Technews articles. Working as an Editor of Govindhtech for one Year and previously working as a Computer Assembling Technician in G Traders from 2018 in India. His Education Qualification MSc.
RELATED ARTICLES

5 COMMENTS

Recent Posts

Popular Post

Govindhtech.com Would you like to receive notifications on latest updates? No Yes