Companies are producing data at an exponential rate, including unstructured data like photos and audio as well as standard structured transactional data, semi-structured data like JSON. In addition to the size of the data, these disparate categories pose processing difficulties for developers and occasionally call for unique processing flows for each. BigQuery’s first support for semi-structured JSON did away with the need for such intricate preparation while also offering flexible schema, simple querying, and the scalability advantages of structured data.
We are thrilled to announce the availability of additional SQL functions for BigQuery JSON today, enhancing the flexibility and power of our foundational JSON support. Complex data processing tasks and the construction of JSON data are both made easier by these services.
With these new query functions, you can
• Use the new JSON LAX functions to convert JSON values into primitive types (INT64, FLOAT64, BOOL, and STRING) in an easier and more flexible manner.
• BigQuery’s new JSON mutator methods make it simple to edit and change an existing JSON value.
• Using the new JSON constructor functions in BigQuery, create JSON objects and JSON arrays.
Let’s go over these new features and some usage examples. To start, we’ll make an example table.
CREATE TABLE dataset_name.users_sample AS (
SELECT JSON ‘{“name”: “Alice”, “age”: 28, “address”: {“country”: “USA”, “city”: “SF”, “zipcode”: 94102}}’ AS user UNION ALL
SELECT JSON ‘{“name”: “Bob”, “age”: “40”, “address”: {“country”: “Germany”}}’ UNION ALL
SELECT JSON ‘{“name”: “Charlie”, “age”: null, “address”: {“zipcode”: 12356, “country”: null}}’
)
Query:
— Table contents
SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);
Output:
+———————————————————————————–+
| user |
+———————————————————————————–+
| {“address”:{“city”:”SF”,”country”:”USA”,”zipcode”:94102},”age”:28,”name”:”Alice”} |
| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |
| {“address”:{“country”:null,”zipcode”:12356},”age”:null,”name”:”Charlie”} |
+———————————————————————————–+
Great! Say we want a list of all users who are over 30. You can notice that user if you look at the table.The first record of age is a JSON number, the second is a JSON text, and the third is a JSON null. All types are automatically deduced and processed correctly by the new, powerful LAX function LAX_INT64.
The strict conversion function below, for instance, would result in the following error:
Query:
SELECT user.name FROM dataset_name.users_sample
WHERE LAX_INT64(user.age) > 30
Output:
+——-+
| name |
+——-+
| “Bob” |
+——-+
The lax conversion function, however, would produce the intended outcome:
query:
SELECT INT64(JSON ‘”10″‘) AS strict_int64
Output:
Error: The provided JSON input is not an integer
The JSON_REMOVE methods can also be used to swiftly and simply delete a field from the JSON data.
Query:
SELECT LAX_INT64(JSON ‘”10″‘) AS lax_int64
Output:
+———–+
| lax_int64 |
+———–+
| 10 |
+———–+
JSON nulls are removed during compression with JSON_STRIP_NULLS. BigQuery null values can be useful for lowering data size during exports even if they have no effect on performance or storage costs.
Query:
UPDATE dataset_name.users_sample SET user = JSON_STRIP_NULLS(user, remove_empty=>true) WHERE true
After the query above; if you run the query “SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);”, you will receive the following Output:
+——————————————————————-+
| user |
+——————————————————————-+
| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |
| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |
| {“name”:”Charlie”} |
+——————————————————————-+
What happens, therefore, if we want to change or add a field to the JSON data?
With the newly introduced JSON_SET function, the data can now be updated. Additionally, you can combine several JSON functions to produce the outcomes you want.
For instance, the query below expands the table’s “region_code” field. If the value of the “country” field is “USA,” then the value of the field will be “America,” else it will be “Other.”
— Updating/Adding a field is easy to do as well. The structure will be automatically created (see “Charlie” row)
Query:
UPDATE dataset_name.users_sample SET user = JSON_SET(user, “$.address.region_code”, IF(LAX_STRING(user.address.country) = “USA”, “America”, “Other”)) WHERE true
After the query above; if you run the query “SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);”, you will receive the following Output:
+——————————————————————————————-+
| user |
+——————————————————————————————-+
| {“address”:{“city”:”SF”,”country”:”USA”,”region_code”:”America”},”age”:28,”name”:”Alice”} |
| {“address”:{“country”:”Germany”,”region_code”:”Other”},”age”:”40″,”name”:”Bob”} |
| {“address”:{“region_code”:”Other”},”name”:”Charlie”} |
+——————————————————————————————-+
Last but not least, suppose you wish to create a JSON object from a table of property/value pairs. The new JSON object can be easily created using the new JSON_OBJECT constructor method.
Query:
WITH Fruits AS (
SELECT 0 AS id, ‘color’ AS k, ‘Red’ AS v UNION ALL
SELECT 0, ‘fruit’, ‘apple’ UNION ALL
SELECT 1, ‘fruit’,’banana’ UNION ALL
SELECT 1, ‘ripe’, ‘true’
)
SELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_data
FROM Fruits
GROUP BY id
Output:
+———————————-+
| json_data |
+———————————-+
| {“color”:”Red”,”fruit”:”apple”} |
| {“fruit”:”banana”,”ripe”:”true”} |
+———————————-+
Full list of operations
The following are the LAX conversion functions
The following are the JSON constructor functions
The following are theJSON mutator functions
Give it a go!
To make data analysis more simple and effective, Google BigQuery is constantly adding new functionality. We invite you to take a look at them and give us your comments as we continue to build more features and functionalities to make working with JSON faster and simpler.
[…] with popular development frameworks speed up application development and deployment. REST APIs, JSON support, and SQL compatibility help developers build cloud-native apps. Db2 v11.5 is reliable, flexible, […]
[…] data at the beginning of 2018. Eleven more of the most popular blockchains have been added to the BigQuery public datasets today. Additionally, we’re updating the program’s already-existing […]
[…] Predictions in Spanner based only on SQL and Vertex AI […]
[…] much anticipated GPT-4 Turbo is one of them, it has more prompt length, reduced cost, and organized JSON formatting with better control and efficiency. In keeping with their regular practice of delivering […]
[…] you to import key-value pairs into an Amazon Simple Storage Service (Amazon S3) bucket from a JSON file. If you want to start with no keys, therefore you not doing that right now. You finish the key […]