Aurora PostgreSQL Limitless Database: Unlimited Data Growth

Aurora PostgreSQL Limitless Database

The new serverless horizontal scaling (sharding) feature of Aurora PostgreSQL Limitless Database, is now generally available.

With Aurora PostgreSQL Limitless Database, you may distribute a database workload across several Aurora writer instances while still being able to utilize it as a single database, allowing you to extend beyond the current Aurora restrictions for write throughput and storage.

During the AWS re:Invent 2023 preview of Aurora PostgreSQL Limitless Database, to described how it employs a two-layer architecture made up of several database nodes in a DB shard group, which can be either routers or shards to grow according to the demand.

Limitless Database Shard Group — Image Credit To AWS

Routers: Nodes known as routers receive SQL connections from clients, transmit SQL commands to shards, keep the system consistent, and provide clients with the results.

Shards: Routers can query nodes that hold a fraction of tables and complete copies of data.

Your data will be listed in three different table types: sharded, reference, and standard.

Sharded tables: These tables are dispersed among several shards. Based on the values of specific table columns known as shard keys, data is divided among the shards. They are helpful for scaling your application’s biggest, most I/O-intensive tables.

Reference tables: These tables eliminate needless data travel by copying all of the data on each shard, allowing join queries to operate more quickly. For reference data that is rarely altered, such product catalogs and zip codes, they are widely utilized.

Standard tables: These are comparable to standard PostgreSQL tables in Aurora. To speed up join queries by removing needless data travel, standard tables are grouped together on a single shard. From normal tables, sharded and reference tables can be produced.

Massive volumes of data can be loaded into the Aurora PostgreSQL Limitless Database and queried using conventional PostgreSQL queries after the DB shard group and your sharded and reference tables have been formed.

Getting started with the Aurora PostgreSQL Limitless Database

An Aurora PostgreSQL Limitless Database DB cluster can be created, a DB shard group added to the cluster, and your data queried via the AWS Management Console and AWS Command Line Interface (AWS CLI).

Establish a Cluster of Aurora PostgreSQL Limitless Databases

Choose Create database when the Amazon Relational Database Service (Amazon RDS) console is open. Select Aurora PostgreSQL with Limitless Database (Compatible with PostgreSQL 16.4) and Aurora (PostgreSQL Compatible) from the engine choices.

Enter a name for your DB shard group and the minimum and maximum capacity values for all routers and shards as determined by Aurora Capacity Units (ACUs) for the Aurora PostgreSQL Limitless Database. This maximum capacity determines how many routers and shards are initially present in a DB shard group. A node’s capacity is increased by Aurora PostgreSQL Limitless Database when its present utilization is insufficient to manage the load. When the node’s capacity is greater than what is required, it reduces it to a lower level.

Aurora PostgreSQL Limitless Database — Image Credit To AWS

There are three options for DB shard group deployment: no compute redundancy, one compute standby in a different Availability Zone, or two compute standbys in one Availability Zone.

You can select Create database and adjust the remaining DB parameters as you see fit. The DB shard group appears on the Databases page when it has been formed.

In addition to changing the capacity, splitting a shard, or adding a router, you can connect, restart, or remove a DB shard group.

Construct Limitless Database tables in Aurora PostgreSQL

As previously mentioned, the Aurora PostgreSQL Limitless Database contains three different types of tables: standard, reference, and sharded. You can make new sharded and reference tables or convert existing standard tables to sharded or reference tables for distribution or replication.

By specifying the table construction mode, you can use variables to create reference and sharded tables. Until a new mode is chosen, the tables you create will use this mode. The examples that follow demonstrate how to create reference and sharded tables using these variables.

For instance, make a sharded table called items and use the item_id and item_cat columns to build a shard key.

SET rds_aurora.limitless_create_table_mode='sharded';
SET rds_aurora.limitless_create_table_shard_key='{"item_id", "item_cat"}';
CREATE TABLE items(item_id int, item_cat varchar, val int, item text);

Next, construct a sharded table called item_description and collocate it with the items table. The shard key should be made out of the item_id and item_cat columns.

SET rds_aurora.limitless_create_table_collocate_with='items';
CREATE TABLE item_description(item_id int, item_cat varchar, color_id int, ...);

Using the rds_aurora.limitless_tables view, you may obtain information on Limitless Database tables, including how they are classified.

SET rds_aurora.limitless_create_table_mode='reference';
CREATE TABLE colors(color_id int primary key, color varchar);

It is possible to transform normal tables into reference or sharded tables. The source standard table is removed after the data has been transferred from the standard table to the distributed table during the conversion. For additional information, see the Amazon Aurora User Guide’s Converting Standard Tables to Limitless Tables section.

postgres_limitless=> SELECT * FROM rds_aurora.limitless_tables;

 table_gid | local_oid | schema_name | table_name  | table_status | table_type  | distribution_key
-----------+-----------+-------------+-------------+--------------+-------------+------------------
         1 |     18797 | public      | items       | active       | sharded     | HASH (item_id, item_cat)
         2 |     18641 | public      | colors      | active       | reference   | 

(2 rows)

Run queries on tables in the Aurora PostgreSQL Limitless Database

The Aurora PostgreSQL Limitless Database supports PostgreSQL query syntax. With PostgreSQL, you can use psql or any other connection tool to query your limitless database. You can use the COPY command or the data loading program to import data into Aurora Limitless Database tables prior to querying them.

Connect to the cluster endpoint, as indicated in Connecting to your Aurora Limitless Database DB cluster, in order to execute queries. The router to which the client submits the query and shards where the data is stored is where all PostgreSQL SELECT queries are executed.

Two querying techniques are used by Aurora PostgreSQL Limitless Database to accomplish a high degree of parallel processing:

Single-shard queries and distributed queries. The database identifies whether your query is single-shard or distributed and handles it appropriately.

Single-shard queries: All of the data required for the query is stored on a single shard in a single-shard query. One shard can handle the entire process, including any created result set. The router’s query planner forwards the complete SQL query to the appropriate shard when it comes across a query such as this.

Distributed query: A query that is executed over many shards and a router. One of the routers gets the request. The distributed transaction, which is transmitted to the participating shards, is created and managed by the router. With the router-provided context, the shards generate a local transaction and execute the query.

To configure the output from the EXPLAIN command for single-shard query examples, use the following arguments.

postgres_limitless=> SET rds_aurora.limitless_explain_options = shard_plans, single_shard_optimization;
SET

postgres_limitless=> EXPLAIN SELECT * FROM items WHERE item_id = 25;

                     QUERY PLAN
--------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=100 width=0)
   Remote Plans from Shard postgres_s4:
         Index Scan using items_ts00287_id_idx on items_ts00287 items_fs00003  (cost=0.14..8.16 rows=1 width=15)
           Index Cond: (id = 25)
 Single Shard Optimized
(5 rows)

You can add additional items with the names Book and Pen to the items table to demonstrate distributed queries.

postgres_limitless=> INSERT INTO items(item_name)VALUES ('Book'),('Pen')

A distributed transaction on two shards is created as a result. The router passes the statement to the shards that possess Book and Pen after setting a snapshot time during the query execution. The client receives the outcome of the router’s coordination of an atomic commit across both shards.

The Aurora PostgreSQL Limitless Database has a function called distributed query tracing that may be used to track and correlate queries in PostgreSQL logs.

Important information

A few things you should be aware of about this functionality are as follows:

Compute: A DB shard group’s maximum capacity can be specified between 16 and 6144 ACUs, and each DB cluster can only have one DB shard group. Get in touch with us if you require more than 6144 ACUs. The maximum capacity you provide when creating a DB shard group determines the initial number of routers and shards. When you update a DB shard group’s maximum capacity, the number of routers and shards remains unchanged.

Storage: The only cluster storage configuration that Aurora PostgreSQL Limitless Database offers is Amazon Aurora I/O-Optimized DB. 128 TiB is the maximum capacity of each shard. For the entire DB shard group, reference tables can only be 32 TiB in size.

Monitoring: PostgreSQL’s vacuuming tool can help you free up storage space by cleaning up your data. Aurora PostgreSQL Limitless Database monitoring can be done with Amazon CloudWatch, Amazon CloudWatch Logs, or Performance Insights. For monitoring and diagnostics, you can also utilize the new statistics functions, views, and wait events for the Aurora PostgreSQL Limitless Database.

Available now

PostgreSQL 16.4 works with the AWS Aurora PostgreSQL Limitless Database. These regions are included: Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), US East (N. Virginia), US East (Ohio), and US West (Oregon). Try the Amazon RDS console’s Aurora PostgreSQL Limitless Database.

Aurora PostgreSQL Limitless Database: Unlimited Data Growth