Select Page
FacebooktwitterlinkedinyoutubeFacebooktwitterlinkedinyoutube

Software-as-a-service (SaaS) platforms can have challenges scaling a multi-tenant environment when managing data. To isolate a customer’s data, a distinct database per customer is created, and the application simply accesses the proper database as needed for a given customer.

There are limits a single database can handle, however. For light workloads, this may be in the thousands, and for heavier workloads it may be as low as tens of thousands per database.

One solution is to modify the application to look up customer data in the database cluster.  Unfortunately, this places the large burden of development and maintenance from the application team. Another way is to use a database proxy to route the queries automatically to the proper cluster, vastly simplifying the process.

For third-party software, modifying the application is typically not possible. Implementing a proxy would be the only solution.

In this post, we will explore how the Heimdall Database Proxy can be configured to provide customer query routing for Amazon Relational Database Service (Amazon RDS) and Amazon Redshift customers, with examples for both Postgres-based databases and Redshift.

Solution Implementation

Figure 1 – Topology of customer query routing with the Heimdall Database Proxy

This guide shows a single Heimdall Proxy endpoint to access data for multiple customers (tenants) from multiple databases, transparently.

Requirements:

  • Multiple database clusters, each with its own non-overlapping databases and/or users,
  • One or more updated Heimdall Proxy instances,
  • Virtual private cloud (VPC) routing configured so the proxies can access the databases and that application can access proxies.

Step 1: Configure a Baseline Cluster Configuration

In the Heimdall Central Manager, there’s an option to configure the system using the Configuration Wizard. For an Amazon RDS environment, the “AWS Detect” pre-populates many of the configuration fields. For Amazon Redshift, select “Manual Configuration.”

The Data Source tab should be configured as per Figure 2. For Redshift users, configure as needed.

Figure 2:  Example of a Postgres Data Source Configuration

Note in Figure 2 the use of the variable placeholder ${host which maps to a host field stored in a database table; we’ll explain more below.

The default database specified should be the one being used to control the routing of queries; we will be adding a table to this database to provide the customer routing information. Complete the Configuration Wizard using the defaults, as appropriate. 

Step 2:  Modify Data Source with the defaultHost Option

After the Configuration Wizard is complete, in the Data Source tab, under Connection Properties, configure the “defaultHost” option with the hostname (excluding domain) of the “primary” database instance as per Figure 3. This instance will be used to store authentication information for all of the users, and will be the source of routing information.

Figure 3 – Modified Postgres data source with defaultHost set

Step 3: Configure an Authentication Table on the Database

For routing, configure SQL authentication on the proxy. This will configure the proxy to a per-user or database basis by adjusting the data returned from a SQL query. 

The first step is to build the authentication table. In order to conform to general conventions, the Heimdall Proxy uses a table that generally matches the Postgres hba.conf file, although not all columns are needed. For our example, we show two formats: Amazon Redshift and Postgres.

Please note that Redshift doesn’t provide array variables. Instead, we’ll use a simple text field that emulates an array; a username would be stored as “{user1}” or “{user1,user2}”. The Heimdall Proxy parses this into an array and the table definition for Amazon Redshift will be:

CREATE TABLE IF NOT EXISTS heimdall.pg_hba

(

     line_number INTEGER,

enabled BOOLEAN NOT NULL DEFAULT true,

     address VARCHAR(256),

     netmask VARCHAR(256),

     “database” VARCHAR(256),

     user_name VARCHAR(256),

     “host” VARCHAR(256),

     auth_method VARCHAR(256) NOT NULL DEFAULT ‘trust’::character varying

)

This table also supports IP network based controls, if desired.

Next, for Postgres, we will use:

CREATE TABLE public.pg_hba (

     line_number int4 NULL,

     enabled bool NOT NULL DEFAULT true,

     “database” _text NULL,

     host text NULL,

     auth_method varchar NULL DEFAULT ‘trust’::character varying

);

In this case, we have simplified the table to the minimum.

Now, we can add in data for the routing. For Amazon Redshift:

line_numberenabledaddressnetmaskdatabaseuser_namehostauth_ method
5true10.0.0.0255.0.0.0{all}{test1}test1trust
10true0.0.0.00.0.0.0{dev}{test2}test2trust

For Postgres:

line_numberenableddatabasehostauth_method
10true{database1}postgres1trust
20true{database2}postgres2trust
30true{database3}postgres3trust

Step 4:  Configure SQL Authentication

In the VDB tab, under proxy settings, configure the SQL authentication parameter, providing a privileged user that will be used to access the authentication table (to be created), as per Figure 4 or 5. 

Figure 4 – SQL authentication with routing via username
Figure 5 – SQL authentication with routing only by database on Postgres

For username based routing, the authentication query could be:

select * from heimdall.pg_hba where enabled = true and user_name like ‘%${user}%’ order by line_number asc

For database based routing on Postgres, the query can be:

select * from pg_hba where enabled = true and ‘${database}’ = ANY(database) order by line_number asc

These queries support both array values and non-array values for the user and database column, as Postgres can use an array value to match multiple names in a row, but Amazon Redshift cannot.

Step 5: Testing

Now, everything should be configured and we can start verifying the credentials via the authentication test GUI under the VDB authentication settings.

First, use a valid IP address range in our Amazon Redshift example:

Figure 6: Authentication test passing for user test1 from IP 10.0.10.1

Now, test with an invalid IP address (IP address not within the 10.x.x.x range):

Figure 7 – Authentication test failing due to the test IP being out of range

If you connect using both configured users via the proxy, you can do a “show pools” command to show which connection pools are mapped to which servers:

$ psql –host=127.0.0.1 –port=5439 –user=test2 dev

Password for user test2: …

dev=> show pools;

URL or catalog:user     | busy | connecting | idle | wait count | created |

 test1..:5439/              | 0/0  | 0          | 2    | 0          | 2       |

    NA:test1                | 0/0  | 0          | 1/0  | 0          | 1       |

    dev:test1               | 0/0  | 0          | 1/0  | 0           | 1       |

 test2.…:5439/              | 1/0  | 0          | 0    | 0          | 1       |

    dev:test2               | 1/0  | 0          | 0/0  |. 0          | 1       |

(5 rows)

Here, we can see that the “test1” user was routed to the “test1” server, while the “test2” user was routed to “test2”.

In the Postgres example, we used DBeaver, a frequently used SQL management tool. Here, we have eight different connections configured in DBeaver:

Figure 8 – Example DBeaver connection configurations for testing

On the Postgres 1-3 databases, each has one database, named database1 to database 3. If you connect directly, you find the proper database:

Figure 9 – Direct connection to postgres1 database.

Likewise, if you connect to the heimdall-database1 connection, it will route you to the backend database postgres1.

Figure 10 – Configuration for the heimdall-database1 connection.
Figure 11 – Connecting via the proxy, routed appropriate for access to database1

If you connect to database2, you get routed to the postgres2 server.

Figure 12 – Routing to database2 via the proxy

Conclusion

Heimdall Data provides functionality needed to support many complex database environments and solves many of the challenges in scaling database access.

In this post, we have shown how to horizontally scale a SaaS environment, but we also assist in horizontally scaling Amazon RDS with no code changes required. Contact Heimdall Data to find a solution to your challenges, and download a free trial from AWS Marketplace.

Resources



FacebooktwitterlinkedinyoutubeFacebooktwitterlinkedinyoutube