A common challenge that AWS customers providing SaaS services face occurs when they scale up the customer count and have to manage each customer’s data. As part of this, they create distinct databases per customer in order to isolate access to each customer’s data, and the application simply accesses the proper database as needed for a given customer. This model works well, but as the number of customers grows, eventually limits on the number of databases supported on a single server will be hit. For light workloads, this may be in the thousands, although for heavy workloads it may be as low as tens per database, such as with a Redshift cluster.
One possible solution is for a customer to code their application to do lookups of where a customer’s data is located in a set of database clusters, and directly connect to the proper cluster as needed. This works well, but requires the application to be coded to handle this, and places the burden of scaling on the application team. Another way is to use a database proxy in order to route the queries automatically to the proper cluster, and have a single endpoint for all customer activity, vastly simplifying the process. In some cases, where third-party software is involved in the offering, modifying the code is also not possible, so the proxy approach would be the only possible approach.
In this blog, we will explore how the Heimdall database proxy can be configured to provide customer routing functionality for RDS and Redshift customers, with examples for both Postgres based databases as well as Redshift. Other databases can be supported as well, but have additional configuration requirements outside of the scope of this blog.
Figure 1: Example topology when using customer routing with the Heimdall Database Proxy
Solution Implementation
To implement this solution, you will need the following:
- Multiple database clusters, each with their own non-overlapping databases and/or users;
- One or more updated Heimdall instances;
- VPC Routing configured so that the Heimdall instances can access the databases and that the customer or application nodes can access Heimdall.
The result of following this guide will be to be able to provide a single Heimdall proxy endpoint to access data for multiple customers that exists on multiple databases, without being aware that multiple clusters exist.
Step 1: Use the Wizard to configure a baseline cluster configuration
In the Heimdall GUI, there is an option to configure the system using a Wizard. In an RDS environment, there is an option to use the AWS Autodetect. This can be used to pre-populate many of the fields for the configuration. For Redshift, this option should not be used, instead the manual configuration mode should be used. Once past the autodetect option (if used), the data source screen will be shown and should be configured as per figure 2 or 3.
Figure 2: Redshift example data source configuration example
Figure 3: Postgres example data source configuration example
Note in the UI the use of the variable placeholder ${host}. This will map to a host field stored in a database table, to be explained later. The default database specified should be the one being used to control the routing of queries, i.e. we will be adding a table to this database to provide the customer routing information. Complete the wizard at this point using the defaults, as appropriate.
Step 2: Modify the data source with the defaultHost option
After the wizard is complete, In the data source tab, under connection properties, you will want to configure the “defaultHost” option with the hostname (excluding domain) of the “primary” database instance as per figure 4 & 5. This instance will be used to store authentication information for all the users, and will be the source of routing information.
Conclusion
In this blog, we showed you how to set up Active Directory authentication WITH authorization for Postgres. The Heimdall Proxy provides synchronization scripts for all Amazon RDS and Redshift instance types. Using Active Directory authentication allows Enterprise organizations to standardize their password and authorization management via a globally available authentication store, reducing management overhead and improving security and auditing capabilities.
Reference Material:
- Automated Query Caching
- Splitting Query Reads and Writes
- Advanced Connection Pooling
- Heimdall Proxy for Amazon Redshift
- Contact: info@heimdalldata.com
About Heimdall Data
Heimdall Data is an AWS Advanced ISV partner. They have Amazon Service Ready designations for Amazon RDS and Amazon Redshift. Heimdall Data offers a database proxy that offloads SQL improving database scale while securing database access. Proxy deployment does not require code changes.