Companies deploy Amazon Redshift for large-scale MPP (Massively Parallel Processing) data analytics. These systems apply parallel compute resources to answer queries quickly. However, there are still application performance challenges:
- Poor application interaction with Redshift: Singleton DML operations result in slow application response times
- Network latency: Inefficient application access to Redshift (e.g. duplicate and frequent queries)
These challenges can be solved by modifying the application. However, the ideal solution requires no code changes. Heimdall Data is that solution providing Amazon Redshift users offering:
- Batch singleton, DML operations (i.e. INSERT)
- Client-side, SQL results caching
- Fast materialized views for results for Amazon Redshift
- Connection Pooling
- Authentication, Authorization for Active Directory / LDAP
Heimdall Data improves database write performance by transparently batching INSERTs against a table under a single transaction. Batching DML operations result in:
- Improved application response times due to fewer commits
- Improved DML scale
The ideal use case for batch processing:
Insert/update/delete a large amount of data at once on a thread. Heimdall can process it all at once much faster than if individual queries outside of a transaction were completed.
Not so ideal use case for batch processing:
If there are concurrent writes and reads against the same table, on the same thread, as everything will just block until the DML operation is completed anyway.
If an exception occurs in the transaction, Heimdall determines which query resulted in the transaction, and will remove the query from the list, report the exception in our logs, and reprocess the batch without the query in place.
Fast materialized views for results for Amazon Redshift
Fast materialized views are very important in analytics environments. When reports are generated, a subset of data is pulled from the back-end data store, then various operations are performed on that data. Heimdall provides the following functionality:
- Queries against a materialized view can be routed to an alternate database, typically Postgres, which acts on behalf of Amazon Redshift. Postgres answers queries offloading Amazon Redshift.
- Heimdall triggers a refresh of the view automatically. Heimdall is aware of updated views from Amazon Redshift and when data was loaded that may impact the view. The net result is faster reports and a lighter load on Redshift, allowing the processing of other queries to be faster and more scalable.
Automated SQL Caching for Amazon ElastiCache
Heimdall’s intelligent auto-caching and auto-invalidation work together with Amazon Redshift’s query caching, but in the application tier, removing network latency. This distributed architecture allows caching to be scalable while bringing the data a hop closer to the user. Result sets are cached in tandem from local memory and Amazon ElastiCache and are invalidated upon writes to Redshift. Best of all, zero code changes are required. For more details, check out or AWS auto-caching blog.
Connection Pooling & Multiplexing
Like Heimdall auto-caching, connection pooling and multiplexing can save you Redshift costs by improving processing power.
It is important to maintain the correct ratio between application and database resources. Each connection to Redshift takes valuable resources that can be better utilized for active queries. Therefore, it is desirable to reduce the total number of connections to Redshift.
There are two Heimdall techniques that reduce connection overhead:
- Heimdall uniquely supports true multi-user connection pooling, with both per-user and global connection limits, to manage the number of active connections on Postgres. Connection pooling allows multiple client connections to be associated to a Redshift connection. When a connection is established from a client, an existing connection is picked from the existing pool, instead of a new connection being established. While the client connection is open, this mapping remains. When the client-side connection is closed, the Redshift connection is kept open to reduce CPU and latency for backend connection establishment lowering Redshift overhead.
- Connection multiplexing is an extension of pooling. Instead of associating an entire client connection with the back-end, connection multiplexing dispatches individual queries or transactions to connections in the connection pool. As client connections are often idle, multiplexing allows for more “active” client connections to Redshift. The net result is 1) Lower total memory overhead on Redshift, and 2) More active queries processed at one time, and 3) Reduced Redshift costs. Benefits will vary based on the application workload.
Authentication, Authorization, and Auditing
Most Redshift deployments support a limited number of users to the database. Users are typically managed manually by the DBA. The Heimdall proxy supports
1) Authentication, authorization and auditing of Redshft access:
a) Authentication of the user via Active Directory/LDAP;
b) Group extraction and role synchronization from LDAP into Postgres for authorization;
c) Logging of queries for auditing.
The Heimdall architecture was designed for ease of deployment without the need to modify the application or database. Configuration changes are updated at runtime without restarting the application. The net result is a platform that can be updated with 100% application uptime.
Heimdall Data is an AWS Technology partner providing SQL visibility and performance optimization as a Database Proxy. Available in the AWS Marketplace, Heimdall is transparently integrated with Amazon Redshift, Aurora, RDS, ElastiCache and CloudWatch.
For more information about how Heimdall Data can help improve your AWS environment, contact us at firstname.lastname@example.org.