Heimdall Data, a Microsoft Technology partner, offers a database proxy to improve the write-master / read replica scale in the Azure backend without application changes.
Horizontally scaling out your SQL database (e.g. Postgres, MySQL, SQL Server) involves separating the write-master from read-only servers. This allows the write server to perform more intensive operations instead of processing read queries as well. However, writing to one node and reading from another can result in inconsistent data due to synchronization delays.
Application-database response times are important for web applications as users have little patience with slow performing applications. Finding a transparent solution to improve performance while providing up-to-date, accurate information to customers is highly desired. For Azure managed databases, there is a requirement that usernames are in the format of “user@db hostname”. This can cause problems, in particular with Postgres, as @ is not considered a valid character for usernames, and client software may filter or reject the username. Additionally, the same user may be used for both read and write servers when doing manual read/write split, resulting in the improper routing of queries.
Heimdall Data provides a transparent solution at the application tier to intelligently route queries to the most optimal data source resulting in SQL offload and improved response times. This includes 1) Routing queries to the read/write instance while keeping data updated, and 2) Automatically caching SQL results.
In this blog, we will show you how to configure Heimdall Data in Azure to transparently scale out your write master and read replicas for improved application response times. Let’s walk you through each feature.
Better use of read replicas (aka Read/Write splits)
Figure 1: Read/Write split with Heimdall Data and Azure for Redis
The Heimdall proxy is SQL-aware and routes queries to the appropriate write server and read replicas. By leveraging the Azure for Redis publish-subscribe interface, the proxy determines whether to retrieve the SQL results from cache or the backend database. If read query is not deemed cachable, Heimdall knows which read node should be accessed, query-by-query, based on what tables were being read.
There are a variety of tools in the market that claim read/write splitting, but they are not replication lag aware. That is, how do you know when the WRITE has been updated across all database instances? Could you be reading stale data? Other solutions may track lag but are not intelligent enough to determine WHEN it is safe for a particular query to be routed to the read-slave. They are not designed to fully offload the developer from implementing read/write splits.
In Azure, Heimdall has an option to enable username pre-processing, to adjust the username based on the desired target server. This option is the “azureDbHost” setting in the connection settings of the data source. When enabled, the application uses a simple username as normal, but when the connection is made to the server, the server hostname is extracted and appended as @hostname. This allows Azure managed databases to properly function with query routing, even if the application is not aware of this behavior.
Note: Heimdall does not support data replication; that is the job of the database. Heimdall with Azure Cache for Redis, detects when data has completed replication and routes queries to the appropriate database instance.
Automating Read/Write Split with Heimdall Data for Redis
Each query is parsed to determine what tables are associated with it. The last write time to the table is tracked by Heimdall, and synchronized using Azure for Redis. This provides the first piece of information (i.e. when was the last time a particular table was written to). Next, Heimdall performs writes on a scheduled basis to the write node and reads how long it takes for the writes to appear on the read server. With this information, along with other settings for tuning, Heimdall is able to make an intelligent decision on which node is “safe” to read from–either the read nodes, or only the write node.
How to configure Read/Write splitting on Heimdall Central Console:
- Ensure that a read/write master is configured along with at least one read-only server:
Figure 2. Read/Write Splits Configuration in “Data Sources” tab
- Configure the settings (if needed) to enable the azureDbHost setting to “true”, and adjust connection retry and timeout settings, as well as SSL configuration as needed:
Figure 3. “Data Sources” tab
- Configure Caching in the VDB tab as per your Azure Redis configuration:
Figure 4. “Virtual DB Cache Configuration”
In Figure 2, since the replication lag was set to be 1000ms, and a table was last written two seconds (2000ms) ago would be deemed safe to use the read replica. If however, the last write was 500ms, then Heimdall would route the query to the write master node to complete the read operation, guaranteeing that a stale response was not received. The value of the detected replication lag is found on the status screen (if any):
Figure 5. “Status” tab
For further protection, Heimdall adds a fixed lag window value, which is configured in a static manner. This allows the replication lag window to spike on the short-term without impacting the freshness of the data being returned. Further control over the read/write split is applied at a rule level based on matching regular expressions, including bypassing the replication lag logic for particular queries, and always using the read server.
4. Configure a Reader Eligible rule to specify what is allowed to be routed to the read server:
Figure 6. “Rules” tab
In cases where the replication lag is not a concern, the user can create a read/write split rule for particular queries that should unconditionally be read from the read/write server. This may be as a result of a particular user connecting to the database as well, i.e. to prevent a reporting job from putting load on the write master.
5. Real-time analytic charts on the Analytics tab:
Figure 7. “Dashboard” tab
6. Specific query source is found in the Log tab:
Figure 8. “Log” tab
SQL caching: The second-way Heimdall Data is used for improving database scale is to offload SQL traffic by caching. In this scenario, Azure for Redis acts as a look-aside, SQL results cache. Heimdall intelligently determines which queries to cache and automatically invalidates when the proxy detects an update to the database. Unique to other caching solutions, Heimdall caches SQL results in the application tier removing unwanted network latency between the application and database.
Figure 9: Heimdall Distributed Auto-caching Architecture
The best part of this joint caching solution is its transparency, not requiring any code changes. Caching and invalidation are automated. For more information on how to configure Heimdall for auto-caching, check our automated caching blog.
The Heimdall proxy provides the SQL routing logic, and Azure Redis provides the storage and communication bus to ensure your backend is scalable. By improving scale through read/write spits and SQL caching, queries will be traffic routed for maximum performance. Further, as Heimdall provides a bridge between a normal username and the hostname variation expected by Azure managed databases, it allows your application to run unmodified with Azure databases, and gain the benefit of read/write split.
Resources and links: