In my previous post, I documented how we were benchmarking Jira as a representative application with a fair amount of database activity. In the testing, we injected latency into the server, starting with no additional latency and growing from there. The intent is to demonstrate that even a highly optimized application can be impacted by minor database latency.

Latency between the application and the database is increasingly important because, with the move to cloud architectures, there is less ability to tune and control the network infrastructure for low latency. In AWS, it isn’t uncommon to see 1-2ms of latency between availability zones in the same region, and significantly more when spread across regions (AWS Availability Zone Design). This is on top of the latency between nodes in the same availability zone, which can itself be 1ms or more. All of this assumes everything is working properly. Additionally, various vendors, most notably Oracle, have been pushing database as a service (DBaaS) for applications. Oracle has even started pushing customers to migrate to the Oracle cloud under threat of license audits (Business Insider on Oracle Threats). The net result is pressure to use a database in the cloud, even if the application is still hosted in a customer’s own data center (a Hybrid deployment).

As such, we decided to test how these factors affect performance compared with the more traditional model of self-hosting an application and database together, with a highly optimized network between the two. We started with 0ms of injected latency, then 125 microseconds, and added 25% more latency on each step after that. We capped the injected latency at 18ms, since most local data centers are likely to be close enough to a cloud-hosted data center for this to be achievable. Here are the results, with and without Heimdall, and over various latency bands:


And the raw results:

| No Cache (ms) | Cache (ms) | Injected Delay (ms) | Without Cache (vs. baseline) | With Cache (vs. baseline) | Savings |
|---|---|---|---|---|---|
| 134.12 | 127.18 | 0.00 | 1.00x | 0.95x | 5% |
| 139.83 | 129.17 | 0.20 | 1.04x | 0.96x | 8% |
| 142.17 | 129.16 | 0.25 | 1.06x | 0.96x | 9% |
| 144.75 | 130.86 | 0.31 | 1.08x | 0.98x | 10% |
| 147.39 | 131.36 | 0.39 | 1.10x | 0.98x | 11% |
| 148.62 | 134.81 | 0.49 | 1.11x | 1.01x | 9% |
| 154.46 | 133.60 | 0.61 | 1.15x | 1.00x | 14% |
| 158.96 | 138.81 | 0.76 | 1.19x | 1.03x | 13% |
| 164.57 | 141.46 | 0.95 | 1.23x | 1.05x | 14% |
| 173.17 | 145.58 | 1.19 | 1.29x | 1.09x | 16% |
| 184.33 | 150.91 | 1.49 | 1.37x | 1.13x | 18% |
| 197.56 | 156.76 | 1.86 | 1.47x | 1.17x | 21% |
| 214.01 | 164.38 | 2.33 | 1.60x | 1.23x | 23% |
| 239.56 | 173.30 | 2.91 | 1.79x | 1.29x | 28% |
| 256.92 | 185.31 | 3.64 | 1.92x | 1.38x | 28% |
| 283.70 | 201.95 | 4.55 | 2.12x | 1.51x | 29% |
| 325.70 | 218.54 | 5.68 | 2.43x | 1.63x | 33% |
| 374.92 | 248.68 | 7.11 | 2.80x | 1.85x | 34% |
| 428.75 | 285.81 | 8.88 | 3.20x | 2.13x | 33% |
| 498.97 | 311.09 | 11.10 | 3.72x | 2.32x | 38% |
| 621.12 | 350.66 | 13.88 | 4.63x | 2.61x | 44% |
| 693.66 | 396.71 | 17.35 | 5.17x | 2.96x | 43% |
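
The delay steps in the table can be reproduced with a short sketch. It assumes the first non-zero step is the 0.20 listed in the table, read as milliseconds (the text mentions a 125 microsecond starting point, which does not appear in the table), and that each step is exactly 25% larger than the previous one. The function name and its parameters are illustrative only, not part of the test harness.

```python
def latency_schedule(start_ms=0.20, growth=1.25, cap_ms=18.0):
    """Return injected delays in ms: 0 first, then geometric steps up to cap_ms.

    start_ms, growth and cap_ms are assumptions taken from the table and text.
    """
    delays = [0.0]          # the run with no injected latency
    delay = start_ms
    while delay <= cap_ms:  # stop before exceeding the 18ms cap
        delays.append(delay)
        delay *= growth     # 25% more latency on each step
    return delays


schedule = latency_schedule()
for step in schedule:
    print(f"{step:.2f} ms")
```

With these assumptions the loop yields 22 steps, the last one just under the 18ms cap, which lines up with the 22 rows of raw results.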

What is interesting to note in these results is that even with 0.95ms of injected latency, user response time grew by 23% without caching, while with caching the response time was only 5% higher than the non-cached baseline. As the injected latency increases, the gap between the cached and non-cached response times grows steadily wider: at 17.35ms, the non-cached response time is 5.17x the baseline, versus 2.96x with caching. For applications that are more heavily dependent on database performance, these numbers are likely to diverge even further from the baseline, and to spread even faster.
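
The relative columns and the savings figure can be recomputed from the raw timings. The sketch below assumes, based on the numbers themselves rather than any stated definition, that both relative columns are divided by the non-cached, zero-delay response time (134.12ms) and that savings is the reduction of the cached time relative to the non-cached time at the same delay. Only three rows from the table are included, and the `derive` helper is a hypothetical name.

```python
# (no_cache_ms, cache_ms, delay_ms) rows copied from the raw results
rows = [
    (134.12, 127.18, 0.00),
    (164.57, 141.46, 0.95),
    (693.66, 396.71, 17.35),
]

baseline_ms = rows[0][0]  # non-cached response time with no injected delay


def derive(no_cache_ms, cache_ms, baseline_ms):
    """Recompute the relative and savings columns for one row (assumed definitions)."""
    return {
        "without_cache": no_cache_ms / baseline_ms,  # slowdown vs. baseline
        "with_cache": cache_ms / baseline_ms,        # slowdown vs. baseline
        "savings": 1 - cache_ms / no_cache_ms,       # cached vs. non-cached, same delay
    }


for no_cache_ms, cache_ms, delay_ms in rows:
    d = derive(no_cache_ms, cache_ms, baseline_ms)
    print(f"{delay_ms:5.2f} ms  {d['without_cache']:.2f}x  "
          f"{d['with_cache']:.2f}x  {d['savings']:.0%}")
```

Under these definitions the 0.95ms row comes out at roughly 1.23x without caching, 1.05x with caching, and 14% savings, matching the table.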

The net takeaway is that even if an application that is self-hosted today isn’t dramatically impacted by network latency to the database, moving to the cloud is likely to bring a noticeable performance impact. When accounting for Hybrid cloud or designs that include disaster recovery, the impact will be greater, sometimes by a large amount. Heimdall gives a client-side view of database performance, including the behavior of the network interconnects in between, something that is often lost with pure database monitoring solutions. At the same time, caching on the client side can at least partially mitigate the impact of the added latency.

Other Links of Interest

To get a perspective on the latency between two geographic regions for a given carrier, you can use the following links:

AT&T Global Network Latency

Sprint Network Performance

Verizon Latency Statistics

Cogent Network Performance

To evaluate the performance from your current location to the various AWS regions, you can use Cloud Ping.