Everything You Ever Wanted to Know About High-Value/High-Transaction OLTP Workloads, but Were Afraid to Ask


… Amazon Prime Day and Pokemon Go teams should have asked us

The term “high-value, high-transaction workload” is admittedly not universal, yet it describes a very important class of DBMS transactions that sit somewhere between “Big Data” and critical data. Simply put, these workloads entail large numbers of concurrent transactions at high speed, where even the slightest bit of inaccuracy or latency is problematic. They involve multi-step procedures where every step needs to be performed accurately and at scale; otherwise money, inventory, or other assets may be lost. Inaccuracy is not an option.
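To make the “multi-step procedure” idea concrete, here is a minimal sketch of a purchase that must either complete every step or none of them. It uses Python’s built-in sqlite3 module purely for illustration; the table names, column names, and the purchase function are hypothetical and are not taken from any system discussed in this article.

```python
import sqlite3

def purchase(conn, sku, qty, price_cents, account_id):
    """Reserve stock and debit an account as one atomic unit (hypothetical schema)."""
    total = qty * price_cents
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE sku = ? AND stock >= ?",
                (qty, sku, qty),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")
            cur = conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? "
                "WHERE id = ? AND balance_cents >= ?",
                (total, account_id, total),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        # The rollback has already undone the stock update, so no inventory is lost.
        return False

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER)")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance_cents INTEGER)")
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.execute("INSERT INTO accounts VALUES (1, 1000)")
    conn.commit()
    first = purchase(conn, "widget", 2, 300, 1)   # costs 600, balance is 1000
    second = purchase(conn, "widget", 2, 300, 1)  # costs 600, balance is only 400
    stock = conn.execute("SELECT stock FROM inventory WHERE sku = 'widget'").fetchone()[0]
    balance = conn.execute("SELECT balance_cents FROM accounts WHERE id = 1").fetchone()[0]
    return first, second, stock, balance
```

In this sketch the second purchase fails at the payment step, and the rollback restores the stock that the first step had already decremented. That all-or-nothing behavior is the guarantee these workloads depend on.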

Gaming and e-commerce, which must address large, sometimes unpredictable numbers of “add to cart” transactions and new account signups, are examples of industries fully immersed in the world of heavy, high-value workloads. In industries like these, processing massive numbers of transactions per second and retaining complete data accuracy are equally important; neither can be compromised to support the other.

So companies find themselves in a bind. They need to scale, and even super-scale for peak periods or new game launches, but NoSQL solutions aren’t an option, because they don’t ensure data accuracy without tinkering with the application in undesirable ways. Yet relational databases such as MySQL weren’t designed to scale transaction-oriented traffic such as thousands of shoppers putting retail items in and out of a cart, or hundreds of thousands of gamers simultaneously playing a multiplayer game. Both of these industries also face somewhat erratic demand, making it difficult to predict in advance how much database power they’re going to need during temporary spikes in traffic.

Cloud computing has changed the rules of the database game forever

We’ve seen a huge rise in high-value, high-volume transactions, largely due to the advent of cloud computing, which enables companies to essentially “rent” computing resources. Both gaming and e-commerce applications are usually built for the cloud, meaning that they can scale out according to demand.

This would allow them to scale indefinitely to performance demands, if it weren’t for the fact that the same cannot be said of the databases that power these applications, which typically run on MySQL and its derivatives. So while you may be able to scale your gaming application at will, unless you have a database that is specifically built to scale out like other cloud applications–by adding server nodes–your performance will be hindered, and you may be subject to slowdowns, crashes and other nasty customer-repelling experiences.

Latency: “customer repellent” for the age of the impatient consumer

Research by Microsoft, Pew, and other sources has confirmed the conventional wisdom that tech gadgets are having a “shortening” effect on our attention spans. This phenomenon is becoming more pronounced as shoppers continue to embrace mobile commerce. Mike Azevedo recently told BizReport.com: “We’re seeing a decrease in already-short attention spans when it comes to dealing with longer load times and other speedbumps during the checkout process. A huge factor in the rise in the popularity of mCommerce is the convenience factor, but this means that a majority of the shoppers using their phones to make purchases are not sitting on a cozy couch in their home, but more likely on the go so they actually have less patience to put up with a slow site on their phone. Additionally, outside factors affect the personality mood and overall experience of the mCommerce shopper as they are likely to be multi-tasking, commuting, or dealing with a poor wi-fi connection while also shopping.”

Whether online or mobile, the tough reality is that the web has made it ridiculously easy to switch to a competitor. If your site’s database adds latency as you scale, your site won’t deliver the performance shoppers expect, and your customers can easily go somewhere else and get a comparable product.

Let the past remain in the “single-node” past

Transactions in the pre-cloud era can probably be summed up in two ways: less scale, and much, much less concurrency. Consequently, older technologies handled them in ways that, quite frankly, just aren’t practical today. MySQL databases, the old standby for ACID-compliant transactions, impose limitations on scalability, mainly because they are deployed on a single node or server; once you run out of capacity, you’ve got to move to a bigger server or employ unnatural feats to scale. And unfortunately, migrating to the cloud doesn’t help, because these databases don’t scale in the cloud the way other cloud applications do. They’re stuck on a single instance, just like on-premises applications. This is true even for Aurora, which has been touted as a more scalable SQL solution.

Companies hit a ceiling with MySQL and Aurora, and wind up needing to push them beyond what they were built for by performing unnatural feats such as sharding and adding read slaves (read replicas). These common tactics for increasing database performance come at a price and are no longer feasible today, for a number of reasons. To start with, they are technically complex: tedious, largely manual, time-consuming processes that can leave organizations extremely vulnerable to outages or data loss. They also require changes to your applications and more babysitting of your infrastructure, which adds cost to your operations. Additionally, while these tactics may not inherently mean loss of ACID compliance, they add fragility to your applications and raise the chances of something going wrong.
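As a rough illustration of why sharding pushes work into the application, consider the simplest routing scheme, in which the application itself hashes each key to pick a shard. This is a hypothetical sketch, not how any particular product routes data; shard_for and moved_fraction are names invented for this example.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key; routing like this lives in application code."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)

# Hypothetical customer IDs, used only to exercise the routing function.
sample_keys = [f"customer-{i}" for i in range(10_000)]
fraction = moved_fraction(sample_keys, old_count=4, new_count=5)
```

With simple modulo routing like this, growing from four shards to five relocates roughly four out of every five keys, which is one reason resharding tends to be a tedious and risky manual project. Schemes such as consistent hashing reduce the data movement, but they add still more logic for the application team to own.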

While sharding and read replicas are commonly relied on, they really ought to be considered tactical stopgap fixes, not a strategic solution. You might be able to get away with using these tactics if you’ve got astronomical resources; otherwise they’ll incur a level of complexity and fragility that will be very tough to sustain for the long term. And quite frankly, even the big players may not be able to handle this, as last week’s incidents with Amazon Prime Day’s add-to-cart failures and Pokemon Go’s slow load times and server outages illustrate. Both experienced nightmares from all angles: frustrated customers left waiting, bad PR, damaged company images, lost revenue, and more, all caused by these launch-day disasters. Their development teams were undoubtedly stressed and had to work tirelessly to patch the problems. While the problems may seem fixed for now, who’s to say they won’t unravel again? Is there another solution for these high-value, high-transaction applications that can solve the problem?