High-Value, Heavy Workloads: Where the Worlds of Big Data and Accurate Data Collide

In the world of big data, people understandably tend to get caught up in the “Big” part of it. NoSQL and Hadoop offerings, which often take center stage in Big Data dialogues, are good at handling massive scale, as long as you’re OK with a little bit of “eventual consistency” and with not being able to make sense of the data quickly and easily. And in many Big Data applications, a little eventual consistency is all right: if you’re evaluating behavioral trends, weather patterns, and the like, and you can look at enough data, small discrepancies are forgivable.

Yet there is a whole world of applications for which even the slightest inaccuracy is a death knell. These run what we call “high-value” workloads which, put simply, are workloads involving multi-step procedures where every step needs to be performed accurately, or money, inventory, or other assets may be lost. This is the world of e-commerce, online gaming, ad tech, and other industries, and in this universe there simply is no room for inaccuracy. So what happens when the worlds of high-value data and Big Data collide? Companies must turn to a new approach, one that combines aspects of both.

Companies in the “high-value” world have continued to employ MySQL or MySQL-derived databases (like Aurora) that provide what is referred to as ACID compliance: the properties (atomicity, consistency, isolation, and durability) that ensure database transactions are processed reliably. Yet neither NoSQL nor MySQL addresses a large segment of what is actually going on in business now, where companies need both to achieve massive scale and to retain complete data accuracy. MySQL databases impose limitations on scalability, mainly because they are deployed on a single node or server; once you run out of capacity, you’ve got to move to a bigger server or employ unnatural feats to scale (we’ll discuss the problems with that shortly). And unfortunately, a cloud deployment doesn’t help, because these databases don’t scale in the cloud the way other cloud applications do: they’re stuck on a single instance.
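To make the idea of a multi-step, all-or-nothing procedure concrete, here is a minimal sketch of an order that must both decrement inventory and charge a buyer. It uses Python's built-in sqlite3 module purely for illustration, not MySQL, Aurora, or ClustrixDB, and the table names, column names, and the `place_order` and `make_demo_db` helpers are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER)")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO inventory VALUES ('sword', 5)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.commit()
    return conn


def place_order(conn, sku, qty, buyer, unit_price):
    """Reserve stock and charge the buyer as a single transaction.

    Returns True if both steps succeeded, False if the order was rolled back.
    """
    cost = qty * unit_price
    try:
        # The connection context manager commits if the block finishes
        # normally and rolls back if an exception escapes it.
        with conn:
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE sku = ? AND stock >= ?",
                (qty, sku, qty),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")

            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE name = ? AND balance >= ?",
                (cost, buyer, cost),
            )
            if cur.rowcount != 1:
                # The stock update above is undone by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    db = make_demo_db()
    print(place_order(db, "sword", 2, "alice", 30))  # both steps apply
    print(place_order(db, "sword", 2, "alice", 30))  # funds run out, nothing applies
    print(db.execute("SELECT stock FROM inventory").fetchone()[0])
```

In the second call the stock update succeeds but the charge does not, so the whole order is rolled back and no inventory is lost. That is the kind of guarantee a high-value workload depends on.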

Sacrificing Scale in the Name of Reliability

Gaming, ad tech, and e-commerce are three examples of industries that have fully entered the world of heavy, high-value workloads. For these industries, processing massive numbers of transactions and retaining complete data accuracy are equally important and, with traditional database technology, somewhat at odds with each other. So companies find themselves in a bind. They need to scale, but NoSQL solutions aren’t an option, because they don’t ensure data accuracy without writing lots of code and adding unwelcome complexity to the application. The database has got to be relational (SQL).

Yet companies hit a ceiling with MySQL and Aurora, and wind up needing to push them beyond what they were built for by performing unnatural feats such as sharding and adding read slaves, which come at a price. These tactics may not inherently mean loss of ACID compliance, but they add complexity and fragility to your applications. This adds expense, to be sure. But on top of that, any time your solution requires more staffing and more babysitting, it also raises the chances of something going wrong and valuable data being lost, damaged, or inaccessible.
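As a rough illustration of the complexity sharding pushes into the application, the sketch below shows the kind of routing logic an application might carry once data is split across several database servers. It assumes simple hash-modulo placement, which is only one of several possible schemes, and the function names `shard_for` and `moved_keys` are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard for a key using a stable hash (assumed hash-modulo scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_keys(keys, old_count: int, new_count: int):
    """List the keys whose shard changes when the shard count changes."""
    return [
        k for k in keys
        if shard_for(k, old_count) != shard_for(k, new_count)
    ]


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    relocated = moved_keys(users, old_count=4, new_count=5)
    print(f"{len(relocated)} of {len(users)} keys must move to a new shard")
```

Under this scheme, adding a single shard relocates a large share of the keys, and every query in the application has to call something like `shard_for` before it can reach the right server. Other placement schemes reduce the data movement, but the routing and rebalancing work still lives in your code rather than in the database.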

You Don’t Need to Compromise

Do you always have to choose between scalability and accuracy? Fortunately, there’s a way out of this conundrum. ClustrixDB is specifically designed to meet the needs of customers with large, high-value transactional workloads. Most relational databases, including Aurora and MySQL, are designed to scale up, that is, to increase performance by migrating the application to a more powerful server. Once you run out of faster hardware, you either resort to read slaves or sharding and accept the complexity and fragility that come with these tactics, or you hit a performance ceiling and start to experience problems. ClustrixDB, on the other hand, is the only relational database designed to scale out: to increase performance by simply adding commodity server nodes.

ClustrixDB maintains the accuracy of MySQL with ACID compliance, yet scales the way other cloud applications scale, which is why it can scale to 10x what Aurora handles and beyond. As the worlds of Big Data and high-value data continue to collide, companies no longer have to compromise by going with one or the other. ClustrixDB handles both the “big” and the “high value.”


Learn more about the scale-out database ClustrixDB.