Before you get any ideas, we’re talking about good old-fashioned database ACID compliance. ACID is not a new concept to developers. Certainly every DBA knows it, but collectively, the software industry lost its addiction to ACID some time ago.
This is because the industry found a new thing to crave: distributed computing. Internet giants like Google and Facebook created massive businesses that eclipsed the scale of anything traditional enterprises had done previously. They were smart and focused on a user experience that included performance. To reach a quarter of the planet, it was a given that there would never be a single machine large enough to match the needed scale. Plus, their apps had already proven out a shared-nothing architecture. Rather than waste time trying to hack MySQL into bending to their will, they set out to find a different way to put their data in a distributed, shared-nothing architecture.
It turns out that having atomic transactions consistently committed and readable across distributed nodes AND always available was hard. These trade-offs are the crux of the well-recognized conundrum that the CAP theorem summarizes. So when engineers set out to build Apache Hadoop, the solutions for forcing ACID into its shared-nothing architecture were so far out of reach that they had to truly gut-check themselves, asking, “Does ACID really matter?”
Getting Off ACID
The truth is, in many cases, ACID doesn’t matter all that much. More specifically, people found that you can architect around it, and developers of cloud-scale apps spent the next ten years getting really creative. Companies like Instagram created a series of inboxes: when you post a photo, a reference to it gets dealt out to every one of your followers. For most people, followers won’t know if there is a delay of a few seconds. In the case where that message relay could cause a massive delay, say for Justin Bieber’s beliebers, they route that user’s posts to a special handling queue that gives his millions of followers dedicated resources so they get messages faster. Even this isn’t instantaneous, but their Redis and RabbitMQ architecture is fast enough that no one really cares. Plus, it’s a picture. No money is changing hands.
In many cases, waiting for data to ‘catch up’ is fine. Hardware in this model of computing is relatively cheap and this architecture is built for speed. So if a delay is in the range of seconds or milliseconds, does it really matter? While ACID still had merits, most workloads were able to cheat the ideal for the past decade.
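The inbox pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Instagram's implementation: the in-memory dictionaries stand in for Redis, the deque stands in for a RabbitMQ queue, and the names `CELEBRITY_THRESHOLD`, `post_photo` and `drain_celebrity_queue` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical cutoff: authors with more followers than this are routed to a
# dedicated queue instead of being fanned out inline.
CELEBRITY_THRESHOLD = 3

followers = defaultdict(set)   # author -> set of follower ids
inboxes = defaultdict(list)    # follower -> list of photo references
celebrity_queue = deque()      # posts waiting for dedicated workers


def post_photo(author, photo_ref):
    """Deal a photo reference out to followers, or queue it for special handling."""
    if len(followers[author]) > CELEBRITY_THRESHOLD:
        celebrity_queue.append((author, photo_ref))
        return "queued"
    for follower in followers[author]:
        inboxes[follower].append(photo_ref)
    return "delivered"


def drain_celebrity_queue():
    """Stand-in for the dedicated workers that fan out high-follower posts."""
    while celebrity_queue:
        author, photo_ref = celebrity_queue.popleft()
        for follower in followers[author]:
            inboxes[follower].append(photo_ref)
```

Between `post_photo` returning "queued" and the workers draining the queue, followers simply do not see the photo yet. That window is the eventual consistency the design accepts.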
But ACID Does Matter
In transaction-based workloads, such as e-commerce or available-to-promise, a commit has to be a commit, especially at high volume. Many businesses have a built-in model to handle some margin of error, like airlines that routinely oversell a plane and assign seats later. For some companies though, there is no wiggle room. For instance, Ticketmaster cannot afford to sell the same ticket twice. When their system is attempting to sell out a stadium in minutes, the speed and scale of their entire system need to work with ACID guarantees.
Those companies have been muscling through with strategies using replication, read slaves and/or sharding. MySQL and PostgreSQL and their variants have been working to make these unnatural architectures easier to achieve. At heart, though, they are not shared-nothing architectures, so solutions are a lot of work and still have limitations. At some level of scale, latency will cripple them and money will be lost. These solutions also do not scale down easily, and they typically require a mountain of admin work to rebalance and refactor apps to keep them tuned.
As proof, Google commented in their “F1: A Distributed SQL Database That Scales” paper on how quickly this work can spiral to unacceptable limits:
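To make “a commit has to be a commit” concrete, here is a minimal sketch using Python's built-in sqlite3 module. It runs on a single node, so it says nothing about doing this at distributed scale, and the `tickets` table and the `make_db` and `sell_ticket` helpers are assumptions made up for illustration. What it shows is the guarantee itself: the check and the sale happen in one atomic statement inside a transaction, so a seat can only be sold once.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one unsold seat (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (seat TEXT PRIMARY KEY, buyer TEXT)")
    conn.execute("INSERT INTO tickets (seat, buyer) VALUES ('A1', NULL)")
    conn.commit()
    return conn


def sell_ticket(conn, seat, buyer):
    """Return True if this buyer got the seat, False if it was already sold."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # The WHERE clause makes the availability check and the sale
            # a single atomic update: only an unsold seat can be claimed.
            "UPDATE tickets SET buyer = ? WHERE seat = ? AND buyer IS NULL",
            (buyer, seat),
        )
        return cur.rowcount == 1
```

Calling `sell_ticket` twice for seat A1 succeeds for the first buyer and returns False for the second, and the stored buyer never changes after the first commit.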
“We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.”
Cloud-natives Are Bringing ACID Back
To be clear, ACID never went away. It was bypassed by complex workloads and modern architectures, but in many cases that just meant the data could never move to the cloud. With the cloud movement being what it is, the industry is finally tackling that hurdle. This is why we have Amazon Aurora and Google Spanner. It is also why Clustrix built ClustrixDB.
Each vendor went about it their own way. Amazon seems to be short-cutting shared-nothing, so it is still write-constrained as it grows. Google is not. It still has latency challenges as you add nodes, so scale still comes with a penalty for ACID transactions, but it’s Google, so we expect they will catch up. Neither will offer multi-cloud deployment with an on-premises option, though. That’s where ClustrixDB fits in. Clustrix has been at this longer, writing a database engine from scratch with a shared-nothing architecture for going on 10 years now, with long-running production deployments to back it up. But we aren’t baked into the cloud services yet, so it’s harder to find us.
We all have our merits and areas to grow. One thing is for certain: vendors are finally serious about making ACID reign in the cloud again, and that is very good for businesses and their apps.