Why Database Operations Hit the Wall

An operations team can go from scaling smoothly to a complete meltdown, seemingly overnight. The problem is that the database your team is using only scales to a certain point. The company is growing rapidly and you plan a “company making” weekend. Then something goes badly wrong. The MySQL server has reached capacity in terms of its ability to process requests, resulting in lost users, business, and opportunity. This is a fundamental flaw in the design of any database that can’t do clustering for scale-out. Instead of having your team embark on a giant software development project, you can install ClustrixDB. Our database grows simply by adding nodes to the cluster, saving you hundreds of thousands, and sometimes millions, of dollars in development cost and unplanned downtime.

One thing we’ve seen over and over again at Clustrix is the mixed surprise, delight, and fear that come with a rapidly scaling online business. In some cases, seemingly over a weekend, an operations team goes from scaling just fine to complete meltdown. How can this be? How come your operations team was able to scale from 0 to 200,000 users, but couldn’t go from 200,000 users to 300,000 users?

Here is some analysis of why this happens. It comes down to the inability of existing solutions to scale incrementally with load. The problem is the database. The database your team was using to build the back-end system only scales to a certain point. What was working all through development, QA, and initial deployment may suddenly fail at the worst possible time: as soon as your application starts to grow.

The following graphs show typical performance data, along with a projected project timeline. This mirrors the experience of many web application groups.

The Scenario

Here is a description of the trap that we’ve seen a lot of folks fall into. You’ve built your application, customers are starting to take notice, and you’re growing. Things are taking off:

 

Figure: Growth curve

Every day more and more people are taking notice. You want to really accelerate things, so you launch a big promotion and have your first huge weekend. This weekend could be “company making” for you guys, and it’s all hands on deck to make sure nothing bad happens. But something bad does happen, something terrible:

Figure: Losing customers at scale

This is based on test run data from a MySQL server running on production-level hardware. What happened here? We climbed up a slope of performance and concurrency and suddenly everything cratered. After the initial peak, each attempt to add more load was met with less throughput, not more. The MySQL server has reached capacity in terms of its ability to process requests. It can do no more work than it’s doing now. This isn’t some fundamental flaw in MySQL. This is a fundamental flaw in the design of any database that can’t do clustering for scale-out, and the results are lost users, lost business, and a painful way to watch an opportunity go down the drain.
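
To make the shape of that curve concrete, here is a toy model of throughput versus concurrent clients. It uses the Universal Scalability Law, one common way to describe a server that peaks and then degrades under load. The parameter values (`rate`, `contention`, `coherency`) are made up for illustration. They are not fitted to the test run above and are not MySQL measurements.

```python
import math


def throughput(clients, rate=1000.0, contention=0.05, coherency=0.001):
    """Modeled requests/sec for a given number of concurrent clients.

    Universal Scalability Law: X(N) = rate * N / (1 + a*(N-1) + b*N*(N-1)).
    All three parameters are illustrative assumptions, not measured values.
    """
    n = clients
    return rate * n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def peak_clients(contention=0.05, coherency=0.001):
    """Concurrency at which modeled throughput is highest."""
    return math.sqrt((1 - contention) / coherency)


if __name__ == "__main__":
    # Doubling the load each step: throughput rises, peaks, then falls.
    for n in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        print(f"{n:4d} clients -> {throughput(n):8.0f} req/s")
    print(f"modeled peak near {peak_clients():.0f} clients")
```

In this sketch, every client added past the peak makes the server slower for everyone, which is the "more load, less throughput" behavior described above.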

At this point the operations and dev teams work up a plan. Maybe buying a bigger server can delay the problem, but that’s only a temporary solution. What are you going to do once you hit the limits of that box? The most common solution is something called “sharding”. When the team starts sharding, they break what used to be a single database into a bunch of individual databases. They write software to glue things back together again. This gives up most of the benefits that these databases were designed to deliver, and produces a brittle, failure-prone application stack. Sharding is the development-heavy answer to the problem of not having a scalable database. People spend months writing very complex code, and then the maintenance of this code goes on and on … forever. The people who can do this kind of work are hard to find, very expensive, and specialize in infrastructure problems whose workarounds are extremely application specific. The solution for one application doesn’t work for any others because it’s a baked-in part of the application architecture. It is a custom project, not a solution.
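
Here is a minimal sketch of what that glue code tends to look like. It is an illustration only: in-memory SQLite databases stand in for separate database servers, and the `orders` table, the `shard_for` hash routing, and the shard count are all hypothetical choices, not taken from any particular application.

```python
import hashlib
import sqlite3

NUM_SHARDS = 4  # assumption: a fixed shard count chosen up front


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user to a shard with a stable hash (the same on every host)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def make_shards(num_shards=NUM_SHARDS):
    """Create one independent database per shard (SQLite as a stand-in)."""
    shards = []
    for _ in range(num_shards):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
        shards.append(conn)
    return shards


def insert_order(shards, user_id, amount):
    """Single-user write: the application must pick the right database."""
    conn = shards[shard_for(user_id, len(shards))]
    conn.execute("INSERT INTO orders VALUES (?, ?)", (user_id, amount))


def user_total(shards, user_id):
    """Single-user read: routed to exactly one shard."""
    conn = shards[shard_for(user_id, len(shards))]
    sql = "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?"
    return conn.execute(sql, (user_id,)).fetchone()[0]


def grand_total(shards):
    """Cross-user read: query every shard and merge the results by hand."""
    sql = "SELECT COALESCE(SUM(amount), 0) FROM orders"
    return sum(conn.execute(sql).fetchone()[0] for conn in shards)
```

Even in this tiny version, the application now owns routing and result merging, a query that spans users has to visit every shard, and there are no cross-shard joins or transactions. Changing `NUM_SHARDS` later also remaps most users to different shards, which means moving their data.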

Enter ClustrixDB

With ClustrixDB you have a real, permanent, and simple solution. Instead of having your applications team embark on a giant software development project, you can install ClustrixDB. Our database grows simply by adding nodes into the cluster. Let’s zoom out on that earlier data plot and look at how ClustrixDB compares to that MySQL server:

Figure: Growing database with nodes

What this shows is that adding nodes to a ClustrixDB cluster adds scale and performance on the fly. Whatever your operational scalability and performance needs, ClustrixDB can handle them.

This contrast with sharding is why our customers tell us that ClustrixDB has saved them hundreds of thousands, and sometimes millions, of dollars in development cost and unplanned downtime. Think about the time-to-market advantage of having the infrastructure challenges already solved. If you don’t, your competitors will. Take advantage of the proven scale-out capabilities of the ClustrixDB cluster.

Now, anybody anywhere in the world can build scalable applications without having to worry about the database.