When I talk to database developers and database architects, they’re often consumed with the challenge of scaling their databases in the face of rapid growth while simultaneously shipping the application changes that help their businesses grow. They talk about hiring expensive performance consultants, ongoing tuning and optimization efforts that also need development and QA, the search for the “right” monitoring tools, and the dreaded pager alerts and load spikes that can bring a site to its knees. In this post we’ll cover the different scaling strategies and the differences between scale up and scale out.
So how do you build a scalable website? First, let’s start with a typical application architecture:
An application server communicates with the database and a separate file store.
Scaling your web server is a well-solved problem: your favorite website probably uses a load balancing scheme to distribute requests across a group of servers, so that one is always available to serve the page you asked for.
Likewise, scaling your file store is also a well-solved problem, and products like EMC’s Isilon provide easy ways to scale a single storage volume as your needs grow.
The last bastion of scale then leaves us with the database.
As traffic to your website grows, your database experiences one of two fundamental pressures as it scales:
- Data size: your data set grows as you attract more users and store more data, leading to a larger volume of data
- Usage patterns: concurrency grows, the queries you run against your data change, and overall usage increases
Eventually, as your database utilization increases, you’ll start to see signs of these pressures and your database will begin to slow down. But this is precisely when you want more concurrency and data. Queries start taking seconds rather than milliseconds, making your web pages intolerably slow.
One of the first things that companies try in order to increase database capacity is to “scale up,” a strategy that can be summarized as follows: buy a bigger box. While this works, you might pay four to five times what you paid for your previous solution and get only twice the performance. And as your data set and application continue to grow, you’ll run into the same issues when you reach your new hardware’s limits. The next step is to scale up again, except now your solutions might include more specialized hardware like SSDs and Fusion-io cards. Again, these are costly (8x the price) and will not get you proportional capacity (4x).
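To make the scale-up arithmetic concrete, here is a small sketch that computes the price paid per unit of performance at each step. It uses the multiples quoted above and assumes, for illustration only, that both upgrades are measured against the original box and that the first upgrade sits at the low end of the "four to five times" range. The function and variable names are hypothetical.

```python
def cost_per_unit_of_performance(price_multiple, performance_multiple):
    """Price paid for each unit of performance, relative to the original box."""
    return price_multiple / performance_multiple


# (label, price multiple, performance multiple), all relative to the
# original box. The multiples come from the text; treating them as
# relative to the same baseline is an assumption.
upgrades = [
    ("original box", 1.0, 1.0),
    ("bigger box", 4.0, 2.0),
    ("specialized hardware", 8.0, 4.0),
]

for label, price, performance in upgrades:
    ratio = cost_per_unit_of_performance(price, performance)
    print(f"{label}: {ratio:.1f}x the original cost per unit of performance")
```

Under these assumptions, every unit of performance bought by scaling up costs about twice what it did on the original box.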
Each time you invest more to scale up, you’re just buying time. Eventually there is no box that will fit your application’s load, and if you’re planning for capacity as your site grows, you’ll find that there is no box big enough to handle your future growth either.
The conventional wisdom on how to scale your database past a single instance has been database sharding, or partitioning your data across multiple servers. Your application must now be aware of where data resides so that relevant queries are sent to the right shards. If your application doesn’t have a logical key to shard on, you won’t be able to distribute your traffic evenly. And if you have any application logic that needs to operate across shards or maintain referential integrity, a sharded architecture won’t perform well.
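The routing burden that sharding places on the application can be sketched in a few lines. This is a minimal illustration of one common scheme (hash the shard key, take it modulo the shard count), not a description of any particular product. The shard names and function names are hypothetical.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to servers.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key, shards=SHARDS):
    """Map a shard key (for example, a user ID) to exactly one shard."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def shards_for_query(keys=None, shards=SHARDS):
    """Return the shards a query must visit.

    With shard keys, only the owning shards are contacted. Without them
    (for example, a report across all users), every shard is contacted.
    """
    if keys is None:
        return list(shards)
    return sorted({shard_for(key, shards) for key in keys})


print(shards_for_query(["user-42"]))  # a single shard
print(shards_for_query())             # all four shards
```

A lookup by shard key touches one server, but any query that cannot name its keys fans out to every shard, and the application has to merge the results itself.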
Sharding also adds expensive overhead to your application development that will slow it down. Sharding isn’t easy, and while many solutions are “free,” they come at a high cost: an expensive price to pay for a solution that relies on extensive data movement and a centralized computation model.
In recent years, there’s been a lot of buzz around NoSQL as a new way of handling your data. NoSQL is a key tool in your tech stack if, for example, your application is for an “internet of things,” where data is collected in large quantities and written once, but perhaps never read. Or perhaps you are a data scientist, crunching petabytes of data and running complex analytics on an offline data store. A relational database is not for all types of workloads. But what if your application relies on transactional integrity for an OLTP workload, immediate consistency, and real-time analytics? Then your application needs a SQL solution to scale.
Sharding and NoSQL are not solutions that provide true horizontal scale to your SQL database. Enter the Clustrix database, which distinguishes itself by embracing the following principles:
SQL is relational, transactions rely on consistency, and data needs to be durable. This is why when Clustrix set out to tackle some of the hard problems of scaling databases, we made sure to stay ACID compliant.
Data is important, which is why durability matters. If a node fails, what happens to your data? If you’re using an in-memory solution, you might be afraid to know.
Linear scale is the true measure of whether a database is a scale-out SQL database. Adding capacity should be as simple as adding a node and gaining a proportional increase in performance. You should be able to grow your business without having to worry about whether you’ll hit some limit in your database.
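One way to check a claim of linear scale is to compare measured throughput after adding nodes with the throughput you would expect if performance grew in proportion to node count. The sketch below does that. The function name and the throughput figures are hypothetical and are not measurements of any product.

```python
def scaling_efficiency(baseline_nodes, baseline_tps, nodes, tps):
    """Measured throughput as a fraction of ideal linear throughput.

    1.0 means perfectly linear scale; lower values mean each added node
    contributes less than a baseline node did.
    """
    ideal_tps = baseline_tps * nodes / baseline_nodes
    return tps / ideal_tps


# Hypothetical measurements in transactions per second.
print(scaling_efficiency(3, 30_000, 6, 60_000))  # doubling nodes doubled throughput
print(scaling_efficiency(3, 30_000, 6, 45_000))  # doubling nodes added only half as much
```

Tracking this ratio as nodes are added shows whether a cluster keeps delivering proportional gains or starts to flatten out.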
Clustrix provides horizontal scale by leveraging principles of distributed computing to provide a clustered database solution. With the Clustrix database, you’ll never have to write specialized code to scale your database or worry about how your database will handle your application’s growth.
The Clustrix database architecture starts with data that is distributed and fully redundant. A great example of how Clustrix leverages data distribution to provide better performance is described in our blog post by Clustrix CTO Sergei Tsarev on fully distributed joins.
In addition, Clustrix uses the principles of MVCC (Multi-Version Concurrency Control) to avoid locking, scale writes, and eliminate read contention. It also provides automatic data redistribution when needed and leverages parallelism to provide faster backup and faster MySQL replication, all while remaining completely MySQL compatible. Clustrix is a truly scale-out SQL database that can take your application to the next level.
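For readers unfamiliar with MVCC, the general idea can be sketched as a store that keeps every committed version of a value, so that readers see a consistent snapshot without blocking writers. This is a single-process toy that illustrates the concept only. It is not Clustrix's implementation, and the class and method names are hypothetical.

```python
class MVCCStore:
    """Toy multi-version store: writes append versions, reads pick a snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit timestamp

    def write(self, key, value):
        """Commit a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Timestamp a reader can use to see a stable view of the data."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
reader_view = store.snapshot()   # a reader starts here
store.write("balance", 80)       # a later write does not disturb that reader

print(store.read("balance", reader_view))        # 100
print(store.read("balance", store.snapshot()))   # 80
```

Because the older version is still available, the reader never has to wait on a lock held by the writer, which is the property the paragraph above relies on.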
Learn more about how ClustrixDB can scale out.