Why Clustrix: SQL Scales

ClustrixDB proves that SQL can scale out in production to massive deployment sizes.

ClustrixDB can scale reads, writes, updates, and analytics near linearly as you add nodes. Because the cloud itself is a scale-out architecture, new cloud applications can use ClustrixDB to scale their online businesses seamlessly as customers are added and transaction volume grows. Seasonal businesses such as e-commerce can use the Flex option to add capacity before the holiday season to handle the spike in traffic and transactions, then reduce capacity again once the season is over.

Read more about Choxi’s use of Flex to handle holiday spikes.


No downtime, faster response, and more revenue through the intense holiday season


ClustrixDB’s high availability and ease of scaling helped Choxi sail through a 600% revenue spike by adding 8 nodes (64 cores) to its existing 6-node cluster (48 cores).


“ClustrixDB requires no server management, the same as RDS, but we get much better enterprise-level support that is better and faster than RDS.”

- Keith Bussey, VP of Technology, Choxi (formerly nomorerack.com)

Industry Trend: Resurgence of SQL

Hive brings SQL-style queries to Hadoop, and CQL does the same for Cassandra. Google runs AdWords on its F1 distributed SQL database.

Google started the NoSQL trend with BigTable and has now reverted to distributed SQL databases for its own AdWords system. “We also have a lot of experience with eventual consistency systems at Google,” they write in the Google F1 paper. “In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.”

So severe a penalty do these systems impose on developers that Google called it an “unacceptable burden.” Google is encouraging developers to switch to SQL “for low-latency OLTP queries, large OLAP queries, and everything in between.”

Featured White Paper

Why Traditional SQL Databases Fail to Scale Effectively


Why ClustrixDB Scales

1. Shared-Nothing Architecture

ClustrixDB was designed from the ground up to solve the challenge of scale in the cloud. Three technology choices allow it to scale linearly to hundreds of cores: a shared-nothing architecture, intelligent data distribution, and distributed query processing.

[Diagram: ClustrixDB shared-nothing architecture]

ClustrixDB is designed with a shared-nothing architecture, the only architecture proven to scale near linearly as you add nodes. The key characteristic of a shared-nothing architecture is that every node owns part of the data, which divides responsibility for reading and writing the data evenly and reduces contention.

Every node has a query compiler and can accept queries. The database engine processes local data within each node. The data map is replicated on all nodes and knows where each primary and secondary key lives.

[Diagram: ClustrixDB intelligent data distribution]

2. Intelligent Data Distribution

ClustrixDB automatically distributes data across the cluster: all tables are sliced using consistent hashing, and the slices are distributed across all cluster nodes. Every slice has a minimum of two copies, which provides high availability. Clustrix’s patented Rebalancer runs in the background to keep the data and the workload evenly distributed across the cluster while avoiding query performance degradation.
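To make the idea of consistent hashing with two copies per slice concrete, here is a minimal Python sketch. It is not ClustrixDB's patented slicing or Rebalancer. The `HashRing` class, the use of MD5, the virtual-node count, and the node names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; illustrative only."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points so that load spreads evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def owners(self, key: str, copies: int = 2):
        """Return `copies` distinct nodes for a key, walking clockwise on the ring."""
        if copies > self._node_count:
            raise ValueError("not enough nodes for the requested number of copies")
        i = bisect.bisect_right(self._points, _hash(key))
        found = []
        while len(found) < copies:
            node = self._ring[i % len(self._ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found


ring = HashRing(["node1", "node2", "node3"])
# The first owner holds the primary copy, the second holds the replica.
print(ring.owners("orders:42"))
```

A useful property of this scheme is that adding a node moves only the keys that the new node takes over, so most data stays where it is when the cluster grows.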

The patented slicing done by ClustrixDB is superior to sharding and sharding-like approaches followed by other databases. Sharding distributes based on primary key only, and secondary indexes are co-located with the primary row. When you do a lookup by secondary key, the database doesn’t know which node to go to, causing broadcasts. ClustrixDB has independent distribution for primary and secondary indexes, removing broadcasts and making the cluster scale linearly.
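The difference between co-located and independently distributed secondary indexes can be sketched with a toy simulation. This is an illustration under stated assumptions, not ClustrixDB's implementation: the four-node cluster, the `node_for` placement function (plain modulo hashing for brevity), and the users table with unique emails are all hypothetical.

```python
import hashlib

NODES = 4  # hypothetical cluster size


def node_for(value) -> int:
    """Pick a node for a key (simple modulo hashing, for brevity only)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def build(rows, independent_index: bool):
    """Place rows by primary key; place index entries by email or with the row."""
    rows_by_node = [dict() for _ in range(NODES)]
    index_by_node = [dict() for _ in range(NODES)]
    for user_id, email in rows:
        row_node = node_for(user_id)
        rows_by_node[row_node][user_id] = email
        idx_node = node_for(email) if independent_index else row_node
        index_by_node[idx_node][email] = user_id
    return rows_by_node, index_by_node


def lookup_by_email(email, index_by_node, independent_index: bool):
    """Return (user_id, number of nodes contacted) for a secondary-key lookup."""
    if independent_index:
        idx_node = node_for(email)  # the index entry's location is computable
        user_id = index_by_node[idx_node].get(email)
        contacted = {idx_node}
        if user_id is not None:
            contacted.add(node_for(user_id))  # then fetch the row itself
        return user_id, len(contacted)
    # Co-located index: the email says nothing about placement, so broadcast.
    user_id = None
    for index in index_by_node:
        user_id = index.get(email, user_id)
    return user_id, NODES


rows = [(i, f"user{i}@example.com") for i in range(1, 21)]
for independent in (False, True):
    _, index = build(rows, independent)
    print(independent, lookup_by_email("user7@example.com", index, independent))
```

In this sketch the co-located layout always contacts every node, while the independent layout contacts at most two regardless of cluster size.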

3. Distributed Query Processing

ClustrixDB moves code to where the data is rather than pulling data to the query node. This approach minimizes data movement across the cluster, so data motion stays low even as the number of queries grows, allowing ClustrixDB to scale linearly. This strategy also ensures that only one node is trying to write to any piece of data, reducing contention.

Other databases that pull data to the query node cause data motion across the cluster on every query. Multiple nodes can also try to write to the same data at the same time, causing contention. These databases do not scale linearly.
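The contrast between pulling data to the query node and moving code to the data can be shown with a small in-process simulation. It is only a sketch: the `slices` dictionary, the node names, and the "items moved" counter are assumptions that stand in for real network traffic, and nothing here reflects ClustrixDB's actual query engine.

```python
def pull_data(slices, predicate):
    """Query node fetches every row, then filters and sums locally."""
    moved = 0
    total = 0
    for rows in slices.values():
        moved += len(rows)  # every row crosses the network
        total += sum(r for r in rows if predicate(r))
    return total, moved


def push_code(slices, predicate):
    """Each owning node filters and sums its own rows, returning one partial result."""
    moved = 0
    total = 0
    for rows in slices.values():
        partial = sum(r for r in rows if predicate(r))  # runs where the data lives
        moved += 1  # only the partial sum crosses the network
        total += partial
    return total, moved


# Hypothetical order amounts, sliced across three nodes.
slices = {
    "node1": [5, 12, 7],
    "node2": [20, 3],
    "node3": [9, 15, 1, 30],
}

print(pull_data(slices, lambda amount: amount > 8))
print(push_code(slices, lambda amount: amount > 8))
```

Both functions compute the same answer, but the pull version moves one item per row while the push version moves one item per node, so the gap widens as tables grow.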

[Diagram: ClustrixDB distributed query processing]