Why Clustrix: SQL Scales

ClustrixDB proves that SQL can scale out in production to massive deployment sizes.

ClustrixDB scales reads, writes, updates, and analytics near linearly as you add nodes. Because the cloud is itself a scale-out architecture, new cloud applications use ClustrixDB to scale their online businesses seamlessly as customers are added and transaction volume grows.

No downtime, faster response, and more revenue through the intense holiday season

nomorerack

ClustrixDB’s high availability and easy scalability helped nomorerack sail through a 600% revenue spike by adding 8 nodes (64 cores) to its existing 6-node cluster (48 cores).

“ClustrixDB requires no server management, the same as RDS, but we get much better enterprise-level support that is better and faster than RDS.”

- Keith Bussey, VP of Technology, NoMoreRack

Industry Trend: Resurgence of SQL

Hive brings a SQL-like query language to Hadoop, and CQL does the same for Cassandra. Google runs AdWords on F1, its distributed SQL database.

Google started the NoSQL trend with BigTable and has now reverted to distributed SQL databases for its own AdWords system. “We also have a lot of experience with eventual consistency systems at Google,” they write in the Google F1 paper. “In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.”

So severe a penalty do these systems impose on developers that Google called it an “unacceptable burden.”  Google is encouraging developers to switch to SQL “for low-latency OLTP queries, large OLAP queries, and everything in between.”

Why ClustrixDB Scales

1. Shared-Nothing Architecture

ClustrixDB was designed from the ground up to solve the challenge of scale in the cloud. Three technology choices allow it to scale linearly to hundreds of cores: a shared-nothing architecture, intelligent data distribution, and distributed query processing.

ClustrixDB is designed with a shared-nothing architecture, the only architecture proved to scale near linearly as you add nodes. The key characteristic of a shared-nothing architecture is that every node owns part of the data, evenly dividing responsibility for reading and writing to the data and reducing contention.

Every node has a query compiler and can accept queries. The database engine processes local data within each node. The data map is replicated on all nodes and knows where each primary and secondary key lives.

Clustrix Intelligent Data Distribution Diagram

2. Intelligent Data Distribution

ClustrixDB distributes data across the cluster: tables are sliced and hash-distributed across the nodes, with multiple copies of every slice. The Rebalancer ensures that the data and the workload are distributed evenly across the cluster. With ClustrixDB slicing, given any primary or secondary key, every node knows which node owns the data.

The patented slicing done by ClustrixDB is superior to sharding and sharding-like approaches followed by other databases. Sharding distributes based on primary key only, and secondary indexes are co-located with the primary row. When you do a lookup by secondary key, the database doesn’t know which node to go to, causing broadcasts. ClustrixDB has independent distribution for primary and secondary indexes, removing broadcasts and making the cluster scale linearly.
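The difference between the two placement schemes can be illustrated with a toy model. The Python sketch below is not ClustrixDB's actual slicing algorithm, which this page does not describe; it assumes a simple hash-modulo placement, a hypothetical four-node cluster, and a made-up users table with an id primary key and an email secondary key, and it ignores slice replicas. It only counts which nodes a lookup by email has to contact under each scheme.

```python
import hashlib

# Hypothetical four-node cluster; node names are illustrative only.
NODES = ["node0", "node1", "node2", "node3"]

# Hypothetical secondary index contents: email -> primary key.
EMAIL_TO_ID = {"ann@example.com": 1, "bob@example.com": 2}


def owner(key):
    """Return the node that owns `key` under simple hash-modulo placement.

    This placement rule is an assumption made for the sketch.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def nodes_contacted_sharded(email):
    """Sharding: index entries live next to their rows, placed by primary key.

    The email alone does not identify a node, so the lookup is broadcast.
    """
    return set(NODES)


def nodes_contacted_independent(email, email_to_id):
    """Independent distribution: the index is hashed on the email itself.

    Any node can compute which node holds the index entry, read the primary
    key there, and then compute which node holds the row.
    """
    index_node = owner("email:" + email)
    user_id = email_to_id[email]  # read from the index slice on index_node
    row_node = owner("id:" + str(user_id))
    return {index_node, row_node}


if __name__ == "__main__":
    email = "ann@example.com"
    print("sharded:", sorted(nodes_contacted_sharded(email)))
    print("independent:", sorted(nodes_contacted_independent(email, EMAIL_TO_ID)))
```

In this model the sharded lookup always touches every node, so its cost grows with cluster size, while the independently distributed lookup touches at most two nodes regardless of how many nodes are added.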

3. Distributed Query Processing

ClustrixDB moves code to where the data is rather than pulling data to the query node. This approach minimizes data movement across the cluster, so data motion stays low even as the number of queries grows, allowing ClustrixDB to scale linearly. This strategy also ensures that only one node is trying to write to any piece of data, reducing contention.

Other primary databases pull data to the query node, causing data motion across the cluster on every query. Also, multiple nodes can try to write to the same data at the same time, causing contention. These databases do not scale linearly.
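A small sketch can make the contrast between the two strategies concrete. The Python model below is an illustration only, not ClustrixDB's query engine: the partition contents, the node names, and the filtered-sum query are all invented, and the query node is treated as separate from the data nodes. It runs the same aggregate both ways and counts how many values cross the network.

```python
# Hypothetical partitions: each node owns some "amount" values.
PARTITIONS = {
    "node0": [5, 20, 30],
    "node1": [1, 50],
    "node2": [7, 8, 9, 100],
}


def pull_data_to_query_node(partitions, threshold):
    """Data shipping: every row travels to the query node, which then filters.

    Returns (result, number of values moved across the network).
    """
    rows = []
    moved = 0
    for part in partitions.values():
        rows.extend(part)   # the whole partition is transferred
        moved += len(part)
    return sum(a for a in rows if a > threshold), moved


def push_query_to_data(partitions, threshold):
    """Query shipping: each node filters and sums locally, returning one value.

    Returns (result, number of values moved across the network).
    """
    partials = [
        sum(a for a in part if a > threshold)  # runs where the data lives
        for part in partitions.values()
    ]
    return sum(partials), len(partials)


if __name__ == "__main__":
    print("pull:", pull_data_to_query_node(PARTITIONS, 10))
    print("push:", push_query_to_data(PARTITIONS, 10))
```

Both functions return the same total, but in this model the pulled version moves one value per row while the pushed version moves one value per node, so the gap widens as tables grow.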

Clustrix Distributed Query Processing Diagram