SQL Can Scale


ClustrixDB proves that SQL can scale out in production to accommodate massive workloads and high concurrency while still maintaining low latency.

ClustrixDB easily handles the massive transaction volume that large and fast-growing applications need, scaling near linearly as you add nodes, even with highly concurrent workloads. The scale-out architecture of the cloud means that cloud applications can use ClustrixDB to seamlessly scale their online businesses as new customers are added and transaction volume grows.

Designed to support high-value, high-transaction workloads with low latency, ClustrixDB uses a shared-nothing architecture, which is known to scale linearly, together with distributed, fine-grained, row-level locking to minimize contention. With ClustrixDB, every node can receive and process transactions. The database also moves queries to where the data is in the cluster rather than moving the data around, allowing near-linear scale as cluster sizes grow.

Learn more about Why Traditional SQL Databases Fail to Scale Effectively, and how ClustrixDB has solved this problem.

Why ClustrixDB Scales

ClustrixDB was designed from the ground up to solve the challenge of scale in the cloud. ClustrixDB scales linearly to hundreds of cores because of its shared-nothing architecture, intelligent data distribution, and distributed query processing capabilities.

Shared-Nothing Architecture

ClustrixDB is designed with a shared-nothing architecture, the only architecture proven to scale near linearly as you add nodes. The key characteristic of a shared-nothing architecture is that every node owns part of the data, so responsibility for reading and writing the data is divided evenly and contention is reduced.

Every node has a query compiler and can accept queries. The database engine processes local data within each node. The data map is replicated on all nodes and knows where each primary and secondary key lives.

Intelligent Data Distribution

ClustrixDB automatically distributes the data across the cluster: all tables are sliced using consistent hashing, and the slices are distributed across all the cluster nodes. Every slice has at least one mirror, allowing for high availability. Clustrix’s patented Rebalancer ensures that the data and the workload are distributed evenly across the cluster. It operates in the background to avoid degrading query performance.
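As a rough illustration of slicing by consistent hashing with one mirror per slice, the following Python sketch places keys on a hash ring. It is not ClustrixDB's actual algorithm: the `HashRing` class, the node names, the virtual-node count, and the use of MD5 as a stable hash are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit integer (MD5 is used only for stability)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; node names and vnode count are hypothetical."""

    def __init__(self, nodes, vnodes=64):
        # Each node contributes several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owners(self, key, replicas=2):
        """Return `replicas` distinct nodes for `key`: the primary, then mirrors."""
        distinct_nodes = len({node for _, node in self._ring})
        wanted = min(replicas, distinct_nodes)
        start = bisect.bisect_right(self._points, _hash(str(key)))
        found = []
        # Walk clockwise from the key's position, collecting distinct nodes.
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == wanted:
                    break
        return found


ring = HashRing(["node1", "node2", "node3"])
primary, mirror = ring.owners("users:42")  # two different nodes hold this key
```

In this sketch, adding a node only moves the keys that the new node takes over, which is the property that lets a background process rebalance data without reshuffling the whole cluster.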

The patented slicing done by ClustrixDB is superior to sharding and other sharding-like approaches used by other databases. Sharding distributes based on primary key only, and secondary indexes are co-located with the primary row. When you do a lookup by secondary key, the database doesn’t know which node to go to, making broadcasts necessary. ClustrixDB, on the other hand, has independent distribution for primary and secondary indexes, removing the need for broadcasts.
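The difference between the two lookup paths can be sketched in Python. This is a simplified model under stated assumptions, not ClustrixDB code: the four-node cluster, the `users` table with an `email` secondary index, and the modulo-based `node_for` placement function are all hypothetical.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster


def node_for(value) -> str:
    """Pick a node from a stable hash of the value (simple modulo placement)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def sharded_lookup_targets(email):
    """Sharding by primary key only: index entries sit beside their rows,
    so a lookup by email cannot be routed and must ask every node."""
    return list(NODES)


def independent_index_lookup_targets(email, index):
    """Index distributed by its own key: one node holds the index entry,
    and that entry names the primary key, which routes to the row's node."""
    index_node = node_for(("email_idx", email))
    user_id = index[email]
    row_node = node_for(("users", user_id))
    return [index_node, row_node]


email_index = {"ada@example.com": 42}  # secondary key -> primary key
broadcast = sharded_lookup_targets("ada@example.com")
routed = independent_index_lookup_targets("ada@example.com", email_index)
```

Here the sharded lookup contacts all four nodes, while the routed lookup contacts at most two regardless of cluster size.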

Distributed Query Processing

ClustrixDB moves code to where the data lies instead of pulling data to the query node. Because this approach keeps data movement across the cluster small even as the number of queries grows, ClustrixDB can scale linearly. This strategy also ensures that only one node is trying to write to any piece of data, thus reducing contention.
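A small Python sketch can make the data-movement argument concrete. It compares how many items cross the network when rows are pulled to a coordinating node versus when the filter and partial sum run next to each slice. The slice contents, node names, and function names are invented for the example, and the model ignores everything except the count of shipped items.

```python
# Hypothetical cluster: each node holds a slice of (order_id, amount) rows.
SLICES = {
    "node1": [(1, 20), (2, 150), (3, 75)],
    "node2": [(4, 300), (5, 10)],
    "node3": [(6, 500), (7, 40), (8, 125)],
}


def pull_data_to_query(slices, min_amount):
    """Ship every row to the coordinating node, then filter and sum there."""
    shipped = [row for rows in slices.values() for row in rows]
    total = sum(amount for _, amount in shipped if amount >= min_amount)
    return total, len(shipped)


def push_query_to_data(slices, min_amount):
    """Filter and sum where each slice lives; ship one partial result per node."""
    partials = [
        sum(amount for _, amount in rows if amount >= min_amount)
        for rows in slices.values()
    ]
    return sum(partials), len(partials)


pulled_total, rows_shipped = pull_data_to_query(SLICES, 100)
pushed_total, partials_shipped = push_query_to_data(SLICES, 100)
```

Both strategies compute the same total, but the first ships all eight rows while the second ships three partial sums, and that gap widens as the slices grow.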

Other databases pull data to the query node, causing data motion across the cluster with every query. Also, multiple nodes can try to write to the same data at the same time, causing contention. For these reasons, other databases cannot scale linearly.