Scale-out Architecture

ClustrixDB's Scale-out Architecture Delivers Cloud Scale

ClustrixDB proves that relational databases can support cloud-scale workloads, maintain low latency at high concurrency, and stay available during hardware failures. Clustrix rethinks how relational databases can deliver the cloud-scale deployment experience and support the massive transaction volumes that modern applications need. Built from the ground up, its scale-out architecture runs on any cloud and outperforms its predecessors.

ClustrixDB is not your grandfather’s RDBMS.

How ClustrixDB's Scale-out Architecture Works

Designed to support high-value, high-transaction workloads with low latency, ClustrixDB differs from standard relational databases in three ways: it uses a shared-nothing architecture (memory and disk), it automatically distributes data across all the servers, and it automatically parallelizes queries to scale out SQL. Optimized for performance, the ClustrixDB scale-out architecture moves queries to where the data lives in the cluster rather than moving the data around. Together, these allow near-linear scale as your database cluster grows.

Shared-Nothing Architecture

ClustrixDB is a scale-out relational/SQL database designed with a shared-nothing architecture, the only architecture proven to scale near-linearly as you add nodes. Popular cloud-scale data stores, like Hadoop and Redis, can scale out because they use a shared-nothing architecture as well. However, these data stores are not relational databases, and they fall short in handling the structured data and cross-node ACID transactions that high-value transactions need. Traditional relational databases such as MySQL and Aurora rely on single-master and/or shared-disk architectures, which keep them from scaling writes.

The key characteristic of a shared-nothing architecture is that every node owns part of the data, evenly dividing responsibility for reading and writing to the data, and reducing contention.

The ClustrixDB scale-out architecture also employs independent index distribution, on both primary and secondary keys, using consistent hashing. This allows each node to know exactly on which node the required data resides, without recourse to any kind of "leader" node. All data is only a single hop from the querying node, significantly reducing broadcasts.
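To make the idea concrete, here is a minimal sketch of consistent hashing used as a lookup map. It is an illustration under stated assumptions, not ClustrixDB's actual implementation: the `HashRing` class, the `stable_hash` helper, the node names, and the index names are all hypothetical. It shows how any node holding the same ring can compute the owner of a primary-key or secondary-key entry locally, without asking a leader.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a 64-bit integer with a hash that is stable across runs."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical; every node would hold a copy)."""

    def __init__(self, nodes, points_per_node=64):
        # Each node is placed at several points on the ring to even out load.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, index_name: str, key) -> str:
        """Return the node owning `key` in the given index.

        Each index is hashed independently, so a row's primary-key entry
        and its secondary-key entry may live on different nodes.
        """
        h = stable_hash(f"{index_name}:{key}")
        i = bisect.bisect_right(self._positions, h) % len(self._ring)
        return self._ring[i][1]


# Two nodes building the ring from the same membership list agree on owners.
ring_on_a = HashRing(["node-a", "node-b", "node-c"])
ring_on_b = HashRing(["node-a", "node-b", "node-c"])
primary_owner = ring_on_a.owner("users.PRIMARY", 42)
secondary_owner = ring_on_a.owner("users.email_idx", "ann@example.com")
```

Because the lookup is a pure function of the ring and the key, the querying node can send work directly to the owner in a single hop. A useful property of consistent hashing is that adding a node moves only the keys that the new node takes over; all other keys keep their owner.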

In addition, every node has a query compiler and can accept queries. The database engine processes local data within each node. The data map is replicated on all nodes and knows where each primary and secondary key lives. A scale-out architecture that works.

Intelligent Data Distribution with a Scale-out Architecture

ClustrixDB automatically distributes the data across the cluster: all tables are sliced using consistent hashing, and the slices are distributed across all the cluster nodes. Slices are similar to shards, but finer grained and managed completely by the patented ClustrixDB Rebalancer. There is at least one mirror (or "replica") of every slice, allowing for high availability. The ClustrixDB Rebalancer ensures that the data and the workload are distributed evenly across the cluster. It operates in the background and avoids degrading query performance.
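The slice-and-replica layout described above can be sketched roughly as follows. This is a simplified illustration, not the Rebalancer's algorithm: `slice_for_key` and `place_replicas` are hypothetical names, plain modulo hashing stands in for consistent hashing, placement is simple round-robin, and the choice of two copies per slice is an assumption.

```python
import hashlib
from collections import Counter


def slice_for_key(key, num_slices: int) -> int:
    """Assign a row key to a slice (modulo hashing, a stand-in for consistent hashing)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def place_replicas(num_slices: int, nodes: list, copies: int = 2) -> dict:
    """Place `copies` replicas of each slice on distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies per slice")
    return {
        s: [nodes[(s + r) % len(nodes)] for r in range(copies)]
        for s in range(num_slices)
    }


nodes = ["node-a", "node-b", "node-c"]
placement = place_replicas(num_slices=6, nodes=nodes, copies=2)

# Count how many slice replicas each node holds.
load = Counter(node for replicas in placement.values() for node in replicas)

# Find where the row with key 42 lives.
target_slice = slice_for_key(42, num_slices=6)
holders = placement[target_slice]
```

With six slices, three nodes, and two copies, every node ends up holding four replicas, and no slice has both copies on the same node, so losing one node still leaves a copy of every slice available.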

The slicing done by ClustrixDB is superior to sharding and the sharding-like approaches used by other databases. Tables are sliced both horizontally and vertically: hash-distributed ranges of rows are spread across the nodes, and so are the different "representations" of each table (e.g., by primary key, by secondary key, by covering indexes). This fine-grained distribution both maximizes query parallelism and minimizes hotspots at the storage layer. With sharding, by contrast, each table partition sits on its own RDBMS, so the application must maintain ACID guarantees for cross-node transactions itself. With ClustrixDB, the application sees a single logical database, no matter the number of database nodes. Cross-node transactions are always ACID compliant, and referential integrity is always maintained automatically.

Distributed Query Processing for a Scale-out Architecture

ClustrixDB brings the query to the data, not the other way around, in what is called distributed query processing. Distributed query processing minimizes data movement across the cluster.

The SQL language wasn’t designed to be multi-threaded. For relational databases using a single write master, this isn’t an issue. With ClustrixDB, since each node in the cluster can accept both writes and reads, we needed to parallelize SQL by breaking the queries into component functions, compiling them, and then distributing them across the cluster. Hence, distributed query processing.

ClustrixDB distributes these compiled query fragments to the servers that contain your data, performs the operations where the data already is, and then returns the result to you. For example, suppose your request lands on "Server A," but Server A doesn't have your data. We dispatch that request to "Server B" or "Server C," whichever has the data, and return only the specific data you want to Server A. This allows us to parallelize work across multiple systems and really scale out the database.
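The flow above can be sketched as a toy in-process model. This is an assumption-laden illustration rather than ClustrixDB's execution engine: the `Node` class, `run_fragment`, and `distributed_sum` are hypothetical names, the "query fragment" is just a Python predicate, and the nodes are called one after another where a real cluster would run fragments in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node holding its own share of the rows (hypothetical model)."""
    name: str
    rows: list = field(default_factory=list)

    def run_fragment(self, predicate, column):
        """Filter and partially aggregate local rows.

        Only the small (sum, count) pair leaves the node, never the rows.
        """
        matching = [row[column] for row in self.rows if predicate(row)]
        return sum(matching), len(matching)


def distributed_sum(nodes, predicate, column):
    """Coordinator side: send the fragment to each node, merge partial results."""
    partials = [node.run_fragment(predicate, column) for node in nodes]
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total, count


cluster = [
    Node("server-a", [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}]),
    Node("server-b", [{"region": "EU", "amount": 7}]),
    Node("server-c", [{"region": "US", "amount": 3}, {"region": "EU", "amount": 1}]),
]

# Roughly: SELECT SUM(amount), COUNT(*) FROM orders WHERE region = 'EU'
total, count = distributed_sum(cluster, lambda row: row["region"] == "EU", "amount")
```

The point of the design is the size of what crosses the network: each node returns two numbers regardless of how many rows it scanned, so data movement stays small as tables grow.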

With ClustrixDB distributed query processing, data motion across the cluster is actively minimized as the number of queries grows, and database operations are distributed evenly and automatically, allowing ClustrixDB to scale near-linearly. This strategy also ensures that only one node tries to write to any given piece of data, which reduces contention and allows ClustrixDB to support massive concurrency while maintaining referential integrity and ACID guarantees across every transaction.

How ClustrixDB RDBMS Scales Writes & Reads

Scaling out a SQL RDBMS while maintaining ACID guarantees in real time is a very large challenge. Most scale-out DBMS solutions relinquish one or more real-time transactional requirements. ClustrixDB achieves near-linear scaling of both write and read queries with full ACID compliance by combining "bringing the query to the data" with automatic distribution of the data across all nodes in the cluster. Read this white paper for more information, including how ClustrixDB leverages a Cascades planner, MVCC, two-phase locking, and the Paxos consensus protocol.

A New Approach to Scale-Out RDBMS

ClustrixDB is a scale-out SQL database that is designed to scale horizontally by adding cores and servers. The shared-nothing architecture provides an entirely new approach to query resolution: it moves the query to the data, not the data to the query. Learn how this revolutionary technology makes it possible to scale a single database across nodes and still support massive concurrency while delivering high performance, full relational functionality, transactional consistency (ACID), and seamless deployment.

Why Traditional SQL Databases Fail to Scale Writes & Reads Effectively

Scaling traditional OLTP databases is a polarizing issue among both database administrators (DBAs) and application developers. Many people refer to relational databases as SQL databases and say that SQL databases fundamentally cannot scale. They say it's just not possible because SQL databases were not designed to truly scale, especially for writes, and definitely not to cloud scale. Others assert that those who believe SQL databases cannot scale lack the knowledge, experience, and expertise to actually scale them. Some DBAs say that's why there are now NoSQL databases.
