Why Clustrix: ClustrixDB Features

ClustrixDB is a distributed SQL database built for fast-growing e-commerce applications.

ClustrixDB lets you run massive transaction volume and fast real-time analytics at the same time. Designed for the cloud, ClustrixDB offers built-in high availability and is largely self-managing.

ClustrixDB Design

ClustrixDB is designed from the ground up for the scale-out architecture of the cloud.

Designed to help your site grow fast, ClustrixDB takes the pain out of scaling by automating most of the operations.

Scale-Out SQL

ClustrixDB offers a scale-out SQL database that lets you simply add more nodes to your cluster as demand grows, so you can serve more users, transactions, and data. Clustrix keeps your application simple; it sees a single database that provides SQL with ACID guarantees.

ClustrixDB lets you handle growth simply, predictably, and at the low-cost increments of adding commodity hardware.

Clustrix's patented technology distributes and redistributes data automatically, so you never have to shard or worry about data distribution. You can also send complex queries to any node; unlike with sharding, complex queries face no limitations and no performance penalties.

ClustrixDB has extensive support for SQL-92 features, including complex queries involving joins on a dozen or more tables, aggregates, sorts, and subqueries. It also supports stored procedures, triggers, foreign keys, partitioned and temporary tables, and fully online schema changes.

Massive Transaction Volume

ClustrixDB handles the massive transaction volume that large and fast-growing applications need, with ease. ClustrixDB scales near linearly as you add nodes, even with highly concurrent workloads.

ClustrixDB lets you handle the massive data and transaction needs of your application simply, without code changes and without replacing database or hardware as your application needs grow.

With ClustrixDB, every node can receive and process transactions. The database employs a shared-nothing architecture, which is known to scale linearly, together with distributed, fine-grained, row-level locking to minimize contention. Rather than moving data across the cluster, the database moves code to the node where the data lives, allowing near-linear scale as cluster sizes grow.
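To make the idea of moving code to the data concrete, here is a minimal sketch in Python. It is not ClustrixDB code: the `cluster` dictionary, the node names, and both functions are hypothetical, and the example only compares how many values cross the network when summing a column under each approach.

```python
from typing import Dict, List, Tuple

# Hypothetical cluster: each node holds some order amounts locally.
cluster: Dict[str, List[int]] = {
    "node1": [120, 80, 45],
    "node2": [300, 15],
    "node3": [60, 60, 60, 10],
}


def total_by_moving_data(nodes: Dict[str, List[int]]) -> Tuple[int, int]:
    """Ship every row to one place, then aggregate there."""
    shipped = [amount for rows in nodes.values() for amount in rows]
    return sum(shipped), len(shipped)


def total_by_moving_code(nodes: Dict[str, List[int]]) -> Tuple[int, int]:
    """Run the aggregate on each node and ship only the partial results."""
    partials = [sum(rows) for rows in nodes.values()]
    return sum(partials), len(partials)


data_total, data_values_shipped = total_by_moving_data(cluster)
code_total, code_values_shipped = total_by_moving_code(cluster)
```

Both approaches return the same total, but the second ships one partial result per node instead of one value per row, so the traffic grows with the number of nodes rather than with the amount of data.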

Real-Time Analytics on Live Operational Data

ClustrixDB allows you to run real-time analytics on your live operational data without moving it into another system. You can run ad hoc queries and reports on your most valuable data, current up to the second, while the database is ingesting high-volume data.

Real-time analytics let you get split-second response to complex queries on up-to-date customer data, without creating redundant databases.

ClustrixDB employs massively parallel processing (MPP) across its distributed cluster to parallelize and distribute SQL queries, and uses all available resources of the cluster to accelerate the queries. ClustrixDB employs multi-version concurrency control (MVCC) to ensure that reads and writes do not interfere with each other, allowing analytics to run in parallel with writes and updates without affecting performance.
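The MVCC behavior described above can be illustrated with a toy version store. This is a sketch under simplifying assumptions (a single process, a global logical clock, no aborts or garbage collection), not ClustrixDB's implementation; the `MVCCStore` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MVCCStore:
    """Toy multi-version store: each key maps to a list of (commit_ts, value)."""
    clock: int = 0
    versions: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> int:
        """Commit a new version; older versions stay readable."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self) -> int:
        """A reader remembers the clock value at which it started."""
        return self.clock

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("order:1", "pending")
report_snapshot = store.snapshot()   # a long-running report starts here
store.write("order:1", "shipped")    # a later write does not block the report

seen_by_report = store.read("order:1", report_snapshot)
seen_by_new_reader = store.read("order:1", store.snapshot())
```

The report keeps reading the version that existed when it started, while a reader that starts afterwards sees the new value. Neither side has to wait for the other, which is the property the paragraph above relies on.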

In-memory analytics in ClustrixDB use memory backed by SSDs. The commonly used hot data stays in memory and the rest of the data is just a few microseconds away in SSDs. By using this combination effectively, Clustrix provides the right mix of durability, speed, and cost. In contrast, pure in-memory databases are expensive for TB-scale databases and lack durability for operational database needs.

Self-Managing Operation

ClustrixDB virtually eliminates DBA operations tasks because the management is built into the database itself. ClustrixDB is built with many points of instrumentation, and the ClustrixDB Rebalancer is always working in the background, keeping the cluster healthy with minimal overhead.

Our customers run deployments with hundreds of cores and terabytes of data without a full-time operational DBA. ClustrixDB significantly reduces the work required to administer the database, which lowers the cost of ownership and lets your engineers focus on innovation.

With ClustrixDB, the data is automatically sliced and distributed across the cluster; the user does not need to pick shard keys. The ClustrixDB Rebalancer can move data across the cluster while that data is being read and written. If the cluster becomes imbalanced, the Rebalancer moves data to restore balance. When new nodes are added, data is automatically moved to them. If a node is lost, some copies of the data are lost with it; to re-protect the data, the lost copies are regenerated and placed on other nodes.
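The slicing, rebalancing, and re-protection behavior described above can be sketched as follows. This is an illustrative model only, not the Rebalancer's actual algorithm: the slice count, the replica count, the CRC32 hash, and the greedy least-loaded placement are all assumptions made for the example, as are the function names.

```python
import zlib
from collections import Counter
from typing import Dict, List, Set

NUM_SLICES = 12   # hypothetical number of slices
REPLICAS = 2      # hypothetical number of copies kept per slice

Placement = Dict[int, Set[str]]


def slice_for(key: str) -> int:
    """Hash a row key to a slice, so no shard key has to be chosen by hand."""
    return zlib.crc32(key.encode()) % NUM_SLICES


def load(placement: Placement, nodes: List[str]) -> Counter:
    """Count how many slice copies each node currently holds."""
    counts = Counter({n: 0 for n in nodes})
    for holders in placement.values():
        counts.update(holders)
    return counts


def least_loaded(placement: Placement, nodes: List[str], exclude: Set[str]) -> str:
    counts = load(placement, nodes)
    return min((n for n in nodes if n not in exclude), key=lambda n: (counts[n], n))


def place(nodes: List[str]) -> Placement:
    """Initial placement: put each copy on the least-loaded eligible node."""
    placement: Placement = {s: set() for s in range(NUM_SLICES)}
    for holders in placement.values():
        for _ in range(REPLICAS):
            holders.add(least_loaded(placement, nodes, holders))
    return placement


def add_node(placement: Placement, nodes: List[str], new: str) -> List[str]:
    """Move copies from the busiest node to the new node until load evens out."""
    nodes = nodes + [new]
    while True:
        counts = load(placement, nodes)
        busiest = max(nodes, key=lambda n: (counts[n], n))
        if counts[busiest] - counts[new] <= 1:
            return nodes
        s = next(s for s, h in sorted(placement.items()) if busiest in h and new not in h)
        placement[s].remove(busiest)
        placement[s].add(new)


def reprotect(placement: Placement, nodes: List[str], lost: str) -> List[str]:
    """After a node loss, regenerate the missing copies on surviving nodes."""
    survivors = [n for n in nodes if n != lost]
    for holders in placement.values():
        holders.discard(lost)
    for holders in placement.values():
        while len(holders) < REPLICAS:
            holders.add(least_loaded(placement, survivors, holders))
    return survivors


nodes = ["node1", "node2", "node3"]
placement = place(nodes)
nodes = add_node(placement, nodes, "node4")    # scale out
after_growth = load(placement, nodes)
nodes = reprotect(placement, nodes, "node2")   # simulate a node failure
after_loss = load(placement, nodes)
```

In this model a slice never keeps two copies on the same node, so losing one node removes at most one copy of any slice and the surviving copy can be used to regenerate it.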

ClustrixDB is built with instrumentation that is always running and constantly measures multiple aspects of database health. When the cluster needs attention, for example when it is approaching the limits of its storage or processing capacity, it sends an e-mail to the administrator describing the concern.

Easy Migration

ClustrixDB allows you to continue to use your SQL code and SQL skills. ClustrixDB supports SQL-92 and MySQL extensions. For MySQL users, the code changes required are minimal. Clustrix can be set up as a slave to MySQL and then promoted to master while your database and application are running.

Migrating to ClustrixDB is significantly simpler than moving to NoSQL or sharded MySQL, either of which requires far more upfront effort and precious engineering time in the application, as well as ongoing operational costs thereafter.

ClustrixDB is built from scratch and does not include any MySQL code. However, ClustrixDB has a personality module that allows it to speak the MySQL protocol with a few differences, mostly in corner cases that result from its distributed nature. The list of differences is well documented, and most customers are able to migrate with few or no code changes.

High-Availability Operations

Clustrix provides all the tools required to run business-critical production applications, helping our customers achieve five nines (99.999%) of availability.

Customers expect their services to be always available. Hardware, especially VMs in the cloud, fails often and entire geographical regions can become unavailable. The database, just like the application servers, needs to be able to recover from failures quickly and have disaster recovery features for events such as power outages.

ClustrixDB is built with simple and robust high availability. Within a cluster, we keep multiple consistent copies of each slice of your data. You get automated recovery in the face of disk or node failure, and your database stays available with no data loss. This capability is significantly more robust than local master-slave configurations, where slaves can lag.
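For a sense of scale, the arithmetic behind an availability target and the effect of keeping multiple copies can be worked out as below. This is a back-of-the-envelope sketch, not a statement about ClustrixDB's measured availability: the function names are hypothetical, and the replication formula assumes node failures are independent, which real deployments only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Downtime per year permitted by a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicated_availability(node_availability: float, copies: int) -> float:
    """Chance that at least one copy is reachable, assuming independent failures."""
    return 1.0 - (1.0 - node_availability) ** copies


five_nines_budget = allowed_downtime_minutes(0.99999)   # roughly 5.26 minutes per year
three_nines_budget = allowed_downtime_minutes(0.999)    # roughly 525.6 minutes per year

one_copy = replicated_availability(0.99, 1)
two_copies = replicated_availability(0.99, 2)
three_copies = replicated_availability(0.99, 3)
```

Under these assumptions, each additional copy multiplies the probability of losing access to a slice by the single-node failure probability, which is why keeping several consistent copies is such an effective lever.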

You can deploy another cluster in a different geographic region and keep it current with asynchronous replication. Clustrix uses the MySQL replication protocol. For disaster recovery, Clustrix offers fast parallel backup that takes the same time irrespective of the number of nodes in the database cluster.

Cloud DevOps Assist

ClustrixDB is designed to help developers and design DBAs understand the current health of the database and its workload, and to optimize quickly. Continuous query optimization updates query plans as the data distribution changes.

Cloud computing features rapid iteration and deployment, and with it ever-increasing responsibility on the developer. ClustrixDB reduces the effort and time required to optimize your database.

With ClustrixDB Insight, you can see the current health of the database from your browser, including the data and workload distribution across the cluster. The current workload window shows the queries that are causing high system load right now. Historical workload comparison shows newly introduced queries, helping you pinpoint any issues introduced in the last iteration.

ClustrixDB maintains rich statistics about the distribution of your data, including probability distributions on values. Query plans are cached for reuse, but costs are checked every time based on statistics. If the system believes your data distribution has changed enough, a fresh query plan based on the latest changes is automatically generated.
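The cache-then-recheck behavior described above can be sketched as follows. This is a simplified illustration, not ClustrixDB's optimizer: the `PlanCache` class, the row-count estimator, the two plan strategies, the 1,000-row cutoff, and the drift factor of 4 are all hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedPlan:
    strategy: str
    planned_rows: int   # row estimate the plan was built for


class PlanCache:
    def __init__(self, estimate_rows: Callable[[str], int], drift_factor: float = 4.0):
        self.estimate_rows = estimate_rows   # stands in for table statistics
        self.drift_factor = drift_factor     # how much drift forces a re-plan
        self.plans: Dict[str, CachedPlan] = {}
        self.replans = 0

    def _plan(self, rows: int) -> CachedPlan:
        """Pretend optimizer: choose a strategy from the estimated row count."""
        self.replans += 1
        strategy = "index_lookup" if rows < 1000 else "parallel_scan"
        return CachedPlan(strategy, rows)

    def get(self, query: str) -> CachedPlan:
        """Reuse the cached plan unless the statistics have drifted too far."""
        rows = max(1, self.estimate_rows(query))
        cached = self.plans.get(query)
        if cached is not None:
            ratio = max(rows, cached.planned_rows) / min(rows, cached.planned_rows)
            if ratio < self.drift_factor:
                return cached
        plan = self._plan(rows)
        self.plans[query] = plan
        return plan


stats = {"rows": 100}
cache = PlanCache(lambda query: stats["rows"])
query = "SELECT * FROM orders WHERE status = 'open'"

first = cache.get(query)     # planned for about 100 rows
stats["rows"] = 150
second = cache.get(query)    # small drift: the cached plan is reused
stats["rows"] = 50000
third = cache.get(query)     # large drift: a fresh plan is generated
```

Checking the statistics on every call is cheap compared with re-planning, so the cache avoids repeated optimizer work while still reacting when the data no longer resembles what the plan was built for.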