ClustrixDB as a NoSQL Database?

We are really clear that ClustrixDB is a fully relational, fully SQL, fully ACID-compliant database. It’s not a key-value store (KVS), not a document store (JSON and only JSON), and not “eventually consistent” like Cassandra, DynamoDB, Couchbase, or Riak. So we don’t call it a NoSQL database.

Some people put ClustrixDB in the NewSQL category, but we’d prefer calling it BetterSQL. ClustrixDB is a SQL database…just better (and much more scalable).

So we are really clear about our identity. But customers and prospective customers keep asking us if they can replace their NoSQL database with ClustrixDB. In the past, we have always said, “ClustrixDB is a great replacement for your SQL databases, but you should probably stick with a NoSQL database for your NoSQL needs.” But with our most recent product release, our Performance Architect and benchmarking guru, Peter Friedenbach, decided to run the NoSQL benchmark YCSB on ClustrixDB and show us that ClustrixDB is actually quite powerful at NoSQL workloads.

Peter recently did a round of testing that focused on write-heavy NoSQL workloads, and his results wowed all of us here at Clustrix Inc. We wanted to share his findings with the rest of you.

Below is Peter’s analysis of his testing, followed by a quick comparison to other NoSQL benchmarks.


Author: Peter Friedenbach, Oct 2017

A number of database products exist in the market today under the category of “NoSQL”. These “non-relational” databases claim to offer better scalability at lower price points than traditional relational database products because:

  1. they can “scale out” using distributed clusters of low-cost hardware
  2. they relax the ACID properties found in transactional databases in order to avoid potential bottlenecks in a distributed transaction manager

These products include systems such as DynamoDB, Cassandra, and Couchbase.

But wait a minute. While Clustrix is a relational database, it can also “scale out” performance using distributed clusters of low-cost hardware, and it can also relax ACID properties in its transactional model. Clustrix isn’t just a relational database. It is architected with some of the same benefits as a NoSQL database.

Can Clustrix also meet the performance needs of a “NoSQL” database?

To explore this question, I will use a customized version of the YCSB benchmark (https://github.com/brianfrankcooper/YCSB/wiki) to measure how well Clustrix scales as a simple key-value store.

For this investigation, we’ll assume a performance requirement of approximately 600,000 operations per second. The fundamental question I will try to answer is whether Clustrix can meet this performance objective and how much hardware would be required to do so. In doing so, I will also try to demonstrate the scalability of Clustrix. The advantage of a scale-out architecture is that if you need more performance, you just add more hardware.

Data Design

Modeled relationally, a key-value store (KVS) is a simple table consisting of a primary key and a column containing the “value”, stored as an object. The actual schema from the YCSB testbed is below.

CREATE TABLE usertable (
  YCSB_key char(25) CHARACTER SET utf8 NOT NULL,
  field0 mediumtext CHARACTER SET utf8,
  PRIMARY KEY (YCSB_key)
);

One variable in the design is the size of the “value” field in the key-value store (the average field length of field0 in the YCSB schema). I’ll examine the performance impact of 1K, 4K, and 10K average field sizes.

In addition to the size of the “value” field, applications also vary in how often the data is read vs. written (the read:write ratio). In YCSB terminology, three standard read/write mixes are defined, labeled “workloadc” (100% reads), “workloadb” (95% reads / 5% writes), and “workloada” (50% reads / 50% writes). We’ll look at all three variations.

Note: In the NoSQL world, an operation is usually a simple “put” or “get” of data from a key-value store. In our YCSB experiment, we will use the corresponding SQL “select” and “update” syntax to perform similar single-record operations, as sketched below.
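
Concretely, the single-record operations look something like the following against the YCSB schema above. The key literal is just an illustration; the benchmark driver generates keys according to its configured distribution.

-- NoSQL "get": read the value for a single key
SELECT field0
FROM usertable
WHERE YCSB_key = 'user1234567890123456789';

-- NoSQL "put": overwrite the value for a single key
UPDATE usertable
SET field0 = '...new 1K/4K/10K payload...'
WHERE YCSB_key = 'user1234567890123456789';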

The test bed uses a database preloaded with 10 million records and chooses keys uniformly at random across the data store.

Hardware Environment

To perform this assessment, I’ll be using what we call “yang” nodes in our data center. A yang node is a Supermicro blade server with 40 hyperthreaded cores, attached NVMe storage, dual 1G external and 10G internal networks, and 128 GB of memory. These nodes are approximately 4 months old and represent the current state of the art for “mid-range” data center servers. The nodes are also comparable to similar hardware that we have seen deployed at customer sites.

  • CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 physical cores / 40 hyperthreaded cores)
  • Data Disk: Intel DC P3520 SSD, 1.2 TB, NVMe/PCIe, 2.5 in
  • Networking: 1G external network and 10G internal network
  • Memory: 128 GB

Because of the high throughput rates and the large record sizes passed between the database and the clients, I am using 16 yang nodes to drive separate clusters of 4, 8, 12, and 16 yang nodes. The driver systems communicate with the server systems over the 10G back-end network. Normally the 10G network would be reserved for internode communication within the database cluster, but initial experiments with this workload found that the front-end 1G network was the bottleneck and needed to be avoided. Tip: if you want to run NoSQL workloads, you need a lot of network bandwidth.

ClustrixDB Configuration

The tests were performed using ClustrixDB 9.0, which is the current version of ClustrixDB as of this writing. The software was configured in the same way we recommend customers configure their production deployments of ClustrixDB, with one exception noted below.

ClustrixDB has a legacy feature that is not really used in our production deployments these days. We don’t talk about it and don’t recommend it to customers, and it defaults to OFF, but it can still be enabled through a dynamically settable global parameter. The feature allows ClustrixDB to acknowledge to the application that a transaction is fully committed (and therefore durable) before ClustrixDB actually receives confirmation from the OS that the change has been persisted successfully to disk. Instead, the commit I/O is queued for an asynchronous write to disk. What this means is that ClustrixDB has told the application the commit is durable, but it may not actually be durable yet, because the I/O is still queued for service by the OS and hardware.

We call this “relaxed durability” because the commit will become durable, just not necessarily at the moment ClustrixDB tells the application the commit is complete. When configured for relaxed durability, ClustrixDB is assuming the commit will succeed at the OS and hardware level. ClustrixDB isn’t strictly following the implied contract of Durability in ACID, which is why we ship ClustrixDB with this OFF (i.e., the default is what we call “strict durability”), and why we normally don’t even mention it to customers unless they have a very special case.
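
For illustration, enabling a dynamic global in ClustrixDB is a single SQL statement. The parameter name below is a hypothetical placeholder I’m using for this sketch, not the actual name of the global (which isn’t given here); the point is simply that the durability mode can be toggled at runtime without a restart.

-- Sketch only: 'relaxed_durability_enabled' is a hypothetical placeholder
-- for the dynamically settable global described above.
SET GLOBAL relaxed_durability_enabled = true;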

Now, when we are considering using ClustrixDB as an alternative to NoSQL, we are in a special case. All of the mainstream NoSQL databases have tunable durability and tunable consistency. So in these tests, we “tuned” ClustrixDB just a little bit, enabling asynchronous (relaxed) durability to push it a little closer to how the NoSQL databases operate in production environments. This helps reduce the latency of write operations. It’s important to note, however, that ClustrixDB can’t relax consistency at all. ClustrixDB is always 100% consistent across all nodes. So all of the tests below are with a fully consistent ClustrixDB, not an “eventually consistent” configuration.

Performance Results

1KB Average Record Size

Let’s start with 1KB average record size. This is a modest record size for a NoSQL environment, but a good starting point. The following performance curves illustrate the performance of various size clusters, running the three standard YCSB workloads.

I’ve trimmed the y-axis (throughput) to max out at 1.2 million operations per second. Our focus will be on the center of the axis (600,000). When a curve runs off the right-hand side of the graph, it means the cluster, as configured, is capable of much higher performance levels. In fact, on the 16-node configuration we saw throughput levels as high as 2 million operations per second.

What does this data tell us? Given an objective of 600,000 operations per second, a Clustrix cluster of 8 “yang class” nodes is more than sufficient to meet the performance needs of read-intensive workloads. As the demands move towards a more write-intensive workload, 16 nodes may be required.

 

4KB Average Record Size

Let’s continue now with a 4KB average record size. The following performance curves illustrate the performance of various size clusters, running the three standard YCSB workloads.

With this larger record size, ClustrixDB continues to perform well with read-intensive workloads. An 8-node yang class cluster is still sufficient to meet these needs. We’re starting to see, however, some limitations on write-intensive workloads. With 4KB record sizes and hundreds of thousands of operations per second, we’re pushing a large amount of data through the I/O channel, and that’s becoming a bottleneck. Even though we have relaxed durability configured, we’re still being impacted by disk performance. Maybe we need to move to more of an in-memory configuration? More on this later.

10KB Average Record Size

Let’s now move to a 10KB average record size. The following performance curves illustrate the performance of various size clusters, running the three standard YCSB workloads.

For read-intensive workloads, we’re seeing the need for additional hardware (12 nodes are now required for 100% reads and 16 nodes for 95% reads), and we are starting to see more impact on performance from the write operations. For write-intensive workloads, the effect of large amounts of I/O to the disk channel is evident: all curves are well under our 600,000 focus point. But there is hope. The bottleneck in these numbers is the I/O system. These nodes are configured with a single NVMe/PCIe SSD drive, so adding more I/O capacity to the nodes would certainly improve this. Or we could bypass the I/O overhead altogether by moving to in-memory structures. We’ll look at this next.

10KB Average Record Size with In-Memory Tables

Besides relaxed durability, Clustrix also offers the capability of running entirely as an in-memory system (http://docs.clustrix.com/display/CLXDOC/In-Memory+Tables). This approach has a couple of benefits. First, with fully in-memory tables, the I/O workload to the disk channel is effectively eliminated. ClustrixDB’s in-memory tables do not persist data to disk; however, the tables are redundantly protected like all other ClustrixDB tables, using multiple replicas distributed across the nodes of the cluster. This means that as long as the cluster is running, the tables are consistent and protected. Another characteristic of ClustrixDB’s in-memory tables is that the lock management features (i.e., transaction locking) are disabled on these tables. Your application can still perform transactions on these tables, and MVCC works just the same as with any other ClustrixDB table; there are simply no row or table locks on these tables (similar to MyISAM tables). Removing the lock management overhead opens up even more performance on these tables.
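
For reference, an in-memory version of the YCSB table might be declared along the following lines. The clause used to select the in-memory container is an assumption I’m making for this sketch; the In-Memory Tables documentation linked above has the authoritative DDL.

-- Sketch only: the CONTAINER clause is assumed syntax for requesting an
-- in-memory table; the logical schema is unchanged from the on-disk version.
CREATE TABLE usertable (
  YCSB_key char(25) CHARACTER SET utf8 NOT NULL,
  field0 mediumtext CHARACTER SET utf8,
  PRIMARY KEY (YCSB_key)
) CONTAINER = memory;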

The following performance curves illustrate the performance of various size clusters, running the three standard YCSB workloads, with in-memory data structures.

As would be expected, the read performance remains about the same with the in-memory structures, but the real impact is on the write performance. With this configuration, a 16-node yang class cluster can meet the performance requirements of a 50:50 mixed workload with 10K record sizes.

Back to the initial question

Can ClustrixDB meet the performance needs of a “NoSQL” database? The answer is yes. And quite well actually.

ClustrixDB is built on a scale-out, shared-nothing architecture that provides the scalability you expect to find in NoSQL databases. Add to this “relaxed durability” and “in-memory” processing, and ClustrixDB can deliver scalability similar to other NoSQL databases.

The bottom line is that with ClustrixDB, the performance you will get is more a function of the underlying hardware used (node types, I/O capacity, and network speeds) than of the database itself. If you need more performance, just add more hardware.


Comparison to other NoSQL benchmarks

Now that we’ve seen what ClustrixDB can do with YCSB, let’s compare that to benchmarks you can find on the Internet for other NoSQL databases.

Let’s go back to ClustrixDB’s 1KB 50:50 read:write test results.

We can see that with 16 nodes, ClustrixDB is performing 600,000 operations per second at an average latency of 1.25 ms.

Now let’s look at a top Google search result for “cassandra benchmark”: https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks.

On this page, DataStax shows benchmark results comparing Cassandra to other NoSQL engines. Cassandra is known for its higher performance, so we can focus on just the Cassandra results. This is a very good article, and it’s worth spending the time to read it fully. We’ll extract just a few results to use as a quick comparison.

University of Toronto – Throughput for workload read/write

In these results, Cassandra is achieving around 225,000 ops/sec on 12 nodes.

End Point – YCSB Benchmark workload 50:50 mix

In these results, Cassandra is achieving just over 200,000 ops/sec on 16 nodes.

Compare those again to ClustrixDB achieving 600,000 ops/sec on 16 nodes. Not too shabby for a non-NoSQL database.

Naturally, every benchmark is different. And it’s really difficult to conduct an apples-to-apples comparison of different products, on different hardware platforms, using benchmarks executed by different engineers.

But to answer the question: Can ClustrixDB perform like a NoSQL database?

…it’s looking pretty hard to say no.