ClustrixDB: Bringing the Query to the Data—Not the Other Way Around

Some arguments can’t be settled by facts alone. Case in point: Which came first, the chicken or the egg? In the world of databases, a similar conundrum pops up when it comes to query resolution: Should the query be moved to the data, or should the data be moved to the query? At Clustrix, we believe that the query should be moved to the data, not the other way around.

Picture your data as being stored as sheets of paper stacked on top of one another in multiple rooms of an apartment building. These stacks are so high that when you open the door to any of the rooms, an avalanche of paper knocks you off your feet.

Now imagine you’re looking for a specific piece of paper that’s hidden among the many stacks. Does it make more sense to carry all of the stacks from a room downstairs to the lobby and sort through them to find the one you’re looking for? Or would it be easier to get directions that tell you exactly where the piece of paper is, walk up to that room and grab it?

It doesn’t make much sense to do more work than is necessary to complete a task thoroughly. If you carried all of the stacks of paper downstairs to the lobby, you would definitely put strain on your body (heck, you might even get injured), and the whole process would be bound to take a significant amount of time. But if you knew exactly where the paper you’re looking for was, you could run into the room, quickly locate it within a stack and grab it without breaking a sweat.

In traditional databases, the data is brought to the query. In other words, the many is brought to the one. We don’t think it makes much sense to put all that strain on your database when it’s not necessary. And that’s why we’ve designed the first truly scalable and fault-tolerant massively parallel clustered database system: ClustrixDB.

With ClustrixDB, the query is brought to the data, or the one is brought to the many, which puts significantly less strain on your database. To be a bit more ‘tech-y’ about it: ClustrixDB distributes compiled query fragments to the servers that contain your data, performs the operations where the data already lives, and then returns the result to you. For example, say your request lands on ‘server A’. If server A doesn’t have your data, we dispatch the request to whichever server does, say ‘server B’ or ‘server C’, and only the specific data you asked for is returned to ‘server A’. This allows us to parallelize work across multiple systems and truly scale out the database.
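To make the ‘server A’ and ‘server B’ example a little more concrete, here is a minimal Python sketch of the two approaches. It is a toy model under stated assumptions, not ClustrixDB’s actual implementation: the `Node` and `Cluster` classes, the modulo placement rule and the "rows shipped" counter are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Node:
    """A hypothetical server that stores a slice of the rows."""
    name: str
    rows: Dict[int, Row] = field(default_factory=dict)

    def find(self, key: int) -> List[Row]:
        # This lookup runs locally, on the server that stores the row.
        return [self.rows[key]] if key in self.rows else []


class Cluster:
    """A toy cluster; placement by key modulo node count is an assumption."""

    def __init__(self, names: List[str]) -> None:
        self.nodes = [Node(name) for name in names]

    def owner_of(self, key: int) -> Node:
        return self.nodes[key % len(self.nodes)]

    def insert(self, key: int, row: Row) -> None:
        self.owner_of(key).rows[key] = row

    def query_to_data(self, entry: Node, key: int) -> Tuple[List[Row], int]:
        """Send the lookup to the owning server; ship back only the result."""
        owner = self.owner_of(key)
        result = owner.find(key)
        shipped = 0 if owner is entry else len(result)
        return result, shipped

    def data_to_query(self, entry: Node, key: int) -> Tuple[List[Row], int]:
        """Pull every remote row to the entry server, then search there."""
        gathered = dict(entry.rows)
        shipped = 0
        for node in self.nodes:
            if node is not entry:
                gathered.update(node.rows)
                shipped += len(node.rows)
        result = [gathered[key]] if key in gathered else []
        return result, shipped


# Nine rows spread over three servers; the request lands on server A.
cluster = Cluster(["A", "B", "C"])
for k in range(9):
    cluster.insert(k, {"id": k, "note": f"sheet {k}"})
server_a = cluster.nodes[0]

print(cluster.query_to_data(server_a, 4))  # second value: rows shipped
print(cluster.data_to_query(server_a, 4))  # second value: rows shipped
```

Both calls find the same row, which lives on server B. In this toy setup the first approach ships one row back to server A, while the second ships all six rows held by the other servers. The gap grows with the size of the tables, which is the point of the stacks-of-paper analogy.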

As an overview: by leveraging the power of NewSQL, our cloud database can run many queries concurrently across the cluster. It does this while preserving the integrity of the relational database and maintaining atomicity, consistency, isolation and durability (ACID) in every transaction.

Our database also gives you complete flexibility when it comes to data layout. You can make schema updates while your database is live, and you can selectively increase the number of replicas on specific tables to improve availability and read performance at a moment’s notice. All of this can be done quickly and efficiently, without interrupting the user.

ClustrixDB’s shared-nothing architecture is a whole new take on query resolution. Our innovative cloud solution promises high performance while also maintaining the integrity of the relational database.

But we’re just skimming the surface of the technology behind ClustrixDB. Check out a deep dive into what makes our NewSQL cloud database tick.