Back in 2005 I was at Isilon. We had developed the first (and still the best) truly distributed scale-out NAS solution. We were on the road to an IPO, and our customers were excited about our products and what we had done with them. I was on the phone with Jake, who worked for one of our largest customers, when he said: “What you guys have done for our basic file storage is great, but can you do anything about our databases?” BAM! Just like that I got super excited about the idea and started looking around to see who else had these kinds of scalability and fault tolerance issues with their databases. As you probably know, it turned out to be an issue for just about everyone.
I got together with Sergei Tsarev (Clustrix co-founder) and we started working on the problem. What is it about these databases that isn’t scaling? Storage usage and query performance. OK, so why aren’t they scaling? Because the query processor is only as big as the box can be, and scaling up with bigger and bigger boxes is a non-starter (forklift upgrades, leaving the commodity price curve, etc.). We saw some systems out there running virtualized clustered storage engines behind traditional query planners, but those always failed to scale, because you wind up pulling all of the data over the network and handling locking and concurrency over the network as well. We realized that the only way to get true scalability is to fan the work out across the nodes, so that every node you add to the cluster takes on its share of the query processing.
As Aaron (our CTO) says: “Bring the query to the data, not the data to the query.” Our whitepaper, A New Approach to Scale-out RDBMS, gives a nice description of this concept, which is the seed of the revolutionary technology at the heart of our clustered database systems.
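To make the idea concrete, here is a toy sketch in Python. It is not how Clustrix is implemented. The `Node` class, the three-column row schema, and the function names are all invented for illustration, and "network traffic" is simply a count of the items each approach would have to send back to a coordinator. It compares pulling every row to one query processor against sending a small query fragment to each node and returning only partial results.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical row schema for illustration: (id, region, amount)
Row = Tuple[int, str, float]


@dataclass
class Node:
    """One cluster node holding a slice of a table (toy model, an assumption)."""
    rows: List[Row]

    def scan(self) -> List[Row]:
        # Data-to-query: every local row is shipped to the coordinator.
        return list(self.rows)

    def run_fragment(self, predicate: Callable[[Row], bool]) -> Tuple[float, int]:
        # Query-to-data: filter and aggregate locally, return one partial result.
        matched = [row[2] for row in self.rows if predicate(row)]
        return (sum(matched), len(matched))


def data_to_query(nodes: List[Node], predicate: Callable[[Row], bool]) -> Tuple[float, int]:
    """Pull all rows to a single query processor, then filter and sum there.

    Returns (total, items_shipped_over_the_network).
    """
    shipped = 0
    total = 0.0
    for node in nodes:
        rows = node.scan()
        shipped += len(rows)
        total += sum(row[2] for row in rows if predicate(row))
    return total, shipped


def query_to_data(nodes: List[Node], predicate: Callable[[Row], bool]) -> Tuple[float, int]:
    """Fan the query out to every node and combine the partial sums.

    Returns (total, items_shipped_over_the_network).
    """
    partials = [node.run_fragment(predicate) for node in nodes]
    total = sum(partial[0] for partial in partials)
    return total, len(partials)


def make_cluster(num_nodes: int, rows_per_node: int) -> List[Node]:
    """Build a toy cluster where every fourth row on each node is in 'west'."""
    nodes = []
    for n in range(num_nodes):
        rows = [
            (n * rows_per_node + i, "west" if i % 4 == 0 else "east", 1.0)
            for i in range(rows_per_node)
        ]
        nodes.append(Node(rows))
    return nodes


if __name__ == "__main__":
    cluster = make_cluster(num_nodes=4, rows_per_node=1000)
    is_west = lambda row: row[1] == "west"
    print(data_to_query(cluster, is_west))  # (1000.0, 4000)
    print(query_to_data(cluster, is_west))  # (1000.0, 4)
```

Both approaches compute the same answer, but in this toy model the first ships every row (4,000 of them) while the second ships one partial result per node. Adding nodes to the second approach adds more workers doing the filtering, which is the fan-out described above. A real system also has to deal with joins, locking, and concurrency, none of which this sketch attempts.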