In my previous post I wrote about shared disk architectures and the problems they introduce. It’s common to see comparisons between shared disk and shared nothing architectures, but that distinction is too coarse to capture the differences between various shared nothing approaches.
Instead, I’m going to characterize the various “shared-nothing” style systems by their query evaluation architectures. Most systems fall into one of the following buckets:
- Centralized compute
- Limited distributed compute
- Fully distributed compute
- Centralized Compute: MySQL Cluster
MySQL Cluster consists of two basic roles used for servicing user queries: a compute role and a storage/data role. The compute node is the front end which takes in the query, plans it, and executes it. The compute node will communicate with the storage nodes remotely to fetch any data relevant to the query.
In the distributed storage model, data is no longer shared between the nodes at the page level. Instead, the storage nodes expose a higher level API which allows the compute node to fetch row ranges based on the available access paths (i.e. indexes).
In such a system, storage level locks associated with the data are now managed exclusively by the storage node itself. A compute node does not cache any data; instead, it always asks the set of storage nodes responsible for the data. The system solved the cache coherence overhead problem.
However, it still suffers from extensive casino online data movement and centralized query evaluation.
- Cache coherence overhead
- Extensive data movement
- Centralized query evaluation
- MySQL Query Evaluation in Action
Consider the following example:
SELECT count(*) FROM mytable WHERE acol = 1 and bcol = 2
an index over acol
10% of the rows in the table match acol = 1
3% of the rows match acol = 1 and bcol = 2
total table size 1 Billion rows
In the diagram above, the arrows represent the flow of data through the system. As you can see, most of the query evaluation in the example is done by a single compute node. The system generated a data movement of 100 million rows, and only a single node performed additional filtering and aggregate count.
It’s an improvement over a shared disk system, but it still has some serious limitations. Such a system could be well suited for simple key access (i.e. query touches a few specific rows), but any more complexity will generally result in poor performance.
As with the shared disk system, adding more nodes will not help improve single query execution, and queries which operate over large volumes of data have the potential to saturate the message bus between the nodes.