Database Metro Area Clustering Across Data Centers


Data integrity, high availability, and easy disaster recovery have always been major requirements of our ClustrixDB customers. Their OLTP applications are mission-critical and often customer- or consumer-facing, which makes the acceptable margin for error virtually nil.

Over the years we have brought a variety of innovations to market to meet these requirements, such as our patented ClustrixDB nResiliency for fault tolerance and the Clustrix Rebalancer that automatically optimizes data distribution for the number of available nodes.

With the advent of low-latency networking between data centers within a metropolitan region, such as AWS Availability Zones (AWS AZs), we saw the opportunity to take another leap in meeting the stringent application requirements of our high-profile customers.

ClustrixDB Metro Area Clustering and Availability Zones

ClustrixDB 9 supports Metro Area Clustering and Availability Zones as a new way to deploy our distributed database in public or private clouds across networked data centers in large metropolitan areas. AWS Availability Zones are one example; other cloud vendors offer similar networking support, and some private data centers are deploying it as well.

With ClustrixDB 9 you can define “Zones” within a cluster, where each zone contains a subset of the nodes in the cluster and each zone can reside in a different data center within a large metropolitan area; for example, San Francisco, Palo Alto, and Oakland.

A Quick Review of Failure Domains

A Failure Domain is any logical group of resources that are likely to fail together (Wikipedia: Failure Domain). Most distributed databases treat a single server (aka node) as a failure domain. If a disk fails in a node, the database considers the entire node failed. If other nodes in the cluster can’t reach a node over the network, for any reason, the cluster considers that node failed. This means the failure domain is at the level of a server.

That’s great, but what if you have a bunch of servers on a server rack and that rack has an unreliable power distribution system causing the entire rack to lose power? And let’s assume this is a common problem that you have not been able to take care of though you know you should. In this case, the entire rack is a failure domain. If the rack’s power stops, all servers on that rack will stop.

Now in ClustrixDB 9, you can tell the cluster that a set of nodes should be considered as a single failure domain. So let’s say you have a 9-node cluster, and you have 3 nodes in each rack. And you still haven’t found time to fix that power distribution problem yet (we’re not judging you). You can now tell ClustrixDB 9 that the 9 nodes are grouped into 3 zones of 3 nodes each, corresponding to which rack they are in.

What this does is give ClustrixDB 9 the information it needs to avoid placing both (or all) replicas of a data slice in the same zone. In any deployment scenario, ClustrixDB makes sure it does not put both replicas of a data slice on the same node by default, but now that you have told it that multiple nodes are in the same zone (aka failure domain) it will know not to put both replicas in the same zone. That way if the zone fails, you haven’t lost all copies of that data slice, and the database will continue reading and writing all of the data.
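To make the placement rule concrete, here is a minimal Python sketch of zone-aware replica placement for the 9-node, 3-zone example. It is a toy model only, not ClustrixDB’s actual placement algorithm or API: the node and zone names, the round-robin choice, and the function names place_replicas and survives_zone_loss are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical 9-node layout, 3 nodes per zone (names are illustrative only).
NODE_ZONES = {f"node{n}": f"zone{(n - 1) // 3 + 1}" for n in range(1, 10)}


def place_replicas(slice_id, node_zones, replicas=2):
    """Pick one node per replica so that no two replicas share a zone.

    Toy round-robin policy; a real database would also weigh load and capacity.
    """
    zones = defaultdict(list)
    for node, zone in sorted(node_zones.items()):
        zones[zone].append(node)
    zone_names = sorted(zones)
    if replicas > len(zone_names):
        raise ValueError("need at least as many zones as replicas")

    chosen = []
    for i in range(replicas):
        # Consecutive offsets modulo the zone count always give distinct zones.
        zone = zone_names[(slice_id + i) % len(zone_names)]
        nodes = zones[zone]
        chosen.append(nodes[(slice_id // len(zone_names)) % len(nodes)])
    return chosen


def survives_zone_loss(placement, node_zones, failed_zone):
    """True if at least one replica lives outside the failed zone."""
    return any(node_zones[node] != failed_zone for node in placement)


placement = place_replicas(0, NODE_ZONES)
print(placement, survives_zone_loss(placement, NODE_ZONES, "zone1"))
```

Because every replica of a slice lands in a different zone, losing any single zone (a rack, in this example) always leaves at least one copy of each slice available.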

Extend That Concept to Metro Area Clustering and Availability Zones

Now let’s take that 3-zone cluster of 9 nodes, and instead of running each zone in a rack in your data center we’ll put it into AWS. Yay! Now you don’t have to fix that power distribution system. We will create the cluster with 3 nodes in 1 AWS Availability Zone (AZ), 3 nodes in another AZ, and the remaining 3 nodes in a 3rd AZ. A key point is that all of these AZs are in the same AWS Region.

If you’re not 100% clear on the difference between an AWS region and an AZ, check out this great blog article by our good partners at Rackspace: AWS 101: Regions and Availability Zones

We need all nodes to be in AZs in the same AWS region because ClustrixDB needs low-latency networking between the nodes in the cluster. We state a latency requirement of less than 2 ms to ensure good performance of your ClustrixDB cluster. AWS does a good job of keeping the network latency between same-region AZs around 1 ms or less (most of the time). But between AWS regions, your network traffic travels across the public Internet, so there’s little anyone can do to ensure the latency.

If you read that great Rackspace blog article, you now know that AWS AZs within a region can be in totally different data centers within that region. So you can think of multiple AZs as actually multiple data centers (although technically even 1 AZ can be multiple data centers, but let’s not go there today).

Finally, let’s say you actually don’t want to run in AWS, but instead you want to run ClustrixDB across multiple data centers in your metropolitan area. Let’s say you have a data center in San Francisco, another in Palo Alto, and one in Oakland, and your inter-data-center network has latencies consistently less than 2 ms.

What you’ve just created is a Metro Area Cluster of Zones using ClustrixDB 9’s new Zones feature and your own data centers.
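Before committing to such a layout, it is worth checking every inter-data-center link against the 2 ms requirement. The following Python sketch shows one way to do that, assuming you have already collected round-trip latency samples yourself. The function name links_over_budget and the sample numbers are made up for illustration; they are not measurements and not part of any ClustrixDB tooling.

```python
MAX_LATENCY_MS = 2.0  # requirement stated above: latency must stay under 2 ms


def links_over_budget(latencies_ms, budget_ms=MAX_LATENCY_MS):
    """Return the site pairs whose worst observed latency is not under budget.

    latencies_ms maps a (site_a, site_b) pair to a list of round-trip samples
    in milliseconds. The worst sample is used because the link needs to be
    consistently fast, not just fast on average.
    """
    return sorted(
        pair for pair, samples in latencies_ms.items()
        if max(samples) >= budget_ms
    )


# Invented sample values, for illustration only.
samples = {
    ("San Francisco", "Oakland"): [0.6, 0.8, 0.7],
    ("San Francisco", "Palo Alto"): [1.1, 1.4, 1.2],
    ("Oakland", "Palo Alto"): [1.3, 2.4, 1.5],
}

for site_a, site_b in links_over_budget(samples):
    print(f"{site_a} <-> {site_b} exceeds the {MAX_LATENCY_MS} ms budget")
```

With these invented numbers, only the Oakland to Palo Alto link is flagged, because one of its samples reached 2.4 ms.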

Why Metro Area Clustering and Availability Zones

With ClustrixDB, the important thing to remember is that this is a single database instance that is stretched across three closely networked data centers. It is not three (or even nine) databases that are replicated to each other.

This means that any application that writes to a node in San Francisco can immediately see the effect of that write in Palo Alto. Regular RDBMS transaction semantics still hold true and are not changed in any way by this configuration (i.e., the app in SF needs to commit before any session in Palo Alto will be allowed to see that change).

That means you don’t need to set up or manage any replication between the data centers. And since there is no replication, there is no slave lag (when the replication slave is transactionally behind the master).

If you had used replication instead of ClustrixDB’s Zones, then you would have a master database in one zone, with two slave databases in other zones. Aside from the aforementioned slave lag, which many applications can’t tolerate, your application could only make changes to the master database and must not make changes to the slaves. This means that any database session that needs to perform an insert, update, delete, or select for update must run only on the master. Therefore your transactional workload is limited to only the servers in the master’s zone. But with ClustrixDB, all nodes in all zones are read/write and enforce 100% consistency with each other. So your app can read and write from any node in any zone at the same time it’s reading and writing from other nodes in other zones. This means that your app can use all 9 nodes, instead of just using the nodes in a master zone.

Sounds good, how do I find out more?

ClustrixDB 9 is generally available right now, and the documentation is publicly available: http://docs.clustrix.com/display/CLXDOC/Zones

If you just want to read up on the feature, that’s a good place to start.

However, if you have a project that you’re actively working on and would like to ask questions live with one of our top-notch technical solution engineers, we’d be happy to hear from you. We are very excited about this new deployment capability, so just ping us at sales@clustrix.com.

And since Metro Area Clustering and Availability Zones are such a big deal, stay tuned for more blog posts expanding on this topic.