Tech Talk Series, Part 4 Wrap Up: Achieving High Availability in MySQL


Last week we presented the final Tech Talk in our Scaling MySQL series, titled “How Do You Achieve High Availability in a MySQL Environment?”

When discussing high availability, the following questions usually come up first: Do you need it? Why?

Great questions. If your system is a transactional one, with high value/high volume transactions that have significant financial impact, then downtime and slowdown are not acceptable and your application needs to be protected against these issues. These kinds of workloads include e-commerce, financial/trading (including ad tech and martech automatic ad-purchasing systems), real-time decision support systems, and healthcare. If these systems suffer outages, people’s jobs or even lives can be on the line.

This raises the question: should these systems be fault tolerant, i.e., designed to handle multiple simultaneous failures with no downtime at all? Fault-tolerant systems exist, but they aren’t usually deployed in MySQL architectures due to their very high cost. Instead, HA in MySQL usually focuses on “minimizing downtime,” with reliability targeting five nines (99.999% uptime), or about five minutes of downtime per year.
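As a quick sanity check on that target, the five-nines downtime budget can be computed directly. This arithmetic can be run in any MySQL client; the column alias is ours:

```sql
-- Minutes of allowed downtime per year at 99.999% availability:
-- 365.25 days x 24 hours x 60 minutes x (1 - 0.99999)
SELECT ROUND(365.25 * 24 * 60 * (1 - 0.99999), 2) AS downtime_minutes_per_year;
-- ≈ 5.26
```

Four nines (99.99%) would allow roughly ten times that, about 52.6 minutes per year.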

 

Figure: Availability by the 9s

 

How is HA accomplished with MySQL? Typically via replication strategies.

Replication for Highly Available MySQL Systems

HA in the world of MySQL is usually replication-based, leveraging redundant servers.

This means that for every production server there is at least one additional server containing the same data, ready to be switched over to if the primary goes offline.

This redundancy is typically achieved via one of four replication strategies:

  • Read Slaves: Add one or more read slaves to the master via MySQL replication. The master could be a single node, or a shard.
  • Master/Master: Add a secondary master to the master via MySQL replication. The master could be a single node, or a shard.
  • Certification Replication: Add one or more masters to the master via certification-based replication (e.g., Galera). The master could be a single node, or a shard.
  • Group Replication: Add one or more masters to the master via group replication. The master could be a single node, or a shard.
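As an illustration of the first option, a classic asynchronous read slave is attached to a master with standard MySQL statements. This is a minimal sketch: the hostname and credentials are placeholders, GTID auto-positioning is assumed to be enabled on both servers, and the master must already have binary logging and a unique server_id configured:

```sql
-- On the slave, point replication at the master (hypothetical host/credentials):
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_AUTO_POSITION = 1;  -- requires GTID mode on both servers

-- Begin applying the master's binlog events:
START SLAVE;

-- Verify replication health (check Slave_IO_Running / Slave_SQL_Running):
SHOW SLAVE STATUS\G
```

Master/master setups use the same statements, issued on each master so that the two servers replicate from one another.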

Each of these replication strategies strikes a different balance between latency, performance, and consistency. What level of consistency does your workload require between nodes, and what effect does that have on performance?

MySQL HA Approaches Per Deployment Architecture

Here is a grid summarizing each of the typical MySQL deployments (single node, read slaves, master/master, and sharding), and the ramifications of the four main replication approaches. This summary should help you decide which is best for your workload:

[Table: the four replication approaches applied to each MySQL deployment architecture]

ClustrixDB High Availability and Linear Scale-Out Performance

ClustrixDB is a shared-nothing, fully distributed MySQL-compatible clustered database, providing high availability by default.

How does this work?

  1. By default, there are two complete copies of the data distributed across the cluster’s nodes.
  2. All nodes provide linear read and write scale.
  3. All nodes are strongly consistent with each other at the transactional level.
  4. By default, you can lose a single node and the system will lose no data.
  5. You can add additional server redundancy by setting the nResiliency level, up to the Paxos limit. (You can lose up to one less than half the nodes in the cluster, to avoid “split brain.”)
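The Paxos limit in point 5 is simple quorum arithmetic: a cluster of n nodes tolerates the loss of any ⌈n/2⌉ − 1 nodes while a majority still survives. The query below is just an illustration of that formula (the derived table of cluster sizes is ours), not ClustrixDB syntax:

```sql
-- Maximum simultaneous node losses that still leave a majority,
-- i.e. "one less than half", rounded down:
SELECT nodes, FLOOR((nodes - 1) / 2) AS max_node_losses
FROM (SELECT 3 AS nodes UNION ALL SELECT 5 UNION ALL SELECT 9) AS cluster_sizes;
-- 3 -> 1, 5 -> 2, 9 -> 4
```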

In short, ClustrixDB takes a different approach to data replication between its cluster’s nodes. To ensure both strong consistency and maximum throughput, ClustrixDB doesn’t use a replication stack at all (no binlog/relay-log architecture). Instead, it uses synchronous COMMIT across all participating nodes: every node participating in a write must acknowledge it synchronously before the write completes, and writes are performed in parallel.

And ClustrixDB does all this with very high performance, linearly scaling out with each added node.

Click here to see a full recording of the final installment of the series.