MySQL Sharding DevOps Challenges: Data Maintenance, Infrastructure, & HA
Previously we’ve discussed application and design challenges for MySQL sharding, and some of the corresponding business challenges that can result, affecting your business flexibility. But what about MySQL sharding DevOps challenges?
For reference, here’s a quick précis about MySQL sharding: MySQL sharding is the strategy of partitioning your MySQL application workload across multiple different MySQL database servers, allowing queries and data CRUD operations to fan out. This works around MySQL’s single write-master architecture, providing the ability to scale out both writes and reads, albeit with tradeoffs. This is a big DevOps project.
And since we’ve already looked at the “front of the house” with MySQL sharding’s challenges to business rule support, let’s take a look at the “back of the house”: the MySQL sharding DevOps challenges.
MySQL Sharding DevOps Challenges, #1: Data Maintenance
Once the sharding key has been chosen, the data needs to be physically distributed across the array of MySQL servers. Each of those servers will require its own database(s) for its own set of data partition(s). The initial data distribution can be manual; that’s more a “one-time setup.” However, what happens as your workload grows? Ideally each shard grows equally, but sometimes life happens:
Shard Growth and Hotspots
There are two main ongoing data maintenance capacity challenges to a MySQL sharded array:
- Shard Growth
- Shard Hotspots
Shard growth means one or more of your shards is poised to exceed the storage capacity of the underlying server. Hotspots means that one or more of your shard servers is experiencing contention, e.g., of CPU or network traffic, even though it’s not near storage capacity. Both shard growth and hotspots result in degraded server performance, and both have a similar solution: split the local shard(s), and move (e.g., half) of the data to a different MySQL server. This poses significant work for DevOps.
Shard growth is positive from a business perspective;growth is good. However that growth represents some DevOps challenges, because it means that further data distribution is necessary. In short, each MySQL server needs sufficient “spare room” to grow, else transactions will start to slow, if not fail. Best practice is to have at least 40% storage free, and average CPU utilization between 60 and 70%. Once the storage exceeds 90%, this will have an effect on MySQL performance. Either the server(s) need to have their disks and/or SAN upgraded, or the local shard needs to be split, with the new “half” of the shard moved to a different MySQL server. Ideally the shard is moved to a new server to maximize growth potential, but sometimes CAPEX budgets dictate consolidation. In that case of “doubling up” shards, it’s critical to ensure that the new split shard doesn’t create contention on its new shared server, or you’ll have to move it again. This can easily create some significant MySQL sharding DevOps challenges.
Shard hotspots mean access/data skew. At best this is temporary, and resolves itself over time. At worst it means the MySQL sharding key (strategy) has issues, and may need revisiting. That would entail resharding the entire data workload, representing either significant downtime or significant redundant hardware outlay. Whereas shard growth exhausts local storage, shard hotspots create network, CPU, and/or potentially storage contention. Hotspots aren’t an issue of storage capacity; there could be lots of disk space left. However, database usage patterns are driven by what data is resident on the local server, so the most straightforward way to handle that kind of hotspot is to further partition the local shard, which means even more DevOps challenges created by MySQL sharding
Managing shard splits can be tricky. The easiest solution would be for DevOps to take the shard offline, split, move the new half shard to the new server, update the shard:server mapping LUT, and then bring the offline shards back online. This would entail the application (gracefully) failing for all transactions on the offline shards. From a business perspective, that means bringing that subset of customers, features, or functionality offline. For example, some large gaming companies have periodic maintenance, and all customers (thousands) on the shard being modified go offline for hour(s).
But for high-volume/high-value MySQL applications requiring high availability, since each shard is redundant (e.g., each has a slave for HA), shard changes can be done on the slave, while production isn’t affected. Once the shard has been split and moved, then replication is needed to catch up the slave to the master’s position. And when ready, promote the slave, demote the master, and cut over. This trades a minimum of downtime for a lot more DevOps effort.
Once the shard has been split, and half of that workload is moved with the new shard to a different (ideally new) node, the original local server now containing half the data should see a 50% fall in workload. If it doesn’t, then the shard-split process might need to be repeated. This is especially possible if the newly created shard is instead moved to (share) another in-place server, e.g., due to budget constraints. In that case, careful review of the new usage pattern for that server needs to occur, or a scenario of “robbing Peter to pay Paul” might occur.
Shard merges can periodically be useful. What happens if the business offerings changes? Or there is peak/seasonality in user access patterns, for example Black Friday, Prime Day, or Single’s Day? The MySQL sharded array that handles 3x to 5x higher visitors at a seasonal peak is severely over capacity the rest of the year. But it turns out, since shard merges require a similar amount of effort required by shard splits, many enterprise MySQL sharded deployments don’t bother. In that case, any subsequent shard splits get moved to share currently used MySQL servers, rather than deploying new ones.
MySQL Sharding DevOps Challenges, #2: Infrastructure Maintenance
The other side of the “data maintenance” coin is infrastructure. Whether it’s shard growth, hotspots, splits, or shard:server mapping, each of these requires DevOps to deploy, upgrade, maintain, backup, and retire/replace servers. Some of those tasks can be made easier on the cloud, especially speed of deployment, but they still need to be managed, and still can represent a challenge to sharded MySQL applications and lots of MySQL sharding DevOps challenges.
Specifically, there are three main infrastructure challenges to sharded MySQL applications:
- Server logistics
- Server backups
- High availability (HA)
Server logistics of MySQL sharded arrays fall into three rough groups: the MySQL servers themselves (potentially redundant for HA), the shard:server mapping, and any replication strategies needed across the array, e.g., to avoid cross-node transactions. The MySQL servers are straightforward; purchase (or rent) the instance size with the best price/performance. But as sharded arrays grow large, the nodes begin to be at various stages of life. Some kind of periodic background heartbeat and/or smoke tests can be useful to determine which servers are falling behind in performance, and due to be replaced. And all of this doubles due to HA. With MySQL, HA is usually accomplished with a slave instance, which is a full copy of the primary/master instance. So that’s twice the number of MySQL servers required, both to be purchased/rented (CAPEX), and administrated (OPEX). In short- lots more work for DevOps.
Shard:Server mapping is critical for a MySQL sharded array. The application must always know which MySQL server contains the data per transaction. Typically this kind of mapping is done with Redis or Memcache, i.e., fast in-memory key/value stores which provide LUT (look-up table) intersections between PK, sharding key, database name, server id, server IP, etc. This allows the application to do dynamic lookups with the least impact on in-flight transactions. This also requires additional LUT/mapping servers deployed and maintained. And they really should be redundant; without this data, the sharded array is crippled.
Cross-node replication is necessary for any data that is needed on the local shards. This allows transactions on a local shard to avoid cross-node transactions, and the significant application modifications needed to provide referential integrity and ACID guarantees. If cross-node replication is required, this adds a whole new layer of complexity for DevOps, ensuring that slave processes are created on each node, setting up binlogs on the requisite masters, and monitoring/ensuring replication lag is within reasonable bounds.
The challenge of backing up a MySQL sharded array
Since there is no single RDBMS in charge of all the MySQL servers in the array, there is no programmatic way to get a consistent backup. Backups can be initiated with MVCC, ensuring a consistent transactional state per server, but even with NTP (or even local atomic clocks) the local server time will not exactly match the other servers. This is less of an issue if each server operates completely separately; if the business rules strictly avoid cross-node transactions, then perhaps “close enough” is sufficient. However, after talking to many DevOps leads at many companies, they all tend to be unhappy with their backup strategies. Options include using block storage volumes, and do a simultaneous block copy at a single moment in time. Some cloud providers also have snapshot backup options in their hosted offerings, but simultaneous snapshots across each node in the array still need to be synchronized. Similarly, recovering node(s) from backup can be tricky, requiring using replication to roll forward to match the other node(s) transactional state. And finally, synchronizing all the MySQL shards again can sometimes require rolling reboots across the shards. Talk about MySQL sharding DevOps challenges!
All this complexity leads some deployments to focus more on HA than on ensuring consistent backups.
MySQL Sharding DevOps Challenges, #3: High Availability (HA)
Finally, let’s discuss MySQL sharding challenges related to high availability (HA). In general, MySQL applications requiring the degree of scale that sharding provides, provide high-volume/high-value transactions that need to be highly available.
This means DevOps needs to make sure each of the MySQL servers are fully redundant, i.e., each “shard” will actually consist of at least two servers, in either master/slave or master/master configuration. Master/slave is the easiest to setup, but provides no guarantees of transactional consistency. Master/master, specifically certification replication-based, guarantees that the secondary master has a copy of the transaction information before the primary master COMMITs. This guarantees that if the primary suffers an outage after it COMMITs but before the secondary COMMITs, the secondary still can complete that transaction and honor the ACK sent back to the application by the primary. As expected, this level of transactionally consistent HA is much more involved to setup than regular master/slave asynchronous replication; trading OPEX for a much higher level of HA. In short, creating more than a few additional MySQL sharding DevOps challenges.
Why automatic snapshots aren’t sufficient for HA
AWS RDS automatically provides snapshot backups at five-minute intervals. At first glance, this would seem a lot simpler than deploying redundant servers for HA, not to mention setting up and maintaining all the replication needed (e.g., between different RDS masters, as distinct from the read replicas already provided in RDS).
However, five minutes latency means five minutes of lost transactions. This is different than losing “in-flight” transactions; if servers experience an outage, “in-flight” transactions are often, if not typically, lost with MySQL. Since those transactions didn’t complete, the data updates weren’t given a COMMIT, and an ACK wasn’t sent to the application. But if a transaction completes, e.g., the order is taken, and the customer has received an order confirmation number, then the customer has a reasonable expectation that transaction will persist if there’s a subsequent server outage. And if the customer has had their payment method decremented, they can have legal recourse, depending on legal jurisdiction. But without HA, that e-commerce provider will have no record of that transaction, even though the credit card company does.
This is the very kind of boondoggle which creates bad press, tainted branding, etc., and falls back onto DevOps and its MySQL sharding.
And one more gotcha, if in your cloud deployment you haven’t rented Reserved instances, that means that five-minute backup latency is extended significantly longer while a new instance is brought online.
MySQL Sharding DevOps Challenges Summary
While sharding MySQL is certainly a well-accepted strategy to address the scalability needs of MySQL applications, it needs to be approached with eyes wide open. Both architecture and logistics planning needs to be created and reviewed, to help avoid the MySQL sharding DevOps challenges you may be causing to your OPEX and CAPEX budgets. Sharding MySQL is always hard, and its tradeoffs can be myriad, especially the knock-on effects as your DevOps team maintains the sharded array going-forward. It’s easy to make decisions up-front for which DevOps eventually will be paying the prices in effort and resources, resulting in your MySQL application potentially losing credibility.
ClustrixDB was built specifically to avoid the MySQL sharding DevOps challenges. A distributed MySQL-compatible ACID-compliant relational database, ClustrixDB linearly scales-out both write and read transactions both in the cloud and in your datacenter, requiring a minimum of DBA and DevOps attention.