Percona recently published a blog discussing MySQL sharding models for SaaS applications. The blog did a great job highlighting the effort it takes to shard MySQL (e.g. gain write-scale for MySQL SaaS deployments), and discussed understanding both customer data requirements, as well as revenue expectations when designing sharding topologies. But another equally important consideration was not examined: the significant application changes needed to support sharding.
Once data is partitioned (i.e. ‘sharded’), there is no longer a RDBMS guaranteeing cross-shard ACID transactions and referential integrity. If your application users need those guarantees, (for instance, if there are financial implications in the database transactions), then the application itself must be re-architected to supply the needed ACID guarantees and referential integrities. Re-architecting requires a lot of time and effort, and presents a high risk-factor: instead of leveraging decades of peer-reviewed top Database Engineers’ vetted code, your own team has to (re)create significant parts of the RDBMS themselves in the application layer. Routinely during this process, we see that “something’s gotta give” and that trade-off is usually sacrificing Availability for Consistency.
As SaaS providers grow, they’re increasingly confronted with this ‘double-whammy’: the need for more scale (especially write-scale), but not having the budgets for SQL Server or Oracle. Trying to stay on MySQL, DBAs/DevOps are unfortunately in the position of having to ‘roll-their-own’ sharding topologies, which adds even more complexity to their deployments. And finally, not only will there by significant application changes required, but the sheer amount of application changes required to enable sharding is burdensome. My advice? Go with a shared-nothing architecture that accommodates true distributed processing without sharding, like ClustrixDB.