Amazon’s Aurora Announcement
Last week, Amazon announced Aurora, promising many interesting features. Let’s drill into the details a bit and explore the ramifications.
Aurora’s overall design uses a single write master with read-slave fanout. Okay, that sounds like MySQL: reads are scalable. Amazon has also written a lot about Aurora’s storage features, but the architecture details remain very scarce. Perhaps they have used AWS DynamoDB on SSDs as a backend store that is transparent to the database engine. Because its scale-out reads rely on shared disk in AWS infrastructure, Aurora will only work on AWS.
By allowing 15 Replicas (read-slaves), Aurora could have good read scalability for many workloads. However, the details on write scaling are not yet known. Ultimately, writes will be bound by the throughput of the single master. There’s no mention of in-memory handling of writes, so writes will be limited by the storage infrastructure. Amazon recommends SSDs for that storage, but it is spread across multiple Availability Zones, which raises a latency question.
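To make the read-versus-write scaling argument concrete, here is a minimal back-of-the-envelope sketch. It is not based on any published Aurora numbers: the per-node throughput figures, the function names, and the assumption that the master also serves reads at full capacity are all hypothetical, chosen only to show the shape of the bottleneck.

```python
def cluster_capacity(replicas, reads_per_node, writes_on_master):
    """Return (read_qps, write_qps) for one write master plus read replicas.

    Assumption for illustration: the master serves reads too, and reads and
    writes do not compete for resources on the master.
    """
    read_nodes = replicas + 1
    return read_nodes * reads_per_node, writes_on_master


def max_sustainable_load(replicas, reads_per_node, writes_on_master, read_fraction):
    """Total queries/sec the cluster can sustain for a given read/write mix."""
    read_cap, write_cap = cluster_capacity(replicas, reads_per_node, writes_on_master)
    if read_fraction >= 1.0:
        return float(read_cap)
    if read_fraction <= 0.0:
        return float(write_cap)
    # The mix saturates whichever side (reads or writes) runs out first.
    return min(read_cap / read_fraction, write_cap / (1.0 - read_fraction))


# Hypothetical figures: 10,000 reads/sec per node, 2,000 writes/sec on the master,
# and a workload that is 90% reads.
no_replicas = max_sustainable_load(0, 10_000, 2_000, 0.9)
fifteen_replicas = max_sustainable_load(15, 10_000, 2_000, 0.9)
print(f"0 replicas:  {no_replicas:,.0f} queries/sec")
print(f"15 replicas: {fifteen_replicas:,.0f} queries/sec")
```

With these made-up figures, going from 0 to 15 replicas lifts the ceiling from roughly 11,000 to roughly 20,000 queries per second, and then it stops: the single master’s write throughput becomes the limit long before the read capacity of 16 nodes is used up.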
Amazon promises to provide both multiple ‘Replicas’ (read-slaves) and ‘incremental backups with point-in-time restore.’ However, there are no stated latencies for promoting a replica to write master, so if you lose your write master, there will be an (application) delay before you can resume writing. Likewise, point-in-time restoration has a significant delay: “…restore your database to any second during your retention period, up to the last five minutes.” Does that mean all transactions from the last five minutes will be lost if your instance goes down? Aurora tries to mitigate this huge delay by promising to fail over to additional Replicas, as well as by offering ‘Reserved Instance’ pricing: ‘hot swap’ hardware comes to the cloud!
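The worst-case reading of that five-minute window can be written down in a few lines. This is only a sketch of the question being asked, not a description of how Aurora behaves: the five-minute constant comes from the quoted wording, while the function names and the timestamps are hypothetical, and a surviving Replica may well hold more recent data than a backup does.

```python
from datetime import datetime, timedelta

# Taken from the quoted wording: restore "up to the last five minutes".
RESTORE_LAG = timedelta(minutes=5)


def latest_restorable_time(failure_time):
    """Most recent point a point-in-time restore could reach, under this reading."""
    return failure_time - RESTORE_LAG


def is_recoverable(commit_time, failure_time):
    """True if a transaction committed at commit_time survives a restore."""
    return commit_time <= latest_restorable_time(failure_time)


# Hypothetical timeline: the instance fails at 12:00:00.
failure = datetime(2014, 11, 20, 12, 0, 0)
for commit in (datetime(2014, 11, 20, 11, 54, 0), datetime(2014, 11, 20, 11, 57, 0)):
    status = "recoverable" if is_recoverable(commit, failure) else "lost"
    print(commit.time(), status)
```

If restore from backup were the only recovery path, anything committed inside the window would be gone, which is why the unanswered question about replica promotion latency matters so much.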
Latency is an interesting issue. They’ve stated that their replication is asynchronous. That’s not surprising, since otherwise they would see huge write latencies on the master. They claim millisecond lag, but how that’s handled isn’t exactly clear. If DynamoDB serves as shared-disk-like storage for them, how do they handle cache consistency on the slaves? That is another interesting open question.
Amazon promises “up to 5x speed of MySQL v5.6 on equivalent hardware.” That sounds great, but can we see the numbers? And the details of the specific sysbench tests that were run? As we all know, “up to” has a bit of wiggle room. Unfortunately, Aurora is currently in ‘Limited Preview,’ so we’ll all have to wait.
With multiple ‘quorum’ writes and multiple read-slaves to keep in sync, what is the latency for full slave synchronization? For example, if the application is reading from a replica (read-slave) and the primary (master) is updated, how is that transaction resolved? A customer could have added an item to their shopping cart on an e-commerce site, and once they query their cart (e.g., on the read-slave), they may choose to change the quantity or color. If customer2 has just purchased all the available items of that new color, will customer1’s session reflect the newly updated available-to-promise quantity? Or will that be something for the application to handle? These kinds of questions are critical for an e-commerce deployment.
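The shopping-cart scenario can be reproduced with a toy model of asynchronous replication. To be clear, this is a generic illustration and not Aurora’s mechanism, which has not been disclosed: the `Master` and `Replica` classes, the log-shipping approach, and the key name are all hypothetical.

```python
class Master:
    """Accepts writes and records them in an ordered log."""

    def __init__(self):
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))


class Replica:
    """Applies the master's log some time later (asynchronous replication)."""

    def __init__(self, master):
        self.master = master
        self.applied = 0  # number of log entries applied so far
        self.data = {}

    def apply_pending(self):
        # Catch up on every log entry not yet applied to this replica.
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def read(self, key):
        return self.data.get(key)


master = Master()
replica = Replica(master)

master.write("blue_widgets_in_stock", 3)
replica.apply_pending()

# customer2 buys all the remaining stock; the write lands on the master only.
master.write("blue_widgets_in_stock", 0)

# customer1's session reads from the replica before replication catches up.
stale_view = replica.read("blue_widgets_in_stock")

replica.apply_pending()
fresh_view = replica.read("blue_widgets_in_stock")
print("before catch-up:", stale_view, "| after catch-up:", fresh_view)
```

Until the replica catches up, customer1 still sees stock that customer2 has already bought. Even a lag of a few milliseconds leaves that window open, so either the database closes it or the application has to re-check against the master before committing the purchase.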
Overall, Amazon’s Aurora announcement makes a lot of promises about the “speed and reliability of high-end databases,” but offers little description of what’s under the hood. Specifically, there’s a lot of exposure to various latencies; will those affect transaction concurrency and application availability? As a reference, NoSQL databases can provide tremendous speed and high availability; they just have to ‘leave behind’ transaction support in order to do it.
Is Aurora a glorified NoSQL database engine with some SQL parsing bolted on top? Or is it a fully qualified NewSQL database engine with fully ACID-compliant MPP transactions and MVCC functionality?
Only time will tell, but one thing is for sure…
ClustrixDB is a fully ACID-compliant, MVCC-enabled database that is already deployed on premises and in the cloud at AWS, Rackspace and others, in hundreds of instances worldwide, with 1 trillion+ transactions per month.