Understanding Azure Cosmos DB – DocumentDB vNext and more
One of Build 2017’s most interesting launches was Cosmos DB – a new NoSQL database for Azure. It replaces the existing DocumentDB offering and augments it with new features. From a competitive perspective, it continues to be positioned against Amazon’s DynamoDB. Let’s look at what it is.
Cosmos is a piece of distributed database technology originally built for Microsoft’s internal use, and it is the data backend for many of Microsoft’s own services. DocumentDB, introduced in 2014, was a slice of its features – namely, a database designed for storing JSON documents. DocumentDB also featured a SQL-like query syntax that made it easy to manage. Later on, it added support for MongoDB APIs, making it useful even for software not specifically built for Azure.
Now Microsoft is making even more of Cosmos’ features available for its customers. It is a superset of DocumentDB, and all existing DocumentDB instances are now Cosmos DB instances.
From documents to multiple data formats
The DocumentDB data atom is a JSON document. Cosmos DB broadens the support to include two additional data shapes: graphs and tables.
Graphs are composed of nodes connected by edges – a very real-world representation of data. They are queried using a common open source query language called Gremlin. Its query syntax is geared toward navigating graphs – you could say e.g. .has('person', 'name', 'Thomas').outE('Knows') to follow the outgoing 'Knows' edges from Thomas and so find the people he knows.
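To make the two traversal steps concrete, here is a minimal pure-Python sketch of what such a query does conceptually. It is not Gremlin and it does not talk to Cosmos DB; the Graph class, its method names and the sample people are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny in-memory property graph (illustrative only, not a Cosmos DB client)."""
    # node id -> (label, properties)
    nodes: dict = field(default_factory=dict)
    # list of (source id, edge label, target id)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def has(self, label, key, value):
        """Ids of nodes with the given label and property value."""
        return [
            node_id
            for node_id, (node_label, props) in self.nodes.items()
            if node_label == label and props.get(key) == value
        ]

    def out_edges(self, node_ids, label):
        """Outgoing edges with the given label from any of the given nodes."""
        return [e for e in self.edges if e[0] in node_ids and e[1] == label]


g = Graph()
g.add_node(1, "person", name="Thomas")
g.add_node(2, "person", name="Mary")
g.add_node(3, "person", name="Ben")
g.add_edge(1, "Knows", 2)
g.add_edge(1, "Knows", 3)
g.add_edge(2, "Knows", 3)

# Step 1: find the starting node(s). Step 2: follow their "Knows" edges.
thomas = g.has("person", "name", "Thomas")
known = [g.nodes[target][1]["name"] for _, _, target in g.out_edges(thomas, "Knows")]
print(known)
```

The point of the sketch is the shape of the query: first filter down to a starting node, then walk along labeled edges. A graph query language lets you chain those steps in a single expression.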
Tables, on the other hand, provide a non-relational tabular data model. While Cosmos DB is a separate data engine, the API and the data model are the same as with Azure Table Storage. This means that you can port your current Table Storage driven apps to Cosmos DB with relative ease – just change the connection string (and migrate your data, which is another discussion entirely).
If you’re now using Table Storage and are completely satisfied with it, perhaps you don’t want to think about migration. But if you care about better SLAs, throughput guarantees, more granular geodistribution control and automated indexing for arbitrary queries, Cosmos might be just the thing for you.
When you create a new instance of Cosmos DB, you choose one of the data models and APIs to be used. That configures the portal experience to match the data model you’ve chosen, but it doesn’t technically prevent you from using the other data models as well. In practice, you probably won’t want to use the same instance for different data models. As for people using the existing DocumentDB offering, your old instance has now simply turned into a Cosmos DB instance with the predefined API set (DocumentDB API or Mongo API, whichever you originally chose).
There is even a simple – and somewhat buggy – graph visualizer/editor in the Azure Portal.
Cosmos DB provides you with the option of five different consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix and Eventual. While the Strong consistency level approaches a relational database in its guarantees, the Eventual level only promises your data will be synced at some point. The weaker the consistency level you choose, the better the performance you get with the same amount of resources. DocumentDB already had four of these levels – Cosmos DB just adds the fifth, Consistent Prefix.
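The text above does not spell out what the new Consistent Prefix level promises. As it is commonly described, a reader may lag behind the writer but never sees writes out of order or with gaps. The following is a small sketch of that one property, assuming a single ordered stream of writes; the function name and the sample data are made up for illustration and have nothing to do with any Cosmos DB SDK.

```python
def is_consistent_prefix(observed, write_order):
    """True if `observed` is a prefix of `write_order`: no gaps, no reordering."""
    observed = list(observed)
    return observed == list(write_order)[:len(observed)]


writes = ["A", "B", "C"]  # order in which the writes were committed

lagging = is_consistent_prefix(["A", "B"], writes)    # behind, but in order
gap = is_consistent_prefix(["A", "C"], writes)        # B is missing
reordered = is_consistent_prefix(["B", "A"], writes)  # writes seen out of order

print(lagging, gap, reordered)
```

Only the first read would be acceptable under this property. The Eventual level makes no such ordering promise, which is part of why it is cheaper to provide.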
Whether configurable consistency is interesting depends on your application. If you just throw in some rows, nodes or documents for a low-usage application, it’s not likely to matter at all. On the other hand, if you need to maximize the throughput or need well-defined concurrency behavior, this option is definitely your friend.
An impressive SLA included
One of the biggest changes from DocumentDB days is the addition of new SLAs. Most importantly, they demonstrate Microsoft’s incredibly bullish stance on what kind of beating Cosmos DB is intended to take. Cosmos DB keeps DocumentDB’s 99.99 % uptime SLA. If you’re currently using Azure Table Storage and move your data over to Cosmos, that’s the addition of one nine.
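To put "one nine" into perspective, the arithmetic below converts an availability percentage into the downtime it still permits. It assumes a 30-day period for simplicity and takes 99.9 % as the starting point implied by the text; the function name is made up for this sketch, and the actual SLA documents define their own measurement rules.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Maximum downtime per period that still meets the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for sla in (99.9, 99.99):
    print(f"{sla} % -> {allowed_downtime_minutes(sla):.2f} minutes per 30 days")
```

That is roughly 43 minutes of permitted downtime per month at 99.9 %, against a little over 4 minutes at 99.99 %.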
For performance, Cosmos DB promises that 99.99 % of all read requests are going to execute in under 10 ms, with a 15 ms guarantee for writes. Again, this is a significant change from Table Storage. Table Storage is usually extremely fast when using queries that hit the indexes, but there are no guarantees of that speed. Cosmos changes this by having a well-defined performance contract and resource governance.
Furthermore, Cosmos DB promises to be able to successfully process 99.99 % of the inbound requests and honor the requested consistency level at 100 %.
Controlling the scale and pricing
The scalability controls for Cosmos DB are similar to DocumentDB’s. You can separately control the number of those mysterious Request Units (RUs) per second, which is the measure of query/write performance (see the definition of RU in the documentation). 100 RU/s costs roughly 6 USD/month, but the minimum commitment is 400 RU/s, leading to a minimum cost of ~25 USD/month. Also, you need to pay $0.25/GB/month for the storage you use.
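The pricing figures above can be turned into a quick back-of-the-envelope estimate. This is only a sketch using the rounded numbers quoted in this article; it assumes the throughput price scales linearly, and real billing may differ in metering, increments and regional prices. The function and constant names are made up for illustration.

```python
USD_PER_100_RU_PER_MONTH = 6.0  # "roughly 6 USD/month" per 100 RU/s, from the text
USD_PER_GB_PER_MONTH = 0.25     # storage price from the text
MIN_RU_PER_SECOND = 400         # minimum commitment from the text


def monthly_cost_usd(ru_per_second, storage_gb):
    """Rough monthly estimate based on the article's rounded prices."""
    if ru_per_second < MIN_RU_PER_SECOND:
        raise ValueError(f"minimum provisioning is {MIN_RU_PER_SECOND} RU/s")
    throughput = ru_per_second / 100 * USD_PER_100_RU_PER_MONTH
    storage = storage_gb * USD_PER_GB_PER_MONTH
    return throughput + storage


print(monthly_cost_usd(400, 0))    # the minimum commitment, no data stored
print(monthly_cost_usd(1000, 20))  # 1000 RU/s plus 20 GB of data
```

With these rounded inputs the minimum comes out at 24 USD, which matches the ~25 USD/month quoted above.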
In addition to the previous DocumentDB pricing, Cosmos DB adds another feature. If you want to, you can support your occasional peak loads by provisioning RUs per minute. What this means is that in addition to your guaranteed per-second performance you get a per-minute quota to help you handle situations where your load peaks occasionally. The driver for you is the price: per-minute RUs are cheaper than provisioning the required amount of per-second RU capacity. For more details on how per-minute provisioning works, check out the documentation.
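The idea of a per-minute quota absorbing short peaks can be illustrated with a toy model. This is a deliberately simplified sketch, not the service’s actual accounting: it assumes the per-second budget is tried first, that any overflow is drawn from a per-minute budget, and that this budget refills every 60 seconds. The function name and all the numbers are hypothetical.

```python
def rejected_rus(load_per_second, ru_per_second, ru_per_minute):
    """Count RUs that fit neither the per-second nor the per-minute budget.

    load_per_second: RUs requested in each consecutive second.
    """
    rejected = 0
    minute_budget = ru_per_minute
    for second, demand in enumerate(load_per_second):
        if second % 60 == 0:
            minute_budget = ru_per_minute  # the per-minute quota refills
        overflow = max(0, demand - ru_per_second)
        from_minute = min(overflow, minute_budget)
        minute_budget -= from_minute
        rejected += overflow - from_minute
    return rejected


# One minute of traffic: 55 quiet seconds, then a 5-second burst.
load = [300] * 55 + [1500] * 5

without_quota = rejected_rus(load, ru_per_second=400, ru_per_minute=0)
with_quota = rejected_rus(load, ru_per_second=400, ru_per_minute=6000)
print(without_quota, with_quota)
```

In this model the burst overflows the 400 RU/s budget by 1100 RUs in each of its five seconds. Without a per-minute quota, serving it would require provisioning 1500 RU/s around the clock; with a sufficiently large per-minute quota, the same burst is absorbed at the lower per-second level.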
DocumentDB has been a well-liked Azure service. Cosmos DB builds on it and adds several features that were missing. Dharma Shukla, one of the key designers of the new database, said he “wants this […] to last for many decades to come”. Whether or not that wish will come true, at least the initial reception looks positive.