Are you trying to decide whether to use MongoDB or DocumentDB? With the recent controversy surrounding MongoDB's licensing, it can be hard to decide which option is right for your company or project. Amazon decided that the core MongoDB code is challenging to scale while remaining highly available, so it wrote its own implementation, which is compatible with the Apache 2.0 open source MongoDB 3.6 API. MongoDB, Inc. has also recently changed its license to make future imitations more difficult. To do this, it created an entirely new license called the Server Side Public License.
Given the split, should you host your mission-critical database with a MongoDB service provider such as MongoDB Atlas, run and scale your own instance, or use DocumentDB? With the schism between the companies and no clear answer, this is a difficult decision.
What Is DocumentDB?
Launched in January 2019, Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, and highly available implementation of the MongoDB 3.6 API. It's a fully managed document database that supports MongoDB workloads. Essentially, DocumentDB is a clone of MongoDB 3.6, reimplemented for scale. It does not share or use any of the MongoDB source code, meaning it's a unique, closed-source implementation that is proprietary to Amazon. Amazon felt they needed to create their own implementation to improve performance on large data sets, with high throughput for mission-critical workloads.
To do this, they've decoupled storage and compute. This allows the read capacity to scale to millions of requests per second by adding low latency read replicas.
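As a rough illustration of how a client might direct reads at those replicas, the sketch below builds a MongoDB-style connection URI that asks the driver to prefer secondaries. The host name is hypothetical, and the replica set name rs0 is an assumption; check the values your cluster actually reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class ReplicaReadUri {
    // Builds a connection URI whose options ask the driver to read from
    // replicas when one is available and fall back to the primary otherwise.
    static String build(String host, int port) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("replicaSet", "rs0");                    // assumed replica set name
        options.put("readPreference", "secondaryPreferred"); // standard URI option
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "mongodb://" + host + ":" + port + "/?" + query;
    }

    public static void main(String[] args) {
        // "docdb.example.internal" is a placeholder, not a real endpoint.
        System.out.println(build("docdb.example.internal", 27017));
    }
}
```

The resulting string can be handed to any MongoDB driver that accepts a connection URI; credentials and TLS options are left out to keep the sketch short.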
Amazon's implementation is a response to a growing set of customers struggling to run MongoDB at scale. Amazon felt that all existing solutions, including MongoDB Atlas, didn't solve the problems their customers were facing.
For example, DocumentDB supports automatic data scaling, which allows you to scale from 10GB all the way to 64TB without any effort. Before DocumentDB, this sort of data scaling was difficult.
Amazon's solution also provides automatic fault tolerance. It automatically divides your storage volume into 10GB segments spread across many disks. Each 10GB chunk of your storage volume is replicated six ways, across three Availability Zones.
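To make the arithmetic concrete, here is a small sketch that works out how many 10GB segments a volume needs and how many segment copies that implies. The class and method names are made up for illustration; only the 10GB segment size and the six-way replication come from the description above.

```java
final class StorageSegments {
    static final long SEGMENT_GB = 10;        // segment size described above
    static final long COPIES_PER_SEGMENT = 6; // six-way replication

    // Number of 10GB segments needed to hold the volume, rounding up.
    static long segmentCount(long volumeGb) {
        return (volumeGb + SEGMENT_GB - 1) / SEGMENT_GB;
    }

    // Total segment copies stored across the three Availability Zones.
    static long totalCopies(long volumeGb) {
        return segmentCount(volumeGb) * COPIES_PER_SEGMENT;
    }

    public static void main(String[] args) {
        System.out.println(segmentCount(25));  // 3 segments for a 25GB volume
        System.out.println(totalCopies(100));  // 60 copies for a 100GB volume
    }
}
```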
Amazon DocumentDB is designed to seamlessly handle the loss of up to two copies of data without affecting write availability, and can also handle the loss of up to three copies without affecting read availability. It also features a self-healing storage volume. Data blocks and disks are continuously scanned for errors and repaired automatically.
Since the service is hosted by Amazon, you are covered for most compliance needs. DocumentDB complies with many standards, including PCI DSS, ISO 9001, 27001, 27017, and 27018, and SOC 1, SOC 2, and SOC 3, and it is eligible for HIPAA.
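Those tolerances can be read as a quorum rule over the six copies. The sketch below assumes a write quorum of four and a read quorum of three, which is inferred from the stated tolerances rather than taken from Amazon's documentation; the names are illustrative.

```java
final class QuorumCheck {
    static final int TOTAL_COPIES = 6;
    static final int WRITE_QUORUM = 4; // assumption: inferred from "lose two, still write"
    static final int READ_QUORUM = 3;  // assumption: inferred from "lose three, still read"

    // Writes stay available while enough copies survive to form a write quorum.
    static boolean canWrite(int copiesLost) {
        return TOTAL_COPIES - copiesLost >= WRITE_QUORUM;
    }

    // Reads stay available while enough copies survive to form a read quorum.
    static boolean canRead(int copiesLost) {
        return TOTAL_COPIES - copiesLost >= READ_QUORUM;
    }

    public static void main(String[] args) {
        for (int lost = 0; lost <= 4; lost++) {
            System.out.println(lost + " lost: write=" + canWrite(lost)
                    + " read=" + canRead(lost));
        }
    }
}
```

With three copies lost, for example, the volume can still serve reads but no longer accepts writes.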
That Sounds Great! What's the Catch?
To be compatible with the 3.6 API, DocumentDB emulates the requests and responses that a MongoDB client expects. In theory, any driver that's compatible with 3.4+ will work. That said, there are quite a few caveats to this statement that don't show up in Amazon's marketing material. There are both gaps in the API support and critical functional differences to consider.
According to MongoDB, DocumentDB failed 61% of MongoDB's correctness tests. Some of the biggest gaps to consider:
1. Aggregation pipeline stages and query language operators are severely limited. At the time of writing, only 50% were supported. For example, mapReduce is not supported. With larger data sets, this is more likely to be an issue. See the full list of aggregation support at https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-dababase-aggregation.
2. Certain data types and indices are limited. The Decimal128 data type and case-insensitive indices are not supported.
3. There is no change stream support. Large scale applications would likely benefit from this feature. It's unclear if this feature will make it into DocumentDB, given their implementation. Be sure to check your code for anything that uses change streams before using DocumentDB.
As an example, this Java code will fail:
MongoCursor<ChangeStreamDocument<Document>> cursor = inventory.watch().iterator();
ChangeStreamDocument<Document> next = cursor.next();
4. Tunable consistency is also an issue. Since DocumentDB is fundamentally different and is oriented around scale, it's unlikely you'll be able to control how consistency is managed for your database instance. Users are at the mercy of how Amazon decides to implement consistency guarantees and will have to make changes if Amazon changes its implementation.
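One low-effort way to check a codebase for the gaps listed above is a plain text scan for telltale calls. The sketch below is a hypothetical helper, not an official tool: the marker strings are illustrative, far from exhaustive, and a substring match can produce false positives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompatibilityScan {
    // Substring markers mapped to the gap they hint at (illustrative only).
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put(".watch(", "change streams");
        MARKERS.put("mapReduce", "mapReduce");
        MARKERS.put("Decimal128", "Decimal128 data type");
    }

    // Returns one finding per marker hit, with a 1-based line number.
    static List<String> scan(List<String> sourceLines) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            for (Map.Entry<String, String> marker : MARKERS.entrySet()) {
                if (sourceLines.get(i).contains(marker.getKey())) {
                    findings.add("line " + (i + 1) + ": " + marker.getValue());
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "cursor = inventory.watch().iterator();",
                "int total = 0;");
        scan(source).forEach(System.out::println);
    }
}
```

In practice you would feed it the lines of each source file, for example via Files.readAllLines, and treat every hit as something to review by hand.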
Make sure to understand the full list of compatibility issues before making your decision. Amazon has stated that they will continue to support most of the 3.6 API but has not given any indication as to when they will stop that support.
It is possible you could be stuck waiting a long time for critical features.
The biggest downside to choosing DocumentDB is being stuck on version 3.6. Because of the difference of opinion between Amazon and MongoDB, as well as the new licensing scheme, it's unclear what will happen to DocumentDB in the future.
MongoDB does not support Amazon's implementation and plans to fight it. Amazon hasn't announced any plans to support more recent versions, so MongoDB is now a major version ahead (4.0+), with some great new features being released that are not offered in DocumentDB. It's unknown if Amazon will create a new API that diverges from MongoDB's mainline.
If this happens, users will be locked into an awkward forked version of MongoDB that only Amazon supports. The other option is to stay on DocumentDB 3.6 forever, which will cause users to miss out on great new MongoDB features. Choosing DocumentDB could lead to unintended vendor lock-in and a difficult transition back to MongoDB down the line. This has not yet been addressed by Amazon, so only time will tell what happens.
An example of a great feature released in version 4.0 of MongoDB is multi-statement ACID transactions – this is unlikely to be supported by DocumentDB, especially with the distributed implementation that separates storage and compute.
As an example, this Java code will fail on DocumentDB:
try (ClientSession clientSession = client.startSession()) {
    clientSession.startTransaction();
    collection.insertOne(clientSession, docOne);
    collection.insertOne(clientSession, docTwo);
    clientSession.commitTransaction();
}
This is likely the first of many great features that MongoDB will release over the next few years. DocumentDB users will not have access to these features, and it is unknown if Amazon will release similar features of its own.
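Code that has to run against both kinds of backend could gate the transactional path on the API version the server reports. The following is a minimal sketch with hypothetical names; it assumes the version arrives as a dotted string such as "3.6.0" and relies only on the fact that multi-statement transactions arrived in MongoDB 4.0.

```java
final class FeatureGate {
    // True when the reported version is 4.0 or later, the first release
    // with multi-statement transactions.
    static boolean supportsMultiDocumentTransactions(String version) {
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        return major >= 4;
    }

    public static void main(String[] args) {
        System.out.println(supportsMultiDocumentTransactions("3.6.0")); // false
        System.out.println(supportsMultiDocumentTransactions("4.0.3")); // true
    }
}
```

An application could use a check like this to fall back to single-document writes when the transactional path is unavailable.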
So Why Would I Use AWS DocumentDB?
Ultimately, it all comes down to scale. If you need, or have the potential, to scale to multiple terabytes with hundreds of thousands of reads and writes per second, DocumentDB could be a good fit. This sentence from their press release says it all: "Together with optimizations like advanced query processing, connection pooling, and optimized recovery and rebuild, Amazon DocumentDB achieves twice the throughput of currently available MongoDB solutions."
Another reason for choosing DocumentDB is keeping everything in AWS. It could be beneficial and convenient to have most, or all, of your services on AWS. DocumentDB is Amazon's only managed MongoDB-compatible service. The alternative would be managing databases yourself on EC2/EBS, which is challenging to do. Managing your own database instance these days is usually not a good use of time, unless you have a dedicated team and a particular use case.
Amazon has been in the cloud services business for a long time. They are arguably the best in the business, so you can trust them to secure and scale your data.
Other providers don't have the same track record or experience. If you need those sorts of guarantees, pick DocumentDB; otherwise, stick with MongoDB.