Working with large amounts of data such as products, orders, categories, users, and payments is a central part of building e-commerce applications. In this post, you'll learn the basics of structuring your NoSQL schema so it's fast and scalable for e-commerce scenarios.
NoSQL databases like MongoDB are still popular in modern app development and have evolved a lot in recent years. The community has released many solid packages that help you work with NoSQL databases at scale, such as schema generators and battle-tested packages for combining them with JS frontends or GraphQL APIs.
Besides that, MongoDB in particular did an awesome job of providing a very good cloud solution (MongoDB Atlas: https://www.mongodb.com/cloud/atlas) that fits well into a modern web tech stack.
In case you've dropped the idea of using NoSQL databases in recent years because you heard they aren't the right choice for apps that manage a lot of complex, transactional data, you should give them another shot. Things have changed quite a lot. Reaction Commerce wrote a decent post about that more than 3 years ago, and the technology has improved even more since then: https://blog.reactioncommerce.com/why-nosql-databases-are-perfect-for-ecommerce/.
To summarise: With MongoDB (and many other NoSQL databases) you can build safe, reliable, scalable, cloud-based databases that are fun and easy to code with.
When you build e-commerce experiences you typically have to cater for:
As you can see, you should make sure that your database can handle a huge number of data sets that are connected to each other.
When you work with MongoDB, you create JSON-based entries (called "documents") in your database and group them in so-called "collections".
For example, the collection "products" contains multiple documents while each document contains the data for a product like this one:
// a document in the "products" collection
{
"_id" : ObjectId("5e451f4cd249baf1d045a778"),
"title" : "Hackathon T-Shirt",
"price" : 3.99,
"currency" : "USD",
"description" : "My Awesome Shirt",
"sku": "DEV1337",
"createdAt" : ISODate("2021-01-02T10:05:00.610Z"), "stock" : 12,
"sizes": ["xs","s","m","l","xl","xxl"],
"colors": ["red","green","blue"],
"vendor" : "DEVSHIRTS",
"vendorSlug": "devshirts",
"vendorDescription": "DEVSHIRTS is a fashion label that sells T-Shirts for devs."
}
In this example, I've **embedded** all data for this product in its document. By embedding the data directly in the document, I can get the product with all of its properties with a single query:
const product = db.products.findOne({"sku":"DEV1337"}); // findOne returns the document itself
const productTitle = product.title; // "Hackathon T-Shirt"
const vendor = product.vendor; // "DEVSHIRTS"
const amountSizes = product.sizes.length; // 6
Super clean and easy to write, isn't it? However, keep in mind that a query for multiple products returns every product with all of its properties, and a single MongoDB document can grow to as much as 16 MB.
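If a listing page only needs a few fields, a projection keeps each returned document small. In the mongo shell that would look like `db.products.find({ vendor: "DEVSHIRTS" }, { title: 1, price: 1 })`. The sketch below only mimics that effect on an in-memory array so it runs without a database; `pickFields` is a hypothetical helper of mine, not part of MongoDB, and the sample data is a shortened copy of the document above.

```javascript
// In-memory stand-in for the "products" collection (shortened sample data).
const sampleProducts = [
  {
    _id: "5e451f4cd249baf1d045a778",
    title: "Hackathon T-Shirt",
    price: 3.99,
    sku: "DEV1337",
    vendor: "DEVSHIRTS",
    vendorDescription: "DEVSHIRTS is a fashion label that sells T-Shirts for devs.",
  },
];

// Hypothetical helper: keep _id plus the requested fields,
// similar to what an inclusion projection returns.
function pickFields(doc, fields) {
  const result = { _id: doc._id };
  for (const field of fields) {
    if (field in doc) result[field] = doc[field];
  }
  return result;
}

// Only title and price (plus _id) survive; the vendor text is left out.
const listing = sampleProducts
  .filter((p) => p.vendor === "DEVSHIRTS")
  .map((p) => pickFields(p, ["title", "price"]));

console.log(listing);
```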
Let's say one vendor has 1,000+ products and you decide to change that vendor's description. Now all 1,000+ product documents have to be updated, because each one carries its own copy of "vendorDescription":
db.products.updateMany({vendor: "DEVSHIRTS"}, {$set: {vendorDescription: "DEVSHIRT's new description"}});
Such operations are easy to code but might need a lot of processing power when they run multiple times per minute or per second in a huge database. One solution is to reference the data instead of embedding it in the document.
You probably know this way of working with data from regular SQL databases like MySQL. Instead of adding all data to the same document, you just add a reference ID that points to an entry in another collection, as I do here for the vendor:
// a document in the "products" collection:
{
"_id" : ObjectId("5e451f4cd249baf1d045a778"),
"title" : "Hackathon T-Shirt",
"price" : 3.99,
"currency" : "USD",
"description" : "My Awesome Shirt",
"sku": "DEV1337",
"createdAt" : ISODate("2021-01-02T10:05:00.610Z"),
"stock" : 12,
"sizes": ["xs","s","m","l","xl","xxl"],
"colors": ["red","green","blue"],
"vendor" : "23ae11d117baf1d127c99efd334"
}
// a document in the "vendors" collection:
{
"_id" : ObjectId("23ae11d117baf1d127c99efd334"),
"title" : "DEVSHIRTS",
"slug": "devshirts",
"description": "DEVSHIRTS is a fashion label that sells T-Shirts for devs."
}
When you query referenced data, you need one query to get the referenced ID and a second query to get the document you're looking for:
// Want to get the vendor description of a product...
const product = db.products.findOne({"sku":"DEV1337"});
const vendorId = product.vendor; // the referenced vendor's _id
const vendor = db.vendors.findOne({"_id":vendorId});
const vendorDescription = vendor.description;
There's also a way to do this on a database level with the $lookup functionality: https://kb.objectrocket.com/mongo-db/how-to-use-the-lookup-function-in-mongodb-1277
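As a rough sketch of what such a pipeline could look like for the two collections above: the stages below match one product by SKU and join its vendor document. The output field name `vendorDoc` is my own choice, and the join assumes that the product's `vendor` field holds the same type of value as the vendor's `_id`. The snippet only builds the pipeline as plain data so it runs anywhere; against a real database you would pass it to `db.products.aggregate(pipeline)`.

```javascript
// Aggregation pipeline: find one product and pull in its vendor in the same query.
const pipeline = [
  // 1. Select the product we are interested in.
  { $match: { sku: "DEV1337" } },
  // 2. Join documents from "vendors" whose _id equals the product's "vendor" field.
  {
    $lookup: {
      from: "vendors",
      localField: "vendor",
      foreignField: "_id",
      as: "vendorDoc", // assumed name; $lookup always writes an array here
    },
  },
  // 3. Turn the one-element array into a plain embedded object.
  { $unwind: "$vendorDoc" },
  // 4. Return only what the caller needs.
  { $project: { title: 1, "vendorDoc.description": 1 } },
];

console.log(JSON.stringify(pipeline, null, 2));
```

The database does the join, so the application makes one round trip instead of two.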
This is the big upside of referenced data over embedded data: you need to update far fewer documents, often only one. For example, updating the description of a vendor is now a single, cheap operation:
db.vendors.updateOne({title: "DEVSHIRTS"}, {$set: {description: "DEVSHIRT's new description"}});
Even if you have thousands of products from this vendor in your database, MongoDB only needs to update a single document.
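To make the difference tangible, here is a small in-memory simulation that counts how many documents each schema has to touch for the same change. It is a sketch, not a benchmark: `updateWhere` is a hypothetical stand-in for an update call, the `"v1"` ID is made up, and plain arrays replace real collections.

```javascript
const vendorText = "DEVSHIRTS is a fashion label that sells T-Shirts for devs.";

// Embedded schema: every product carries its own copy of the vendor description.
const embeddedProducts = Array.from({ length: 1000 }, (_, i) => ({
  sku: `DEV${i}`,
  vendor: "DEVSHIRTS",
  vendorDescription: vendorText,
}));

// Referenced schema: products only store the vendor's ID.
const vendorDocs = [{ _id: "v1", title: "DEVSHIRTS", description: vendorText }];
const referencedProducts = Array.from({ length: 1000 }, (_, i) => ({
  sku: `DEV${i}`,
  vendor: "v1",
}));

// Hypothetical stand-in for an update: apply `changes` to every matching
// document and return how many documents were modified.
function updateWhere(docs, matches, changes) {
  let modified = 0;
  for (const doc of docs) {
    if (matches(doc)) {
      Object.assign(doc, changes);
      modified += 1;
    }
  }
  return modified;
}

const newText = "DEVSHIRT's new description";
const embeddedWrites = updateWhere(
  embeddedProducts,
  (p) => p.vendor === "DEVSHIRTS",
  { vendorDescription: newText }
);
// The referenced products are never touched; only the vendor document changes.
const referencedWrites = updateWhere(
  vendorDocs,
  (v) => v.title === "DEVSHIRTS",
  { description: newText }
);

console.log({ embeddedWrites, referencedWrites }); // { embeddedWrites: 1000, referencedWrites: 1 }
```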
Deciding whether to embed or reference your data is one of the most important choices you make when designing a database schema.
Referencing data might often look like the "clean" way, but when you start to create as many references as possible, the amount of code and the number of queries you'll need to write will increase tremendously - think of my product example above with references to sizes, colors, currencies, etc.
Besides that, your database will get bombarded with queries and operations that would not be needed if you had just embedded the data.
On the other hand, too much embedded data could mean that you query and work with objects that are way bigger than actually needed, which also slows down your app.
From my experience, it's best practice to embed as much data as possible and only reference other documents if it's really needed and makes sense for your specific application.
For example, if you only need to attach a few variants with unique attributes to a product, there's no need to create a "variants" collection for that. On the other hand, if your variants are basically their own products with a lot of attributes (like title, images, SKU, prices, etc.) and are even shared across multiple products, it's a better idea to put them in their own "variants" collection so you can query and modify them independently of the products.
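Here is a hedged sketch of both shapes side by side. All field names, IDs, and values beyond the original T-shirt example are made up for illustration, a `Map` stands in for the "variants" collection, and `resolveVariants` is a hypothetical helper.

```javascript
// Option 1: a few simple variants embedded directly in the product document.
const shirtWithEmbeddedVariants = {
  sku: "DEV1337",
  title: "Hackathon T-Shirt",
  variants: [
    { size: "m", color: "red", stock: 4 },
    { size: "l", color: "blue", stock: 8 },
  ],
};

// Option 2: rich variants live in their own collection and are referenced by ID.
const variantsCollection = new Map([
  ["var-1", { _id: "var-1", title: "Hackathon T-Shirt (red, M)", sku: "DEV1337-RM", price: 3.99 }],
  ["var-2", { _id: "var-2", title: "Hackathon T-Shirt (blue, L)", sku: "DEV1337-BL", price: 4.49 }],
]);
const shirtWithReferencedVariants = {
  sku: "DEV1337",
  title: "Hackathon T-Shirt",
  variantIds: ["var-1", "var-2"],
};

// Hypothetical helper: return a product's variants regardless of its shape.
function resolveVariants(product, variantsById) {
  // Embedded: the data is already there, no extra lookup needed.
  if (Array.isArray(product.variants)) return product.variants;
  // Referenced: one lookup per ID; unknown IDs are skipped.
  return (product.variantIds || [])
    .map((id) => variantsById.get(id))
    .filter(Boolean);
}

console.log(resolveVariants(shirtWithEmbeddedVariants, variantsCollection));
console.log(resolveVariants(shirtWithReferencedVariants, variantsCollection));
```

The embedded shape costs nothing extra to read, while the referenced shape lets several products share and update the same variant.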
Oftentimes it also helps to choose the schema based on the kind of relation in your data model. While embedded data is usually fine for "One-To-One" and "One-To-Few" relations, referenced data shines for "One-To-Many" and especially for "One-To-VeryMany" relations.
I hope I gave you a basic understanding of embedded & referenced data and you have an idea about structuring e-commerce data now.
If you want to learn more, MongoDB has quite nice tutorials, guides, and presentations you can have a look at:
Previously published at https://danielkolb.hashnode.dev/nosql-database-design-for-e-commerce-apps-in-2021