E-commerce is capturing an ever-increasing market share of shoppers. In addition to the requirements of managing a physical store (like tracking inventory), e-commerce stores have additional challenges, which include getting the product to the customer, using a payment processor, and building relationships with customers. All of these tasks require specialized infrastructure, which in turn interact with your database. A lot of your store’s efficiency and effectiveness rests on the decisions you make about structuring and storing data. In this article, we’ll explore the pros and cons of a NoSQL database for your e-commerce application and use the most popular document store option, MongoDB, in our examples. Note: This data modeling exercise is a response to the Hacker News comments about a previous post we published about building a scalable e-commerce data model. NoSQL Databases and MongoDB MongoDB supports (atomicity, consistency, isolation, and durability) transactions, and its cloud offering is (Payment Card Industry Data Security Standard) compliant. It's one of the best NoSQL options for e-commerce. ACID PCI DSS MongoDB also saves documents in a JSON-like format, which means it's very simple to convert queries and results into a format your frontend code understands. This removes the need to write object-relation mappers (ORMs). Further, MongoDB scales extremely well; it's to be distributed across multiple machines. NoSQL databases don't come with enforced schemas. Introducing new fields is fast because the database doesn't need to be told you're going to change the format or even the existence of a field. It simply saves what you tell it, no fuss. designed While this is great for growing businesses where products and checkout workflows evolve rapidly, it does require lots of care to ensure you don't create data that will confuse future developers and application versions. Before starting to model your system, be aware that MongoDB doesn't start with access control. Don't let this default catch you by surprise. before you write real data! PCI compliance requires that some cardholder information is encrypted at rest and in motion. You can enable Transport Layer Security/Secure Sockets Layer (TLS/SSL) to secure network traffic. Enable access control encryption We'll cover at-rest encryption later in the article, but before going live with any site using MongoDB, go over the to make sure you're up to date on best practices. security checklist Data Modeling a Digital Storefront with NoSQL There’s an almost endless array of features you can build out for an online store, but for the purposes of this article, I’ll focus on modeling just a few: your product catalog, user information, payments, and orders. Product Catalog In order to sell things, you need to record what you have so clients can browse before committing to a purchase. The bare essentials for a product document are the and a unique identifier like a . It's also common to include a description, the quantity you have available, and an image file. price sku All examples are run from in the and should be adopted to a driver for your application. Mongo shell ( { : , : , : , : { : , : , : }, : , : [ , ], : } ) db .inventory .insertOne item "journal" price 9.99 qty 25 size h 14 l 21 w 1 features "Beautiful, handmade journal." categories "writing" "bestseller" image "items/journal.jpg" This will return a verification message like this: { : , : ObjectId( ) } "acknowledged" true "insertedId" "600e814359ba901629a14e13" If you don't supply MongoDB with an field, it creates one using an . The field is a unique, immutable, indexed identifier. To make your example more lifelike and show the flexibility of MongoDB, use the command and the returned to accommodate two related products with slightly different features, and throw in a promotional offer. _id ObjectId _id updateOne _id ( { : ObjectId( ) }, { $ : { : , : , : , : }, $ : { : , : [ , ], : [ { : , : { : NumberDecimal( ), : }, : , : { : { : , : , : }, : [ ], : [ , ], : , : } }, { : , : { : NumberDecimal( ), : , : NumberDecimal( ) }, : , : { : { : , : , : }, : [ ], : [ ], : , : } } ] } } ) db .inventory .updateOne _id "600e814359ba901629a14e13" unset image 1 size 1 qty 1 price 1 set item "journal" features "Inner pocket" "Durable cover" skus sku "154A" price base 9.99 currency "USD" quantity 20 options size h 14 l 21 w 1 features "72 sheets of premium lined paper" colors "brown" "red" ruling "wide" image "images/journal1.jpg" sku "154B" price base 14.99 currency "USD" discount 4.00 quantity 15 options size h 18 l 22 w 2 features "140 sheets of premium paper" colors "brown" ruling "unlined" image "images/journals.jpg" Notice that we changed the structure of the field from a string to an array. This highlights the powerful (and dangerous) flexibility of NoSQL databases—you can radically change how data is formatted without a peep from the database. In that regard, command-line SQL is scary, but command-line MongoDB is terrifying. features We also settled on a format for . Choose one and stick to it throughout your application. monetary data You can get a head start on product modeling by scouting the data required by major providers like and . You can also check out Fabric's to learn how an e-commerce platform like Fabric handles product modeling. Google Marketplace Amazon Marketplace product information management (PIM) system Query Design Practices The goal with MongoDB is to structure your schema so your application returns all the data it needs in a single read query. Adding complexity in MongoDB often involves adding to a record. is important for efficiently scanning for information. Without an index, MongoDB must perform a collection scan, examining every document in a collection before returning the result. Index the fields you'll use to regularly search for products. embeddable documents Indexing Users Once you have an idea of what products you want to offer, think about how to model your users. You want to store information that will allow you to build an ongoing relationship with a customer, such as being able to display items relevant to them. If you’re selling physical products, every customer must have contact information. Here's a simple user profile for a store that needs to ship a physical product. Because you want to enforce unique email addresses for users and use that field to look them up, you'll index that field and make it unique. You could accomplish this with a command like the following: ( { : }, { : true } ) db .customers .createIndex email 1 unique But this example will use the built in field instead. Remember that the field is indexed, unique, and immutable. If a user changes their email, you’ll want to revalidate it. In this case, immutability is an asset; if a user wants to update their email address, MongoDB will demand a new collection entry, and you can be sure that your field gets reset to false. _id _id emailVerified ( { : , : , : , : , : false, : { : , : , : , : , : , : } } ) db .customers .insertOne _id "journalfanatic@e/mail.com" fname "Journal" lname "Fanatic" hashedAndSaltedPassword "$1$O3JMY.Tw$AdLnLjQ/5jXF9.MTp3gHv/" emailVerified address country "United States" street1 "99 Main Street" street2 "Apt #3" city "Boston" state "MA" zip "74586" Customer profiles are a tricky part of e-commerce because they represent people. Unlike products, people are out of your control and constantly changing. They change addresses, want packages delivered to work, and forget their passwords. This is when you have to start thinking about protecting personally identifiable information (PII) and start working toward . Becoming PCI compliant is a big undertaking and often one of the compelling cases for using a third-party provider that is already compliant. Payment Card Industry (PCI) compliance Let's update our simple example to start handling these issues. First, you'll update the zip code, then add a new delivery address and a date modified field. ( { : }, { $ : { : , : { : , : , : , : , : } }, $ : { : new Date() } } ) db .customers .updateOne _id "journalfanatic@e/mail.com" set "address.zip" "60601" shippingAddress street1 "50 Work Street" street2 "Floor 16" city "Chicago" state "IL" zip "60601" setOnInsert dateModified Next, help your model accommodate a password reset request that has an expiration for thirty minutes from now (and update the field). For performance reasons, you need to be sure that embedded documents don't infinitely expand over time. dateModified If a user resets their password a hundred times, and each time it's added to the document, pulling information about that customer for a simple action (like a login) will be very resource-intensive. It's better to separate out events that can be repeated many times, like orders, site visits, and password resets into their own collection, and use to link other collections. customer document references To make a reset collection: ({ : , token: , status: , expiration: new (ISODate(). () + ( * * )), requestTime: new () } ) db .reset .insertOne customerId "journalfanatic@e/mail.com" "493654" "requested" Date getTime 1000 60 30 Date These tables can get much more complicated very quickly, especially once you start tracking user sessions, presenting customized content, or selling products to teams of users and want to track utilization and distribution of resources. Validation Validation not be necessary for your product models, but it's absolutely a requirement for customer-related fields. Addresses, phone numbers, emails, and credit card numbers have to be validated to ensure you're not opening your site to malicious users and to prevent mistakes that stop customers from obtaining your products. might MongoDB offers tools for at the collection level. You'll also want to implement application-level validation. Validating uniqueness in MongoDB is only possible by creating an index and requiring it to be unique. A collection already exists in our example, so use to add some validation. validating schema customers runCommand ( { : , validator: { $jsonSchema: { bsonType: , required: [ , ], properties: { _id: { bsonType: , pattern: , description: }, : { : , minLength: , description: }, : { : , required: [ , ], properties: { zip: { pattern: , description: }, : { : , description: }, : { : , description: } }, : " , " } } } }, } ) db .runCommand collMod "customers" "object" "hashedAndSaltedPassword" "address" "string" '.+\@.+\..+' "must match the email pattern" hashedAndSaltedPassword bsonType "string" 10 "must be a string ten characters long, and is required" address bsonType "object" "street1" "zip" "^[0-9]{5}(?:-[0-9]{4})?$" "must match pattern and is required" street1 bsonType "string" "must be a string and is required" street2 bsonType "string" "must be a string" description must be an object and is required As you can see, writing validation in MongoDB is cumbersome. We only validated six fields, and the code took twice as many lines as creating the record. Libraries like can make validation easier, while headless commerce APIs from third parties like Fabric] will handle data validation for you. Mongoose No matter how you decide to implement validation, it's vital for the long-term health of your application that you audit your data and create solid schema and application-level validation for any user-entered fields. Payments Modeling your payments is where the is most important. It's also the first place where you really need to reference another table. security of your application It can be tempting to include payments in the customer table by adding a nested document, but it's not wise. Payment information has specific and extremely important security concerns; it's simpler to audit your PCI compliance practices if payments are not mixed in with a lot of other data. Let's start with a naive (and dangerous!) implementation, then discuss some of the security implications that can improve the model. ( { : , : , : , : , : NumberDecimal( ), : { : , : , : , : , : } } ) db .payments .insertOne customerId "journalfanatic@e/mail.com" status "verified" gateway "stripe" type "credit" amount 1.00 card brand "Visa" pan "4242424242424242" expirationMonth 1 expirationYear 2090 cvv 123 This is a starting point, but e-commerce experts will panic seeing this schema. All the information necessary to use this card is in your database in plaintext, which might be convenient but creates huge legal liabilities for your business. The is very clear, unless it's necessary, and like the verification code. PCI DSS don't store cardholder data never store authentication data Encryption MongoDB offers enterprise clients the ability to and perform . Any user is able to use manual to encrypt individual fields, then set to enforce the encryption. Encrypted fields are stored as binary data ( ). encrypt data files automatic client-side encryption client-side encryption schema validations subtype 6 With manual encryption, the handles applying an appropriate encryption key, as well as the encryption and decryption operations. driver for your chosen language Following the guidelines, you don’t have to encrypt your database fields if you're only storing the last four digits of the card. But be aware that any time you're bundling the Primary Account Number (PAN) with other personally identifiable information, whether at rest or in motion, all that data must be encrypted. Let's try again. PCI DSS ( { } ) ( { : , : , : , : , : NumberDecimal( ), : { : , : , : , : , : true } } ) db .payments .remove db .payments .insertOne customerId "journalfanatic@e/mail.com" status "verified" gateway "stripe" type "credit" amount 1.00 card brand "Visa" panLastFour "4242" expirationMonth 1 expirationYear 2090 cvvVerified Now you've removed the authentication data and replaced it with a Boolean, truncated the card number so you're not required to encrypt other information like the expiration date. In this example, you've handled one type of payment option (credit cards), but there are many others, each with its own security requirements and data models. Research what you'll need, and the best practices for handling and storing that information. Tokenization A great way to remove some of the complexity of PCI compliance is to check if your payment processor (like ) offers a tokenization service. ) replaces all the information used in payment processing with a token string that your systems can use to track payments. Tokenization is great as it offloads PCI compliance requirements to your payment processor, reducing your liability and simplifying your code. Stripe Tokenization Here's an example of a tokenized model: ( { : , status: , gateway: , type: , token: } ) db .payments .insertOne customerId "journalfanatic@e/mail.com" "awaitingVerification" "stripe" "token" "card_1IDHBZFdjJUqVVV2gPlbz8BC" Orders Orders are where the rubber meets the road. Your customer has a profile that enables you to send them something, and you have products to send them and the ability to accept payments. The order tracks your product through the process of leaving your possession and entering the customer's hands. Orders are another data model for tracking human behavior, so they can get messy. Customers can abandon and modify orders and request refunds. This step in the example brings together all the moving parts of your model: products are purchased by customers with payments. Our model will include things that exist in other collections, like shipping addresses and discounts. In a SQL database, this information would exist as references to existing tables, but remember that in MongoDB you want all the relevant data to be returned in a single read command. It's often more efficient to make a copy of relevant information and store it in the collection. That way, when a user checks on an order, you don't have to read through multiple collections to display the information you want. orders ( { : , : , : , : , : , : NumberDecimal( ) : [ { : , : , : NumberDecimal( ), : NumberDecimal( ), : NumberDecimal( ), : NumberDecimal( ), : NumberDecimal( ), }, { : , : , : NumberDecimal( ), : NumberDecimal( ), : NumberDecimal(. ), : NumberDecimal( ) } ], : { :{ : , : , : , : , : , : }, : { : , : , : , : , : , : }, : , : } } ) db .orders .insertOne customerId "journalfanatic@e/mail.com" paymentId "600e6f37aa2232f59e273082" paymentStatus "paid" status "shippedAwaitingDelivery" currency "USD" totalCost 39.85 items sku "154B" quantity "2" price 14.99 discount 1.00 preTaxTotal 27.98 tax 1.00 total 28.98 sku "154A" quantity "1" price 9.99 preTaxTotal 9.99 tax 87 total 10.86 shipping address street1 "50 Work Street" street2 "Floor 16" city "Chicago" state "IL" country "USA" zip "60601" origin street1 "1 Penn Ave" street2 "" city "New York" state "NY" country "USA" zipCode "46281" carrier "USPS" tracking "123412341234" Modeling the Whole Here are all the parts of your NoSQL e-commerce data model so far: It's an excellent start, but as I stated at the beginning of this guide, there are many facets of an e-commerce store, and we’ve barely brushed the surface. Other valid activities to the model could include adding , handling image optimization, internationalization, additional , product categorization, customer customization, subscriptions, recurring payments, taxes, and returns. product distribution channels data validation When modeling your data for e-commerce, mistakes can be costly, and customers will rarely appreciate the amount of effort that goes into creating a finely tuned database. With that in mind, you can consider of your e-commerce database to a service. You can make use of low-overhead API requests to store your data and let someone else keep up with industry best practices while you focus on improving your product. offloading the management Previously published at https://resources.fabric.inc/blog/nosql-ecommerce-data-model