Have you ever wondered what happens inside an Amazon warehouse? For a software engineer, the supply chain field has plenty of technical challenges, like running one of the largest catalogs in the world alongside one of the biggest inventories. Amazon owns hundreds of Fulfillment Centers (FCs) that contain up to a billion units. To give you a better idea, an FC is larger than 15 soccer fields.

Catalog, Inventory, and Offers

Let’s define a few terms first.

- The catalog lists all the products available on a retail website. A product describes a physical item (for a book: cover, summary, author).
- The inventory lists what is physically in the warehouse. It contains physical items, usually described by their weight, size, …
- An offer is an option to buy a given product at a point in time for a given price.

For instance: if you sell a Kindle Oasis 8GB, color Graphite, this is the reference of your product that will be listed in the catalog. If you have 1000 of them in a warehouse, then each physical item is listed in the inventory.

It’s important to understand that a product reference can be in the catalog but not in the inventory (out of stock, lost). Conversely, an item can be in the inventory while its reference is not in the catalog (product recalled for safety reasons, consumables expired). The Kindle Oasis 16GB, color Graphite and the Kindle Oasis 8GB, color Gold are two different products.

It’s also essential to separate the price from the inventory. For instance, a Kindle Oasis 16GB, color Graphite is sold for $249.99 on amazon.com. The same product can be sold for $199.99 on Black Friday, for £249.99 on amazon.co.uk, or for $259.99 in a bundle with 3 books. Each price is a different offer for the same product. An offer depends on the merchant, the date, and the user.

Physical vs Digital

So far I have only mentioned physical items. Amazon also sells digital items.
If you want to read Harry Potter and the Philosopher’s Stone, you could buy the physical book from the publisher Bloomsbury Children’s Books, or the special Ravenclaw hardcover edition from Educa Books. But you can also buy the digital version. Obviously, digital items are handled differently, so I will ignore this complexity in the rest of the article and focus on physical items.

Naive Inventory Model

A first attempt at modeling an inventory usually looks like this: let’s assign an id to every product in our catalog. We will call it the product_id. Let’s use this key to map to a row in the inventory that contains the quantity in stock.

This model has 2 massive flaws.

First, it assumes items are interchangeable. You could argue that most of the time it’s true. When you buy a Kindle from a shelf at John Lewis, you just pick the first one. But in a warehouse, anything can happen. An item can be damaged by a conveyor belt, so you need to track items individually. Moreover, consumables have an expiration date, so a bottle of olive oil that expired on the 1st of October 2020 is different from one expiring on the 1st of October 2021.

Second, it assumes that all the items are stored in the same place in a warehouse, like in a library where all the Harry Potter books are on the same shelf. Unfortunately, that’s a false assumption. Items are stored wherever there is space left in the warehouse, so you need to know where they are in order to find them.

🤯 Fun fact: An effective storage method is random. It deliberately avoids putting all the same items in the same place. Fires and water leaks are localized incidents: if all the Harry Potter books were in the same place, a single incident would destroy your entire stock. It’s better if it destroys one item of each product. Randomness spreads items across a warehouse to improve resilience.

Improved Inventory Data Model

To avoid the 2 previous flaws, here is an improved data model.
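A minimal sketch of what such a model could look like, using Python's built-in sqlite3 module. The table and column names (`sku`, `item`, `sku_id`, `expires_on`, `damaged`) and the sample rows are assumptions made for illustration, not Amazon's actual schema. Only `parent_sku` and `container_id` are names taken from the article. Each physical unit gets its own row, so two bottles of the same olive oil with different expiration dates stay distinct, and each row records where the unit is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sku (
    sku_id      TEXT PRIMARY KEY,
    parent_sku  TEXT REFERENCES sku(sku_id),  -- NULL for a top-level category
    description TEXT NOT NULL
);
CREATE TABLE item (                            -- one row per physical unit
    item_id      INTEGER PRIMARY KEY,
    sku_id       TEXT NOT NULL REFERENCES sku(sku_id),
    expires_on   TEXT,                         -- ISO date, NULL if not perishable
    damaged      INTEGER NOT NULL DEFAULT 0,
    container_id TEXT NOT NULL                 -- where the unit is stored
);
""")

# Hypothetical sample data: parents are inserted before their children.
conn.executemany("INSERT INTO sku VALUES (?, ?, ?)", [
    ("KINDLE", None, "Kindle family"),
    ("KINDLE-OASIS", "KINDLE", "Kindle Oasis"),
    ("KO-8-GRAPHITE", "KINDLE-OASIS", "Kindle Oasis 8GB Graphite"),
    ("KO-16-GRAPHITE", "KINDLE-OASIS", "Kindle Oasis 16GB Graphite"),
    ("OLIVE-OIL-1L", None, "Olive oil, 1 L bottle"),
])
conn.executemany(
    "INSERT INTO item (item_id, sku_id, expires_on, container_id) VALUES (?, ?, ?, ?)",
    [
        (1, "KO-8-GRAPHITE", None, "BIN-A1"),
        (2, "KO-8-GRAPHITE", None, "BIN-C7"),   # same product, different place
        (3, "KO-16-GRAPHITE", None, "BIN-A1"),
        (4, "OLIVE-OIL-1L", "2020-10-01", "BIN-B2"),
        (5, "OLIVE-OIL-1L", "2021-10-01", "BIN-D4"),
    ],
)


def items_under(conn, root_sku):
    """Return the ids of all physical items whose SKU is root_sku or a descendant."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(sku_id) AS (
            SELECT sku_id FROM sku WHERE sku_id = ?
            UNION ALL
            SELECT s.sku_id FROM sku s JOIN subtree t ON s.parent_sku = t.sku_id
        )
        SELECT i.item_id FROM item i JOIN subtree t ON i.sku_id = t.sku_id
        ORDER BY i.item_id
    """, (root_sku,)).fetchall()
    return [r[0] for r in rows]


def expired_items(conn, today):
    """Return the ids of items whose expiration date is before `today` (ISO string)."""
    rows = conn.execute(
        "SELECT item_id FROM item WHERE expires_on IS NOT NULL AND expires_on < ? "
        "ORDER BY item_id",
        (today,),
    ).fetchall()
    return [r[0] for r in rows]


print(items_under(conn, "KINDLE-OASIS"))   # [1, 2, 3]
print(expired_items(conn, "2021-01-01"))   # [4]
```

Recalling a whole product line is then a single query on the SKU tree, and the two olive oil bottles are no longer interchangeable.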
Stock Keeping Unit (SKU)

The Stock Keeping Unit (SKU) “is a distinct type of product for sale and all attributes associated with the product type that distinguish it from other types. It could include manufacturer, description, material, size, color, packaging, and warranty terms.” This is basically what I previously called the product_id. From now on I will use SKU, as it’s a well-known convention in the supply chain industry.

Parent SKU

This model introduces the parent_sku. It allows you to create a tree of products and group them by category. This lets you implement a fine-grained catalog search, and it is also convenient for storage, as SKUs with the same parent usually share properties. For instance, if you want to recall an entire product line, you can find all of its products using the parent SKU.

🤯 Why SQL? It’s a valid question: with billions of items, key-value storage could feel more scalable. We could argue that the size of a warehouse is fixed, so the inventory has a hard upper limit on the number of items it can contain. But the real reason is that key-value storage was not a thing in 1996. The first Amazon warehouses were running on Oracle databases.

Scale your Physical Storage

Even if an ideal warehouse looks like shelves and bins, this storage system is not flexible enough. It requires workers to unpack every item and put it in a specific bin. So let’s introduce the notion of pallets, which allow fast storage.

- BINS: containers on a shelf that store small items (books, DVDs).
- PALLETS: stored directly on the floor, they contain multiple cases. Each case can contain many items, usually of the same product.

To update the model to fit this new requirement, we introduce a table called CONTAINER. As with SKUs, we use a tree structure to map the warehouse, as a BIN can be on a shelf, which is part of a ROW. The number of items attached to a BIN tells you how many items it holds. The same applies to a pallet, as an item can be in a case that is stored on a pallet located in a given ROW.
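The container tree described above can be sketched as follows, again with Python's sqlite3 module. The `kind` column, the identifiers such as `PALLET-7`, and the helper functions are hypothetical names chosen for this example. `container_id` and `parent_id` follow the article's wording. The sketch shows how an item's full location is recovered by walking up the tree, and how moving a pallet touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE container (
    container_id TEXT PRIMARY KEY,
    kind         TEXT NOT NULL,                -- ROW, SHELF, BIN, PALLET or CASE
    parent_id    TEXT REFERENCES container(container_id)
);
CREATE TABLE item (
    item_id      INTEGER PRIMARY KEY,
    sku          TEXT NOT NULL,
    container_id TEXT NOT NULL REFERENCES container(container_id)
);
""")

# Hypothetical layout: a bin on a shelf in ROW-1, and a pallet holding one case.
conn.executemany("INSERT INTO container VALUES (?, ?, ?)", [
    ("ROW-1", "ROW", None),
    ("ROW-2", "ROW", None),
    ("SHELF-1A", "SHELF", "ROW-1"),
    ("BIN-1A-01", "BIN", "SHELF-1A"),
    ("PALLET-7", "PALLET", "ROW-1"),
    ("CASE-7-1", "CASE", "PALLET-7"),
])
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, "BOOK-HP1", "BIN-1A-01"),
    (2, "BOOK-HP1", "BIN-1A-01"),
    (3, "KO-8-GRAPHITE", "CASE-7-1"),
])


def location_path(conn, container_id):
    """Follow parent_id links upward and return the path from the root down."""
    rows = conn.execute("""
        WITH RECURSIVE chain(container_id, parent_id, depth) AS (
            SELECT container_id, parent_id, 0 FROM container WHERE container_id = ?
            UNION ALL
            SELECT c.container_id, c.parent_id, chain.depth + 1
            FROM container c JOIN chain ON c.container_id = chain.parent_id
        )
        SELECT container_id FROM chain ORDER BY depth DESC
    """, (container_id,)).fetchall()
    return [r[0] for r in rows]


def item_location(conn, item_id):
    """Return the full container path of one physical item."""
    (container_id,) = conn.execute(
        "SELECT container_id FROM item WHERE item_id = ?", (item_id,)
    ).fetchone()
    return location_path(conn, container_id)


def items_in(conn, container_id):
    """Count the items stored directly in a container."""
    return conn.execute(
        "SELECT COUNT(*) FROM item WHERE container_id = ?", (container_id,)
    ).fetchone()[0]


def move_container(conn, container_id, new_parent_id):
    """Move a container (and everything inside it) by updating one row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE container SET parent_id = ? WHERE container_id = ?",
            (new_parent_id, container_id),
        )


print(items_in(conn, "BIN-1A-01"))        # 2
print(item_location(conn, 3))             # ['ROW-1', 'PALLET-7', 'CASE-7-1']
move_container(conn, "PALLET-7", "ROW-2")
print(item_location(conn, 3))             # ['ROW-2', 'PALLET-7', 'CASE-7-1']
```

Note that the cases and items on the pallet are never touched when the pallet moves: only its `parent_id` changes.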
We solved the two previous problems: each item is now uniquely identified in the warehouse, at a specific location. Moving an item requires changing its container_id, and moving a pallet is simple too: just change its parent_id.

But this model creates a lot of different problems. What if we want to locate a given item so it can be delivered? Do we need to go through all the items and then walk up the tree? How do you deal with concurrency? Someone could be moving boxes from a pallet at the same time as someone else is picking an item from a box on the same pallet.

Merge the Trees

The idea is to merge the two previous tree structures, using the physical item itself as the pivot. Giving physical items 2 parents is a powerful idea. By laying out the 2 trees, one top-down and the other bottom-up, some complex operations become easy.

If we go back to the previous example: locating an item is just going down the graph from the SKU. If we want to list all the products in a given container, we move up the graph. The same goes if you want to recall all the expired items in the warehouse: you just need to do a search in this tree. You also solve your concurrency issue, as moving a pallet is now an atomic operation in your SQL DB.

The tree is modeled in a single table.

🤯 Fun fact: Amazon didn’t invent inventory. There is an industry standard called ISA-95. It’s actually part of a wider standard used for manufacturing. It defines a MATERIAL_LOT (corresponding to the physical item) that has 2 parents: a MATERIAL_DEFINITION that describes the item, and a LOCATION that tells you where it is (corresponding to CONTAINER). Each of those 3 concepts has a recursive tree that allows multiple levels of precision, as we’ve seen before.

Final Words

Congrats if you made it this far. Unfortunately, this is just the tip of the iceberg.
A warehouse is a complex environment, and I ignored many of the difficulties a software engineer will face.

First, I talked about containers like BINS or PALLETS, but I didn’t mention their physical location inside the warehouse. If you want to move a pallet, you first need to know where it is and where it should go. How do you find an empty space for a given pallet?

Second, I assumed every item fits in a BIN and every PALLET has the same size. That’s unfortunately incorrect. To optimize space, you may want to put small items together.

Then I ignored the role of pickers: the people who walk through the warehouse and get your item from the BIN. You want to minimize the time it takes them to get items.

A warehouse is not a computer. There are a lot of things that can go wrong. Items can be damaged or lost. The wrong order can get delivered, or someone can make a mistake in the packaging. Your code needs to be resilient in many ways, and we will talk about all these problems in a future article.

Also published at https://medium.com/@raphael.moutard/track-billions-of-items-in-a-warehouse-d75bee548137

What happens inside an Amazon warehouse: A software engineer's guide
