Kimball & Inmon vs. the Retail Store

Years back I had read a blog about database scalability where it simplifies definition of scalability with activities in a kitchen. I was quite surprised how successful the comparison was. Come to think about it, technology is and should be inspired by what’s happening around us. This thinking pushed me into thinking and linking technology with my everyday life.

I have been recently working on the classics in Analytics for a particular task, therefore spending a lot of time around Kimball vs. Inmon* approaches, how data models should be shaped with Best Practices and so on. I realized how much a brick & mortar retail store can have similarities with Normalized and Denormalized data forms.

If you do not know what Normalized or Denormalized form means, you can be a better judge if retail store example is suitable. Maybe there are even more suitable examples out there. But before we jump in to the retail store example, let’s quickly define what Normalized and Denormalization stand for.

Normalized form

The idea is to structure data as much as possible to reduce data redundancy. This means object domains are separated from each other as much as possible. In case you need to retrieve more than one object domain, then you need to tie them together via joins. An example is Customers vs. Orders. Both Customers and Orders will be discrete object domains and would be persisted separately. If you need to get a list of orders for customer older than 50 years, then you would need to tie Customers and Orders domains in order to get the correct resultset.

Denormalized form

Mostly used for fast and efficient data retrieval. The analytics questions asked by business are sometimes very complex queries. You need to gather a lot of object domains together; unlike only two object domains example of Customers and Orders above. On top of this, sometimes domains’ complexity might also be huge which would result in a lot of join operations between them. A lot of join operations during runtime to get the end resultset might take too much time as the join operations might be too costly. In order to resolve this issue, one approach is to build DataMart like structures where you have the object domains already joined. Let’s take the Customers and Orders example. Instead of having the joins execute at runtime, we may create a single table that repeats the same customer information for every order.

As a key takeaway, denormalization over normalization means duplication of data and additional cost overhead on data storage side. However, getting the list of orders for customers older than 50 years would be incredibly fast.

Now that the concepts are clear in definition, let’s revert back to the original goal and see how these concepts can have similarities with brick & mortar retail stores:

The interior placement design of the brick & mortar retail store is usually done in such a way that customers can easily find the products they would like to purchase. If a customer needs to buy apples, he/ she already would know that they should visit the Groceries aisle, simply because all fruits are piled up there. If next product in line is a bottle of wine, the aisle to go to would be the Drinks aisle. You go through the wines, pick up the one you favor and place in the grocery bag. Similar operation follows for every item in the grocery list. You go to a separate aisle and pick up the isolated item.

So, how does this relate to normalized form? Think of the moment you enter the retail store until the moment you end up at the cashier queue as a single process. If you were going to pick up 30 products at the retail store, it is very unlikely to find them all together already prepared for you at a classic retail store (there are exceptions however for simplicity let’s skip them for now). You would have to go to each aisle and pick up pieces separately. This is similar to normalized object domains being separate from each other and the fact that you have to tie (join) them together if you need data from separate object domains.

In other words, with this perspective classic brick & mortar retail stores can be compared to the Normalized form. All pieces are categorized and all categories have relation(s) to other categories. For example, you are more likely to find beers next to wine section instead of groceries section.

Now let’s focus on the process after the cashier queue. Queue ends, you pay and place all pieces in a bag and are ready to go home. Come to think of it, what you needed at the end of the day was that single bag containing the 30 pieces. If the bag were to be ready when you first walked in the retail store, you might have simply chosen to pick up the bag instead of wasting time with picking up all the 30 items yourself.

So, how does this relate to denormalized form? Denormalized layers are similar to the 30 pieces within the bag. The structures are already readily available for consumption. Instead of going through the hassle of multiple joins and recreating all, you simply pick up the data from the denormalized layer.

This is the reason most of the time, we face Data Warehouse designs having a Data Model and afterwards DataMarts. As DataMarts (denormalized) are supposed to be more readily available for end users than industry standard Data Models (normalized), one would assume end users would never want to reach to Data Models. At the end of the day, why not work with already built packages compared to creating them yourself?!

However we see often at various clients that business reaches out to the Data Models hence resulting in Shadow IT.

But why?

The main reason is simply the DataMarts (already built packages – denormalized) are not designed well enough according to what business actually asks for. There are various reasons behind this like unplanned changes, project dependencies, simply poor design. This showcases how important it is to design Data Models correctly as it is the foundation rest is built upon.

Article written by Ernst Blauw and Can Yurtseven