Photo by Pawel Czerwinski on Unsplash
Dimensional modeling is a database design philosophy that aims to simplify data analysis and improve query performance. While there are different types of dimensional modeling, I will focus on the widely used style known as the star schema. This approach differs from the traditional relational database model by emphasizing ease of understanding and efficient querying.
The key to dimensional modeling lies in its design philosophy, which prioritizes the business process being modeled and the ability to extract insights from the data. In a relational database implementation, this design is referred to as a star schema due to the way tables are interconnected.
So, what distinguishes dimensional modeling from a standard relational database? While there are various best practices, the two main components that remain consistent across all approaches are facts and dimensions. Let's explore these table types individually.
Dimensional tables represent the points in the star schema. They are typically wide and short, containing details that don't change frequently and are stored for long periods. These tables represent core aspects of the business process and are used for analysis and slicing and dicing the fact table data. Each row in a dimensional table is uniquely identified by a primary key, which also serves as the link to the fact table.
The heart of the star schema is the fact table, where most or all of the dimensional tables connect. This table stores aggregable data, meaning it can be added, multiplied, averaged, etc. It resembles a transaction table and can contain data at different levels of granularity, such as seconds, minutes, hours, or days. The fact table connects to various dimensions based on the business process. In contrast to dimensional tables, fact tables are narrow and long, often containing millions of records.
Both dimensional and fact tables are typically created by combining different tables from a business's operational database. Depending on the desired level of detail in the fact table, there may be multiple fact tables and multiple dimensions (usually the case).
You might be wondering why we need another database approach when the existing one works well. To answer that, let's consider the analogy of using airplanes for travel instead of cars. Both can get us to our destination, but airplanes do it faster. Similarly, dimensional modeling simplifies data to facilitate easy access and analysis, unlike the duplication of records across multiple tables found in traditional models. By avoiding such complexities, businesses can quickly retrieve the required information for calculating KPIs or analyzing specific sections during meetings.
Albert Einstein once said, "Make everything as simple as possible, but not simpler." This principle applies to the importance of dimensional modeling. It simplifies data in a way that enables businesses to find the information they need promptly, ensuring efficient decision-making. This is particularly crucial for high-volume entities aiming to stay competitive in the fast-paced world of data.
Dimensional modeling has evolved through different approaches. Two prominent methodologies emerged in the 1990s: Kimball's Bus Matrix Methodology and Inmon's Hub and Spoke Methodology.
Kimball's method relies heavily on the concept of conformed dimensions. These dimensions are created to be unique, non-duplicated, and usable by any fact table from different business processes. However, this approach requires extensive upfront planning to identify dimensions that cut across multiple units.
Inmon's method also utilizes dimensions but is less strict than Kimball's approach. The tables do not need to be completely normalized, and the final tables connected to the frontend dashboards and reports layer are not always organized by dimensions. Instead, they may be organized by departments, potentially resulting in data duplication.
This introduction merely scratches the surface of dimensional modeling. There are numerous nuances and considerations to be made when selecting an appropriate methodology, understanding the reasons behind its adoption, and implementing it effectively.
For further reading, I would recommend The Data Warehouse Toolkit, 3rd Edition, by the Ralph Kimball Team.