How to Design a Flexible Database Model

The products and services which we are interacting with are evolving day by day with new sets of features and experiences. Food delivery services delivering food to every doorstep are now hosting dining services. Amazon selling books initially has now expanded to an e-commerce giant dealing with various categories of products.

Handling these business changes at a data store level can be a nightmare for software engineers if the design of the underlying model does not account for adaptability.

“Good Design is Good Business.”

Thomas Watson, IBM

Let’s start with a requirement.

As a database architect ~~(role exists in the real world?)~~ I have been asked to build a product model for an e-commerce application similar to Amazon helping end-users to buy different products on the same platform.

To understand the extensibility factor of the model, consider the e-com platform is initially selling books as a product and later extends to sell clothes and other product categories as well. All products may or may not have the same set of attributes. For example, the binding type (hardcover, spiral, e.t.c.) is an attribute specific to books and color variation (red, black, e.t.c.) is specific to clothing. In the future, I might encounter a product having different attributes specific to itself.

Let’s explore the different ways of designing such a model which can be extensible enough to store different products with a perpetual combination of attributes, strengths, and weaknesses.

Moving forward, I will be using product and attributes keywords a lot while explaining the approaches. To add some more context, the following are the definitions of those keywords.

attributes: Specific elements of a product such as size and color of a T-shirt are attributes.

product: Combination of all attributes will decide the product. T-shirt is a product.

1. Single Table Inheritance with extra columns.

Creating a single table called products to store all the products with their attributes (size, color) values as columns. And having some extra reserved columns (extra_1, …) for attributes which can arise with a new product launch and were not accounted for while designing the products. So in the future, if a new product has material_type as an attribute, I can dedicate extra_1 to store the value of it.

Strengths

Data types and constraints usage - As it follows the RDBMS rules of datatypes and constraints, each product being added or updated needs to satisfy those rules.
No multiple table interactions - All the products and their details are stored on one table, so I don’t require any sort of joins on different tables to fetch them.
Index - Columns can be indexed to improve search and filter.

Weakness

Flexibility - This approach is not flexible enough as I might run out of the reserved columns some day and adding a new column after reserved columns exhaustion will require schema changes (ALTER table).
Storage - Products which don’t have certain attributes will have null values. For example, a row with a book as a product will have a null in both size and color column.
Data types of reserved columns can be misused, i.e. any data can be added to those fields as changing the datatype of reserved columns is decided way before its usage.
Reserved columns will lack descriptive names, changing the name of a reserved column will require a schema change.

2. Class Table Inheritance.

Creating a base table called products to store all the products and their common attributes (size - assuming it is common for book products too) values as its columns and having different tables, book_products and clothing_products to only store attributes binding_type and color, material_type specific to respective products. A join of the base table (products) on the inherited product table (book_products /clothing_products) will result in entire product detail (a book or a cloth).

Strengths

Data types and constraints usage - As it follows the RDBMS rules of datatypes and constraints, each product being added or updated needs to satisfy those rules.
Flexibility - Flexible to add different specific attributes to their respective product type table; changes to one table will not affect the rest of the product tables. The introduction of a new product can be done by creating a new table with attributes specific to it.
Index - Possible on sub-product level as well.
Storage - null values can be brought down as all the specific attributes are present on different tables.

Weakness

Multiple table interactions required - Fetching all the attributes of a product will require to join as some of them are present on different tables.
Multiple insert statements will be required to add a single product.
Adding a common attribute will still require an ALTER table , leading to a table lock on product.

3. Concrete Table.

Creating standalone tables for each product type (book_products and clothing_products) and having all the attributes (size, binding_type, color) required as columns of each respective table. A new table will be created for every new product type.

The strengths and weaknesses of this approach are quite similar to Class Table Inheritance with the exception of the base class table not being present.

4. Single table with JSON.

Creating a single table called products to store all the products with common attributes (size) as columns. And having a JSON column (extra) for storing all the attributes specific to that product entry.

For example: a clothes product will have size (S/M/L) stored in the size column and the rest of attributes as JSON data ({color: red, material_type: cotton}) in extra column. If the RDBMS type doesn’t support JSON datatype, then datatype such as text can be used which allows it to store large serialized objects (stringified JSON objects).

Strengths

Flexibility - This approach is flexible as different attributes can be stored in the JSON column without any ALTER on the table.
No multiple table interaction - All the products and their details are stored on one table, so I don’t require any sort of joins on different tables to fetch them.
Storage - null values can be brought down as all the specific attributes are present as JSON data, but JSON data takes a lot more space than normal rows and columns.

Weakness

Data types and constraints usage - The JSON data can have any kind of data and can lead to misuse. Adding constraints on the object level is not possible.
Index - Indexing on specific attributes may not be possible, might be possible with JSONB.
It is difficult to update individual specific attributes for a product entry because of object nature. And can lead to a dirty-read problem.
It is difficult to fetch individual specific attributes from the object if the RDBMS version doesn’t support JSON data type.

5. EAV (Entity-Attribute-Value).

Following this approach, I will separate out the entity (products) and attributes (product_attributes) as two different tables. product will not be storing any information related to attributes. Instead, each attribute will be stored as a key (attribute)-value (value) entry in product_attributes . Join of products on product_attributes will result in entire product details (a book or a cloth).

For example: To store a cloth product, I will be creating a product entry in products and each attribute such as size - S, color - red and material_type - cotton as different key-value entries (rows) in product_attributes with the entity referring to the products.id.

Strengths

Flexibility - This approach is flexible as any number of attributes can be stored in the product_attributes.
Storage - Only those attributes are being added to the product_attributes which are required by the entity (product).

Weakness

Data types and constraints usage - Since each attribute is being stored, as a row in product_attributes , RDBMS data types and constraints rules cannot be exercised and instead have to be handled at the application level.
Multiple table interaction required - Fetching a product and its attribute details will require joins of product_attributes leading to complex SQL operations, i.e. each attribute will lead to a join.
Index - Indexing on specific attributes will not be possible.

6. NoSQL.

Relational tables have a fixed set of columns and have to be decided while designing the model, whereas NoSQL databases are free of these rules and constraints. Any kind of data can be stored in the non-relational model.

Strengths

Flexibility - This approach is truly flexible as any kind of attribute can be present with each product.
No multiple table interaction - All the products and their details are stored on one table, so I don’t require any sort of joins on different tables to fetch them.
Index - Columns can be indexed to improve search and filter.
Storage - Non-relational tables are space-efficient as only required attributes will be added to each product entry.

Weakness

Data types and constraints - This is one of the major weaknesses of NoSQL databases as there are no sets of rules present on the table. These rules have to be managed at the application level.

Which approach to consider?

The approaches explained in this article are some of the generic standards of handling database model flexibility. There can be several other ways as well to design models specific to every use case. This article can be thought of outlooking different ways to build basic extensible models and can be extended with tweaks specific to business requirements.

Each approach has its own strengths and weaknesses, so the best approach depends on the use case of a design problem being solved and the result of some brainstorming meetings with your fellow mates 😄.