Database normalization is like organizing your closet - you want to make sure everything has its own designated space and there's no overlap or unnecessary duplication. It's a process that helps keep your relational database clean and tidy.
There are different levels of normalization, just like there are different sections in your closet. The most commonly used levels are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Each level has its own set of rules to follow.
First normal form is like putting each article of clothing in its own drawer or hanger. Each item has its own space and there are no duplicates. For example, imagine a table called "pets" with the following columns: "pet_id", "name", "breed", and "age". The table contains the following data:
pet_id | name | breed | age
--------------------------------------------------------------
1 | Fluffy | Persian Cat | 5
2 | Sparky | Labrador Retriever | 3
3 | Whiskers| Siamese Cat | 7
In this case, the table is already in first normal form as all columns contain unique values, and there is no overlap.
Second normal form is like making sure each drawer or hanger is labeled and items are put in the correct place. Each item has its own space and belongs in the correct section. For example, consider the following table called "pets_vaccinations" with the following columns: "pet_id", "vaccine_name", "date_administered". The table contains the following data:
pet_id | vaccine_name | date_administered
--------------------------------------------------------------
1 | Rabies | 2020-01-01
2 | Distemper | 2019-12-31
3 | Feline Leukemia | 2020-01-02
In this case, the "date_administered" column is dependent on "pet_id" which is the primary key. So it is in second normal form.
Third normal form is like organizing all your items in the closet according to color, style, and function. Each item has its own space and belongs in the correct section. For example, consider the following table called "pets_grooming" with the following columns: "grooming_id", "pet_id", "grooming_service", "groomer_id", "groomer_name". The table contains the following data:
grooming_id | pet_id | grooming_service | groomer_id | groomer_name
--------------------------------------------------------------
1 | 1 | Haircut | 101 | Lisa
2 | 2 | Nail Trim | 102 | Mark
3 | 3 | Bath | 103 | Sarah
In this case, the "groomer_name" column is dependent on "groomer_id" which is not the primary key. So it is not in third normal form. To make it in third normal form we can create a new table called "groomers" with the following columns: "groomer_id", "groomer_name" and establish relationship between "pets_grooming" and "groomers" table
For example, consider a denormalized table called "pets" with the following columns: "pet_id", "name", "breed", "age", "vaccine_name", "date_administered", "grooming_service", "groomer_id", "groomer_name". The table contains the following data:
pet_id | name | breed | age | vaccine_name | date_administered | grooming_service | groomer_id | groomer_name
------------------------------------------------------------------------------------------------------------------
1 | Fluffy | Persian Cat | 5 | Rabies | 2020-01-01 | Haircut | 101 | Lisa
2 | Sparky | Labrador Retriever| 3 | Distemper | 2019-12-31 | Nail Trim | 102 | Mark
3 | Whiskers| Siamese Cat | 7 | Feline Leukemia | 2020-01-02 | Bath | 103 | Sarah
To convert this denormalized table to Third Normal Form (3NF), we would need to break it up into smaller, more specialized tables. One table for basic pet information ("pets"), one table for vaccinations ("pets_vaccinations"), and one table for grooming ("pets_grooming").
The "pets" table would contain the following columns: "pet_id", "name", "breed", "age".
pet_id | name | breed | age
--------------------------------------------------------------
1 | Fluffy | Persian Cat | 5
2 | Sparky | Labrador Retriever | 3
3 | Whiskers| Siamese Cat | 7
The "pets_vaccinations" table would contain the following columns: "pet_id", "vaccine_name", "date_administered".
pet_id | vaccine_name | date_administered
--------------------------------------------------------------
1 | Rabies | 2020-01-01
2 | Distemper | 2019-12-31
3 | Feline Leukemia | 2020-01-02
The "pets_grooming" table would contain the following columns: "pet_id", "grooming_service", "groomer_id"
pet_id | grooming_service | groomer_id
--------------------------------------------------------------
1 | Haircut | 101
2 | Nail Trim | 102
3 | Bath | 103
Finally, we can create a separate table for groomer's information ("groomers") containing the following columns: "groomer_id", "groomer_name".
groomer_id | groomer_name
--------------------------------------------------------------
101 | Lisa
102 | Mark
103 | Sarah
By breaking the data up in this way, we reduce the chances of data duplication and inconsistencies while making the data retrieval faster as we can query specific tables instead of the whole table.
Additional forms of normalization that are not as widely used include:
Lead image source.