You are working on a project to create a database for a school. This database should contain details about the student including class, subjects, and teachers. Bearing in mind that a student would learn multiple subjects and may have multiple teachers. If you are to present this, it would look like the table below:
Student Id |
Student Name |
Age |
Year |
Courses |
Course id |
---|---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
MathsEconomicsGrammarGeography |
20013 |
002 |
Mary |
18 |
Year 2 |
MathHistoryGeographyLiterature |
20014 |
003 |
Dennis |
20 |
Year 3 |
BiologyPhysicsHistoryMath |
20015 |
In this table we represented all the required data, however, using this format allows you to assign more than one value to a cell in a table, which causes a repetition of data and is considered an anomaly. As a database grows, anomalies can lead to data integrity problems, You would need a method to represent data correctly, thereby eliminating anomalies.
In simple terms, normalization is a system that allows you to organize data in a way that prevents repetition. It breaks larger tables into smaller tables by removing duplicate data from its relational tables.
This allows you to:
Reduce data redundancy and, in turn, improve Integrity
eliminate Insertion, Update, and Deletion Anomalies
To understand the concept of normalization, you would need to know what keys are and how relationships work in a database.
Keys refer to a column or a set of columns in a database table that uniquely identifies each row. They play a crucial role in establishing relationships between tables and ensuring data integrity.
Some Commonly used keys are:
Primary key
Primary keys are columns that uniquely identify rows in a table. For instance, the first row in your student table.
Student Id |
Student Name |
Age |
Year |
Courses |
Course id |
---|---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
MathsEconomicsGrammarGeography |
20013 |
002 |
Mary |
18 |
Year 2 |
MathHistoryGeographyLiterature |
20014 |
003 |
Dennis |
20 |
Year 3 |
BiologyPhysicsHistoryMath |
20015 |
The student id can be considered a primary key in this case as it is used to identify each student and is distinct through the table.
Candidate Keys are very similar to primary keys and can be used interchangeably.
Super Keys are basically a set of primary keys in a table which are in this case Student ID and Course ID.
Foreign Keys
Foreign keys serve as references to the primary keys of other tables. This establishes a relationship between the two tables within a database
Normalization follows a series of stages called Normal forms. To fully understand normal forms, consider them as set rules that should be followed.
Here is a list of the Normal Forms in SQL:
First Normal Form (INF)
First Normal form aims to solve the problem of atomicity (multiple values in a cell).
The rules in the First Normal form state that:
Now, if you apply the rules of First Normal Form to our table, you would reduce the value of the subject column to one per cell.
Student Id |
Student Name |
Age |
Year |
Course |
Course id |
---|---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
|
20013 |
002 |
Mary |
18 |
Year 2 |
|
20014 |
003 |
Dennis |
20 |
Year 3 |
Biology |
20015 |
Second Normal Form(2NF)
For a table to be in the second normal form;
Let’s take a look at the student table;
Student Id |
Student Name |
Age |
Year |
Course |
Course id |
---|---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
|
20013 |
002 |
Mary |
18 |
Year 2 |
|
20014 |
003 |
Dennis |
20 |
Year 3 |
Biology |
20015 |
In this table, the non-key columns(Student Name, age, year, course ) are all attributes of the student which are identified by the studentID. But, the course id only depends on the course, which is a part of the primary key(Student ID ). Hence, this table violates the second normal form(2NF)
To achieve the Second Normal Form, you need to divide the table based on the dependency:
Student Id |
Student Name |
Age |
Year |
Course |
---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
|
002 |
Mary |
18 |
Year 2 |
|
003 |
Dennis |
20 |
Year 3 |
Biology |
Course |
Course id |
---|---|
|
20013 |
|
20014 |
Biology |
20015 |
Using the Second Normal Form (2NF), the Course column now entirely depends on the primary keys of Student and Course Table respectively.
Third Normal Form (3NF)
Now, you have created a separate table hereby eliminating partial dependencies on your table. Let’s assume the teacher of your students, just scored their recent paper and wants to store the record in your table.
The table would be updated to:
Student Id |
Student Name |
Age |
Year |
Course |
Course Score |
---|---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
|
88 |
002 |
Mary |
18 |
Year 2 |
|
92 |
003 |
Dennis |
20 |
Year 3 |
Biology |
85 |
In this table, there are no partial dependencies. However, Course, Age, and Year are directly dependent on the Student Id. Although the Course Score is also dependent on the student Id, It is first dependent on the course.
Student Id → Course
Student Id → Course → Course Id
This means that the table has a transitive dependency and hence does not meet the third normal form requirement. To bring the table to a third normal form, it;
has to be in 2NF
Should have no transitive functional dependencies.
To convert the 2NF table into 3NF, you would need to split the table based on transitive functional dependency.
Student Id |
Student Name |
Age |
Year |
Course |
---|---|---|---|---|
001 |
Benson |
19 |
Year 2 |
|
002 |
Mary |
18 |
Year 2 |
|
003 |
Dennis |
20 |
Year 3 |
Biology |
Course |
Course Score |
---|---|
|
88 |
|
92 |
Biology |
85 |
Boyce-Codd Normal Form(BCNF)
Boyce-Codd Normal Form is the superior version of 3NF and is also known as 3.5 NF. It is used to tackle certain types of anomalies which are not resolved with 3NF.
To resolve a table in Boyce-Codd normal form. The table has to
Be in a Third Normal Form(3NF).
Have each attribute on the right side of a dependency depend wholly primary key of the table.
Let's consider an example where the given table does not satisfy BCNF.
Student Id |
Student Name |
Age |
Course |
Instructor |
---|---|---|---|---|
001 |
Benson |
19 |
Economics |
Dr. Smith |
002 |
Mary |
18 |
History |
Dr. Johnson |
003 |
Dennis |
20 |
Biology |
Dr. Lee |
In this case, the dependency Course → Instructor violates BCNF because Course is not a superkey. This means that the dependency does not rely on the entire Student ID.
To achieve BCNF, we need to split the table into two separate tables:
Student table
Student Id |
Student Name |
Age |
---|---|---|
001 |
Benson |
19 |
002 |
Mary |
18 |
003 |
Dennis |
20 |
|
|
|
Course table
Course |
Course Instructor |
---|---|
Economics |
Dr. Smith |
History |
Dr. Johnson |
Biology |
Dr. Lee |
In the modified version, we reference the primary key from the Course table as a foreign key in the Student table.
Although these are the most used form of Normalization.
There are other forms such as;
Fourth Normal Form(4NF)
In 4NF, a table should satisfy the following two conditions:
It must already be in 3NF.
It should not have any multi-valued dependencies.
Fifth Normal Form (5NF)
For a table to be in the Fifth Normal Form(5NF),