paint-brush
Understanding Normalization: A Guide to Effective Database Designby@hyemiie
193 reads

Understanding Normalization: A Guide to Effective Database Design

by Yemi OjedapoJune 14th, 2023
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Normalization is a system that allows you to organize data in a way that prevents repeated data. It breaks larger tables into smaller tables by removing duplicate data from its relational tables. To understand the concept of normalization, you would need to know what keys are and how relationships work in a database.
featured image - Understanding Normalization: A Guide to Effective Database Design
Yemi Ojedapo HackerNoon profile picture


You are working on a project to create a database for a school. This database should contain details about the student including class, subjects, and teachers. Bearing in mind that a student would learn multiple subjects and may have multiple teachers. If you are to present this, it would look like the table below:

Student Id

Student Name

Age

Year

Courses

Course id

001

Benson

19

Year 2

MathsEconomicsGrammarGeography

20013

002

Mary

18

Year 2

MathHistoryGeographyLiterature

20014

003

Dennis

20

Year 3

BiologyPhysicsHistoryMath

20015


In this table we represented all the required data, however, using this format allows you to assign more than one value to a cell in a table, which causes a repetition of data and is considered an anomaly. As a database grows, anomalies can lead to data integrity problems, You would need a method to represent data correctly, thereby eliminating anomalies.


What is Normalization?

In simple terms, normalization is a system that allows you to organize data in a way that prevents repetition. It breaks larger tables into smaller tables by removing duplicate data from its relational tables.


This allows you to:

  • Reduce data redundancy and, in turn, improve Integrity

  • eliminate Insertion, Update, and Deletion Anomalies


To understand the concept of normalization, you would need to know what keys are and how relationships work in a database.


Keys

Keys refer to a column or a set of columns in a database table that uniquely identifies each row. They play a crucial role in establishing relationships between tables and ensuring data integrity.


Some Commonly used keys are:


Primary key

Primary keys are columns that uniquely identify rows in a table. For instance, the first row in your student table.

Student Id

Student Name

Age

Year

Courses

Course id

001

Benson

19

Year 2

MathsEconomicsGrammarGeography

20013

002

Mary

18

Year 2

MathHistoryGeographyLiterature

20014

003

Dennis

20

Year 3

BiologyPhysicsHistoryMath

20015


The student id can be considered a primary key in this case as it is used to identify each student and is distinct through the table.


Candidate Keys are very similar to primary keys and can be used interchangeably.

Super Keys are basically a set of primary keys in a table which are in this case Student ID and Course ID.


Foreign Keys

Foreign keys serve as references to the primary keys of other tables. This establishes a relationship between the two tables within a database


Forms of Normalization

Normalization follows a series of stages called Normal forms. To fully understand normal forms, consider them as set rules that should be followed.


Here is a list of the Normal Forms in SQL:


  • 1NF (First Normal Form)
  • 2NF (Second Normal Form)
  • 3NF (Third Normal Form)
  • BCNF (Boyce-Codd Normal Form)
  • 4NF (Fourth Normal Form)
  • 5NF (Fifth Normal Form)
  • 6NF (Sixth Normal Form)


First Normal Form (INF)

First Normal form aims to solve the problem of atomicity (multiple values in a cell).

The rules in the First Normal form state that:


  • Each cell should contain a single value.
  • Each record needs to be unique


Now, if you apply the rules of  First Normal Form to our table,  you would reduce the value of the subject column to one per cell.

Student Id

Student Name

Age

Year

Course

Course id

001

Benson

19

Year 2


Economics

20013

002

Mary

18

Year 2


History

20014

003

Dennis

20

Year 3

Biology

20015



Second Normal Form(2NF)

For a table to be in the second normal form;


  • it has to be in First Normal Form(1NF)
  • It shouldn't possess partial dependency. This means each column in the table should be fully dependent on the primary key.


Let’s take a look at the student table;

Student Id

Student Name

Age

Year

Course

Course id

001

Benson

19

Year 2


Economics

20013

002

Mary

18

Year 2


History

20014

003

Dennis

20

Year 3

Biology

20015


In this table, the non-key columns(Student Name, age, year, course ) are all attributes of the student which are identified by the studentID. But, the course id only depends on the course, which is a part of the primary key(Student ID ). Hence, this table violates the second normal form(2NF)


To achieve the  Second Normal Form, you need to divide the table based on the dependency:

Student Id

Student Name

Age

Year

Course

001

Benson

19

Year 2


Economics

002

Mary

18

Year 2


History

003

Dennis

20

Year 3

Biology


Course

Course id


Economics

20013


History

20014

Biology

20015


Using the Second Normal Form (2NF), the Course column now entirely depends on the primary keys of Student and Course Table respectively.


Third Normal Form (3NF)

Now, you have created a separate table hereby eliminating partial dependencies on your table. Let’s assume the teacher of your students, just scored their recent paper and wants to store the record in your table.


The table would be updated to:

Student Id

Student Name

Age

Year

Course

Course Score

001

Benson

19

Year 2


Economics

88

002

Mary

18

Year 2


History

92

003

Dennis

20

Year 3

Biology

85


In this table, there are no partial dependencies. However, Course, Age, and Year are directly dependent on the Student Id. Although the Course Score is also dependent on the student Id, It is first dependent on the course.


Student Id   →  Course

Student Id → Course → Course Id


This means that the table has a transitive dependency and hence does not meet the third normal form requirement. To bring the table to a third normal form, it;


  • has to be in 2NF

  • Should have no transitive functional dependencies.


To convert the 2NF table into 3NF, you would need to split the table based on transitive functional dependency.

Student Id

Student Name

Age

Year

Course

001

Benson

19

Year 2


Economics

002

Mary

18

Year 2


History

003

Dennis

20

Year 3

Biology

Course

Course Score


Economics

88


History

92

Biology

85



Boyce-Codd Normal Form(BCNF)

Boyce-Codd  Normal Form is the superior version of 3NF and is also known as 3.5 NF. It is used to tackle certain types of anomalies which are not resolved with 3NF.


To resolve a table in Boyce-Codd normal form. The table has to

  • Be in a Third Normal Form(3NF).

  • Have each attribute on the right side of a dependency depend wholly primary key of the table.


Let's consider an example where the given table does not satisfy BCNF.

Student Id

Student Name

Age

Course

Instructor

001

Benson

19

Economics

Dr. Smith

002

Mary

18

History

Dr. Johnson

003

Dennis

20

Biology

Dr. Lee


In this case, the dependency Course → Instructor violates BCNF because Course is not a superkey. This means that the dependency does not rely on the entire Student ID.

To achieve BCNF, we need to split the table into two separate tables:


Student table

Student Id

Student Name

Age

001

Benson

19

002

Mary

18

003

Dennis

20





Course table

Course

Course Instructor

Economics

Dr. Smith

History

Dr. Johnson

Biology

Dr. Lee



In the modified version, we reference the primary key from the Course table as a foreign key in the Student table.


Although these are the most used form of Normalization.


There are other forms such as;


Fourth Normal Form(4NF)

In 4NF, a table should satisfy the following two conditions:

  • It must already be in 3NF.

  • It should not have any multi-valued dependencies.


Fifth Normal Form (5NF)

For a table to be in the Fifth Normal Form(5NF),

  • It must already be in 4NF.
  • It should not have any non-trivial join dependencies.