You are working on a project to create a for a school. This database should contain details about the student including class, subjects, and teachers. Bearing in mind that a student would learn multiple subjects and may have multiple teachers. If you are to present this, it would look like the table below: database Student Id Student Name Age Year Courses Course id 001 Benson 19 Year 2 MathsEconomicsGrammarGeography 20013 002 Mary 18 Year 2 MathHistoryGeographyLiterature 20014 003 Dennis 20 Year 3 BiologyPhysicsHistoryMath 20015 In this table we represented all the required data, however, using this format allows you to assign more than one value to a cell in a table, which causes a repetition of data and is considered an anomaly. As a database grows, anomalies can lead to data integrity problems, You would need a method to represent data correctly, thereby eliminating anomalies. What is Normalization? In simple terms, is a system that allows you to organize data in a way that prevents repetition. It breaks larger tables into smaller tables by removing duplicate data from its relational tables. normalization This allows you to: Reduce data redundancy and, in turn, improve Integrity eliminate Insertion, Update, and Deletion Anomalies To understand the concept of normalization, you would need to know what keys are and how relationships work in a database. Keys Keys refer to a column or a set of columns in a database table that uniquely identifies each row. They play a crucial role in establishing relationships between tables and ensuring data integrity. Some Commonly used keys are: Primary key Primary keys are columns that uniquely identify rows in a table. For instance, the first row in your student table. Student Id Student Name Age Year Courses Course id 001 Benson 19 Year 2 MathsEconomicsGrammarGeography 20013 002 Mary 18 Year 2 MathHistoryGeographyLiterature 20014 003 Dennis 20 Year 3 BiologyPhysicsHistoryMath 20015 The student id can be considered a primary key in this case as it is used to identify each student and is distinct through the table. are very similar to primary keys and can be used interchangeably. Candidate Keys are basically a set of primary keys in a table which are in this case Student ID and Course ID. Super Keys Foreign Keys Foreign keys serve as references to the primary keys of other tables. This establishes a relationship between the two tables within a database Forms of Normalization Normalization follows a series of stages called Normal forms. To fully understand normal forms, consider them as set rules that should be followed. Here is a list of the Normal Forms in SQL: 1NF (First Normal Form) 2NF (Second Normal Form) 3NF (Third Normal Form) BCNF (Boyce-Codd Normal Form) 4NF (Fourth Normal Form) 5NF (Fifth Normal Form) 6NF (Sixth Normal Form) First Normal Form (INF) First Normal form aims to solve the problem of atomicity (multiple values in a cell). The rules in the First Normal form state that: Each cell should contain a single value. Each record needs to be unique Now, if you apply the rules of First Normal Form to our table, you would reduce the value of the subject column to one per cell. Student Id Student Name Age Year Course Course id 001 Benson 19 Year 2 Economics 20013 002 Mary 18 Year 2 History 20014 003 Dennis 20 Year 3 Biology 20015 Second Normal Form(2NF) For a table to be in the second normal form; it has to be in First Normal Form(1NF) It shouldn't possess partial dependency. This means each column in the table should be fully dependent on the primary key. Let’s take a look at the student table; Student Id Student Name Age Year Course Course id 001 Benson 19 Year 2 Economics 20013 002 Mary 18 Year 2 History 20014 003 Dennis 20 Year 3 Biology 20015 In this table, the non-key columns(Student Name, age, year, course ) are all attributes of the student which are identified by the studentID. But, the course id only depends on the course, which is a part of the primary key(Student ID ). Hence, this table violates the second normal form(2NF) To achieve the Second Normal Form, you need to divide the table based on the dependency: Student Id Student Name Age Year Course 001 Benson 19 Year 2 Economics 002 Mary 18 Year 2 History 003 Dennis 20 Year 3 Biology Course Course id Economics 20013 History 20014 Biology 20015 Using the Second Normal Form (2NF), the Course column now entirely depends on the primary keys of Student and Course Table respectively. Third Normal Form (3NF) Now, you have created a separate table hereby eliminating partial dependencies on your table. Let’s assume the teacher of your students, just scored their recent paper and wants to store the record in your table. The table would be updated to: Student Id Student Name Age Year Course Course Score 001 Benson 19 Year 2 Economics 88 002 Mary 18 Year 2 History 92 003 Dennis 20 Year 3 Biology 85 In this table, there are no partial dependencies. However, Course, Age, and Year are directly dependent on the Student Id. Although the Course Score is also dependent on the student Id, It is first dependent on the course. Student Id → Course Student Id → Course → Course Id This means that the table has a transitive dependency and hence does not meet the third normal form requirement. To bring the table to a third normal form, it; has to be in 2NF Should have no transitive functional dependencies. To convert the 2NF table into 3NF, you would need to split the table based on transitive functional dependency. Student Id Student Name Age Year Course 001 Benson 19 Year 2 Economics 002 Mary 18 Year 2 History 003 Dennis 20 Year 3 Biology Course Course Score Economics 88 History 92 Biology 85 Boyce-Codd Normal Form(BCNF) Boyce-Codd Normal Form is the superior version of 3NF and is also known as 3.5 NF. It is used to tackle certain types of anomalies which are not resolved with 3NF. To resolve a table in Boyce-Codd normal form. The table has to Be in a Third Normal Form(3NF). Have each attribute on the right side of a dependency depend wholly primary key of the table. Let's consider an example where the given table does not satisfy BCNF. Student Id Student Name Age Course Instructor 001 Benson 19 Economics Dr. Smith 002 Mary 18 History Dr. Johnson 003 Dennis 20 Biology Dr. Lee In this case, the dependency Course → Instructor violates BCNF because Course is not a superkey. This means that the dependency does not rely on the entire Student ID. To achieve BCNF, we need to split the table into two separate tables: Student table Student Id Student Name Age 001 Benson 19 002 Mary 18 003 Dennis 20 Course table Course Course Instructor Economics Dr. Smith History Dr. Johnson Biology Dr. Lee In the modified version, we reference the primary key from the Course table as a foreign key in the Student table. Although these are the most used form of Normalization. There are other forms such as; Fourth Normal Form(4NF) In 4NF, a table should satisfy the following two conditions: It must already be in 3NF. It should not have any multi-valued dependencies. Fifth Normal Form (5NF) For a table to be in the Fifth Normal Form(5NF), It must already be in 4NF. It should not have any non-trivial join dependencies.