Unit 3 | DBMS Notes | AKTU Notes


Data Base Design & Normalization

Unit 3 | DBMS Notes | AKTU Notes


Functional Dependencies (FDs)

- Definition: A functional dependency is a relationship between two sets of data (or attributes) in a table. It tells us how one piece of data (attribute) can determine another.
- Example: In a table of students, if the Roll Number uniquely determines the Name of the student, then:
  - Roll Number → Name
  - This means knowing the Roll Number will help us find the Name.
- Key Terms:
  - Determinant: The attribute on the left side (e.g., Roll Number in the example).
  - Dependent: The attribute on the right side (e.g., Name in the example).
- Why Important: Functional dependencies help us design tables in such a way that there is no unnecessary repetition of data.

Normal Forms (NF)

Normalization is a process used to organize data in a database to reduce redundancy (repeated data) and improve data integrity (accuracy and consistency). The steps of normalization are called normal forms (NF). Each step solves specific problems.

Normal Forms in Detail:

First Normal Form (1NF):

- Goal: Ensure each column has atomic (indivisible) values, and there are no duplicate rows.
- Rule:
  1. Each cell contains only a single value.
  2. No repeating groups (a column cannot have multiple values in one cell).
- Example:
  - Bad Table:  
Roll Number Subjects
1 Math, Science
    

This violates 1NF because "Subjects" contains multiple values.
  - Fixed Table:  
Roll Number Subject
1 Math
1 Science


Second Normal Form (2NF):

- Goal: Eliminate partial dependencies (where a non-key column depends on part of a composite primary key).
- Rule:
  1. The table must be in 1NF.
  2. Every non-key column must depend on the whole primary key, not just part of it.
- Example:  
  - If a table has a composite primary key (e.g., Roll Number + Subject), but a column like "Student Name" depends only on "Roll Number," this violates 2NF.
  - Fix: Split the table into two smaller tables.

Third Normal Form (3NF):

- Goal: Remove transitive dependencies (when a non-key column depends on another non-key column instead of the primary key).
- Rule:
  1. The table must be in 2NF.
  2. No non-key column should depend on another non-key column.
- Example:
  - A table with "Roll Number → Student Name" and "Student Name → Class" has a transitive dependency.
  - Fix: Split into two tables:
    1. Roll Number → Student Name
    2. Student Name → Class


Boyce-Codd Normal Form (BCNF)

- Definition: BCNF is an advanced version of the Third Normal Form (3NF). It ensures that there are no anomalies or dependencies in a database. A table is in BCNF if:
  1. It is already in 3NF.
  2. For every functional dependency (FD) in the table, the determinant (left side of the FD) is a candidate key.

- Why Important: BCNF helps avoid redundancy and inconsistencies caused by certain types of dependencies that 3NF cannot handle.

- Key Concept
  - A candidate key is a column (or a set of columns) that can uniquely identify rows in a table.

- Example:  
  Bad Table (Not in BCNF)
Teacher Name Subject Department
Mr. Smith Math Science
Ms. Brown English Arts
    
  - Here, "Teacher Name → Department" is a dependency, but "Teacher Name" is not a candidate key because the table can have multiple teachers teaching the same subject. This violates BCNF.

  Fixed Table (In BCNF)
  Split into two tables:  
  1. Teacher - Department Relation  
Teacher Name Department
Mr. Smith Science
Ms. Brown Arts

  2. Subject - Teacher Relation
Subject Teacher Name
Math Mr. Smith
English Ms. Brown

- When Is BCNF Needed: BCNF is used for complex relationships where a normal 3NF table still has overlapping data or anomalies.

Inclusion Dependence

- Definition: Inclusion dependence is a type of dependency in databases where one column (or a set of columns) in one table refers to a column (or columns) in another table. It ensures that the data in one table matches valid data in another.

- Why Important: Inclusion dependence enforces referential integrity, which means that relationships between tables are valid.

- Key Concept:  
  - It is similar to the concept of foreign keys, where one table references another to maintain a relationship.

- Example:  
  1. Consider two tables:  
     Student Table:  
Roll Number Name Class
1 John 10
2 Alice 12

     Class Table
Class Class Teacher
10 Mr. Adams
12 Ms. Clarke

  2. The column "Class" in the Student Table must match the values in the "Class" column of the Class Table. This is an inclusion dependence because the values in one table depend on the valid data in another table.

- Purpose: Inclusion dependence ensures that:
  - Students cannot be assigned to a class that doesn’t exist in the "Class Table."
  - All data remains accurate and consistent across related tables.

Key Difference:
- BCNF: Ensures that the table is free of certain dependency-related problems within itself.
- Inclusion Dependence: Ensures that tables referring to each other have consistent and valid data.


Lossless Join Decomposition

- Definition:  
  In a database, Lossless Join Decomposition ensures that when a table (relation) is broken into smaller tables (decomposed) to remove redundancy or solve anomalies, the original table can be perfectly reconstructed by joining the smaller tables without losing any data.

- Why Is It Important?
  When a table is decomposed to achieve normalization (like moving to 2NF, 3NF, or BCNF), it is crucial to ensure that no information is lost during this process. Lossless decomposition guarantees data integrity.

Key Concept:
- A decomposition is called lossless if:
  When you join the smaller tables back together, you get the exact original table, without any extra rows (duplicates) or missing rows.

Example:

1. Original Table:  
Student ID Name Course Teacher
101 John Math Mr. Smith
102 Alice Science Ms. Brown

2. Decomposition into Two Tables:  
   Table 1 (Student-Course):  
Student ID Course
101 Math
102 Science

   Table 2 (Course-Teacher):  
Course Teacher
Math Mr. Smith
Science Ms. Brown

3. Rejoining Tables:  
   If you join Table 1 and Table 2 on the "Course" column, you get:  
Student ID Name Course Teacher
101 John Math Mr. Smith
102 Alice Science Ms. Brown

   - This matches the original table exactly. Hence, the decomposition is lossless.

How to Check for Lossless Decomposition:
To ensure decomposition is lossless, we can check:
1. There should be at least one common column (attribute) between the smaller tables.
2. The common column must be a candidate key for at least one of the smaller tables.

Non-Lossless Decomposition (Counterexample):
1. Original Table:  
Student ID Name Course
101 John Math
102 Alice Science

2. Decomposed into Two Tables (Incorrect):  
   Table 1:  
Student ID Name
101 John
102 Alice

   Table 2
Name Course
John Math
Alice Science

3. Rejoining Tables:  
   Joining these tables on "Name" might result in duplicate rows or mismatched data, which is different from the original table. This makes it a lossy decomposition.

Conclusion:
- Lossless Join Decomposition is a key property in database normalization to ensure no data is lost during table splitting.
- It is crucial for maintaining data integrity and consistency when designing relational databases.


Normalization Using FD, MVD, and JD

Functional Dependencies (FDs):
- Definition: A Functional Dependency (FD) is a rule that describes the relationship between columns in a table. It shows how one column determines another.  
  - Example: In a Student table, Roll NumberName. Knowing the Roll Number helps find the Name.
  
- Role in Normalization:
  - FDs are used to eliminate redundancy and organize data into normal forms (1NF, 2NF, 3NF, BCNF).  
  - For example, in BCNF, every FD’s determinant must be a candidate key to ensure no redundancy.

Multivalued Dependencies (MVDs):
- Definition: A Multivalued Dependency (MVD) occurs when one column determines multiple independent values of another column.  
  - Example: In a table with Student, Course, and Hobby:
    - A student named "John" can take multiple courses (Math, Science) and have multiple hobbies (Cricket, Reading).  
    - This creates an MVD: Student →→ Course and Student →→ Hobby.  

- Role in Normalization:
  - MVDs are handled in the Fourth Normal Form (4NF).  
  - A table is in 4NF if it is in BCNF and has no multivalued dependencies unless they are trivial (e.g., every column depends on a key).  
  - Fixing MVDs often involves splitting the table into smaller tables.

  Example:  
  Original Table:  
Student Course Hobby
John Math Cricket
John Science Reading

  Split into two tables to remove MVD:  
  Table 1 (Student-Course):  
Student Course
John Math
John Science

  Table 2 (Student-Hobby):  
Student Hobby
John Cricket
John Reading

Join Dependencies (JDs):
- Definition: A Join Dependency (JD) specifies how a table can be split into multiple smaller tables and then rejoined to recreate the original table without losing data.  
  - Example: A table with columns Student, Course, Teacher can be split into two tables:  
    1. Student, Course 
    2. Course, Teacher

- Role in Normalization:
  - JDs are addressed in the Fifth Normal Form (5NF), also called Project-Join Normal Form (PJNF).  
  - A table is in 5NF if it is in 4NF and has no join dependencies that cause redundancy.  
  - Fixing JDs ensures that data remains consistent across complex relationships.

Alternative Approaches to Database Design

Normalization is one approach to designing a database, but there are alternative methods too:

1. Denormalization:
- Definition: This is the opposite of normalization. It combines tables to reduce the number of joins and improve performance, even if it introduces some redundancy.  
- When Used:
  - When the database needs to process queries quickly (e.g., in large-scale applications).  
- Example:
  - Instead of splitting Student and Course into separate tables, keep them in one table:  
Student Course Teacher
John Math Mr. Smith
 

2. Entity-Relationship (ER) Modeling:
- Definition: ER modeling focuses on designing the database visually using entities (like tables), their attributes (columns), and relationships (connections between tables).  
- Purpose:
  - It simplifies the design process and ensures all real-world relationships are captured.  
- Example:
  - Entities: Student, Course.  
  - Relationship: A student enrolls in courses.

3. Object-Oriented Database Design:
- Definition: This approach organizes data as objects, similar to object-oriented programming.  
- When Used:
  - In applications where data has complex relationships or behaviors (e.g., multimedia databases or CAD software).  

4. NoSQL Database Design:
- Definition: Instead of relational tables, data is stored in flexible formats like key-value pairs, documents, or graphs.  
- When Used:
  - For large-scale, unstructured, or semi-structured data (e.g., social media, e-commerce).  
- Example:
  - A product database might store each product as a document:  

    json

    { "ProductID": 101, "Name": "Laptop", "Price": 50000 }  
    

Conclusion:
- Normalization using FD, MVD, and JD is critical to ensuring the database is efficient, free from redundancy, and maintains data integrity.  
- Alternative approaches like denormalization, ER modeling, or NoSQL offer flexibility and performance benefits for specific use cases.

No comments:

Post a Comment