Data Base Design & Normalization
Functional Dependencies (FDs)
- Definition: A functional dependency is a relationship between two sets of data (or attributes) in a table. It tells us how one piece of data (attribute) can determine another.
- Example: In a table of students, if the Roll Number uniquely determines the Name of the student, then:
- Roll Number → Name
- This means knowing the Roll Number will help us find the Name.
- Key Terms:
- Determinant: The attribute on the left side (e.g., Roll Number in the example).
- Dependent: The attribute on the right side (e.g., Name in the example).
- Why Important: Functional dependencies help us design tables in such a way that there is no unnecessary repetition of data.
Normal Forms (NF)
Normalization is a process used to organize data in a database to reduce redundancy (repeated data) and improve data integrity (accuracy and consistency). The steps of normalization are called normal forms (NF). Each step solves specific problems.
Normal Forms in Detail:
First Normal Form (1NF):
- Goal: Ensure each column has atomic (indivisible) values, and there are no duplicate rows.
- Rule:
1. Each cell contains only a single value.
2. No repeating groups (a column cannot have multiple values in one cell).
- Example:
- Bad Table:
Roll Number | Subjects |
---|---|
1 | Math, Science |
This violates 1NF because "Subjects" contains multiple values.
- Fixed Table:
Roll Number | Subject |
---|---|
1 | Math |
1 | Science |
Second Normal Form (2NF):
- Goal: Eliminate partial dependencies (where a non-key column depends on part of a composite primary key).
- Rule:
1. The table must be in 1NF.
2. Every non-key column must depend on the whole primary key, not just part of it.
- Example:
- If a table has a composite primary key (e.g., Roll Number + Subject), but a column like "Student Name" depends only on "Roll Number," this violates 2NF.
- Fix: Split the table into two smaller tables.
Third Normal Form (3NF):
- Goal: Remove transitive dependencies (when a non-key column depends on another non-key column instead of the primary key).
- Rule:
1. The table must be in 2NF.
2. No non-key column should depend on another non-key column.
- Example:
- A table with "Roll Number → Student Name" and "Student Name → Class" has a transitive dependency.
- Fix: Split into two tables:
1. Roll Number → Student Name
2. Student Name → Class
Boyce-Codd Normal Form (BCNF)
- Definition: BCNF is an advanced version of the Third Normal Form (3NF). It ensures that there are no anomalies or dependencies in a database. A table is in BCNF if:
1. It is already in 3NF.
2. For every functional dependency (FD) in the table, the determinant (left side of the FD) is a candidate key.
- Why Important: BCNF helps avoid redundancy and inconsistencies caused by certain types of dependencies that 3NF cannot handle.
- Key Concept:
- A candidate key is a column (or a set of columns) that can uniquely identify rows in a table.
- Example:
Bad Table (Not in BCNF):
Teacher Name | Subject | Department |
---|---|---|
Mr. Smith | Math | Science |
Ms. Brown | English | Arts |
- Here, "Teacher Name → Department" is a dependency, but "Teacher Name" is not a candidate key because the table can have multiple teachers teaching the same subject. This violates BCNF.
Fixed Table (In BCNF):
Split into two tables:
1. Teacher - Department Relation
Teacher Name | Department |
---|---|
Mr. Smith | Science |
Ms. Brown | Arts |
2. Subject - Teacher Relation
Subject | Teacher Name |
---|---|
Math | Mr. Smith |
English | Ms. Brown |
- When Is BCNF Needed: BCNF is used for complex relationships where a normal 3NF table still has overlapping data or anomalies.
Inclusion Dependence
- Definition: Inclusion dependence is a type of dependency in databases where one column (or a set of columns) in one table refers to a column (or columns) in another table. It ensures that the data in one table matches valid data in another.
- Why Important: Inclusion dependence enforces referential integrity, which means that relationships between tables are valid.
- Key Concept:
- It is similar to the concept of foreign keys, where one table references another to maintain a relationship.
- Example:
1. Consider two tables:
Student Table:
Roll Number | Name | Class |
---|---|---|
1 | John | 10 |
2 | Alice | 12 |
Class Table:
Class | Class Teacher |
---|---|
10 | Mr. Adams |
12 | Ms. Clarke |
2. The column "Class" in the Student Table must match the values in the "Class" column of the Class Table. This is an inclusion dependence because the values in one table depend on the valid data in another table.
- Purpose: Inclusion dependence ensures that:
- Students cannot be assigned to a class that doesn’t exist in the "Class Table."
- All data remains accurate and consistent across related tables.
Key Difference:
- BCNF: Ensures that the table is free of certain dependency-related problems within itself.
- Inclusion Dependence: Ensures that tables referring to each other have consistent and valid data.
Lossless Join Decomposition
- Definition:
In a database, Lossless Join Decomposition ensures that when a table (relation) is broken into smaller tables (decomposed) to remove redundancy or solve anomalies, the original table can be perfectly reconstructed by joining the smaller tables without losing any data.
- Why Is It Important?
When a table is decomposed to achieve normalization (like moving to 2NF, 3NF, or BCNF), it is crucial to ensure that no information is lost during this process. Lossless decomposition guarantees data integrity.
Key Concept:
- A decomposition is called lossless if:
When you join the smaller tables back together, you get the exact original table, without any extra rows (duplicates) or missing rows.
Example:
1. Original Table:
Student ID | Name | Course | Teacher |
---|---|---|---|
101 | John | Math | Mr. Smith |
102 | Alice | Science | Ms. Brown |
2. Decomposition into Two Tables:
Table 1 (Student-Course):
Student ID | Course |
---|---|
101 | Math |
102 | Science |
Table 2 (Course-Teacher):
Course | Teacher |
---|---|
Math | Mr. Smith |
Science | Ms. Brown |
3. Rejoining Tables:
If you join Table 1 and Table 2 on the "Course" column, you get:
Student ID | Name | Course | Teacher |
---|---|---|---|
101 | John | Math | Mr. Smith |
102 | Alice | Science | Ms. Brown |
- This matches the original table exactly. Hence, the decomposition is lossless.
How to Check for Lossless Decomposition:
To ensure decomposition is lossless, we can check:
1. There should be at least one common column (attribute) between the smaller tables.
2. The common column must be a candidate key for at least one of the smaller tables.
Non-Lossless Decomposition (Counterexample):
1. Original Table:
Student ID | Name | Course |
---|---|---|
101 | John | Math |
102 | Alice | Science |
2. Decomposed into Two Tables (Incorrect):
Table 1:
Student ID | Name |
---|---|
101 | John |
102 | Alice |
Table 2:
Name | Course |
---|---|
John | Math |
Alice | Science |
3. Rejoining Tables:
Joining these tables on "Name" might result in duplicate rows or mismatched data, which is different from the original table. This makes it a lossy decomposition.
Conclusion:
- Lossless Join Decomposition is a key property in database normalization to ensure no data is lost during table splitting.
- It is crucial for maintaining data integrity and consistency when designing relational databases.
Normalization Using FD, MVD, and JD
Functional Dependencies (FDs):
- Definition: A Functional Dependency (FD) is a rule that describes the relationship between columns in a table. It shows how one column determines another.
- Example: In a Student table, Roll Number → Name. Knowing the Roll Number helps find the Name.
- Role in Normalization:
- FDs are used to eliminate redundancy and organize data into normal forms (1NF, 2NF, 3NF, BCNF).
- For example, in BCNF, every FD’s determinant must be a candidate key to ensure no redundancy.
Multivalued Dependencies (MVDs):
- Definition: A Multivalued Dependency (MVD) occurs when one column determines multiple independent values of another column.
- Example: In a table with Student, Course, and Hobby:
- A student named "John" can take multiple courses (Math, Science) and have multiple hobbies (Cricket, Reading).
- This creates an MVD: Student →→ Course and Student →→ Hobby.
- Role in Normalization:
- MVDs are handled in the Fourth Normal Form (4NF).
- A table is in 4NF if it is in BCNF and has no multivalued dependencies unless they are trivial (e.g., every column depends on a key).
- Fixing MVDs often involves splitting the table into smaller tables.
Example:
Original Table:
Student | Course | Hobby |
---|---|---|
John | Math | Cricket |
John | Science | Reading |
Split into two tables to remove MVD:
Table 1 (Student-Course):
Student | Course |
---|---|
John | Math |
John | Science |
Table 2 (Student-Hobby):
Student | Hobby |
---|---|
John | Cricket |
John | Reading |
Join Dependencies (JDs):
- Definition: A Join Dependency (JD) specifies how a table can be split into multiple smaller tables and then rejoined to recreate the original table without losing data.
- Example: A table with columns Student, Course, Teacher can be split into two tables:
1. Student, Course
2. Course, Teacher
- Role in Normalization:
- JDs are addressed in the Fifth Normal Form (5NF), also called Project-Join Normal Form (PJNF).
- A table is in 5NF if it is in 4NF and has no join dependencies that cause redundancy.
- Fixing JDs ensures that data remains consistent across complex relationships.
Alternative Approaches to Database Design
Normalization is one approach to designing a database, but there are alternative methods too:
1. Denormalization:
- Definition: This is the opposite of normalization. It combines tables to reduce the number of joins and improve performance, even if it introduces some redundancy.
- When Used:
- When the database needs to process queries quickly (e.g., in large-scale applications).
- Example:
- Instead of splitting Student and Course into separate tables, keep them in one table:
Student | Course | Teacher |
---|---|---|
John | Math | Mr. Smith |
2. Entity-Relationship (ER) Modeling:
- Definition: ER modeling focuses on designing the database visually using entities (like tables), their attributes (columns), and relationships (connections between tables).
- Purpose:
- It simplifies the design process and ensures all real-world relationships are captured.
- Example:
- Entities: Student, Course.
- Relationship: A student enrolls in courses.
3. Object-Oriented Database Design:
- Definition: This approach organizes data as objects, similar to object-oriented programming.
- When Used:
- In applications where data has complex relationships or behaviors (e.g., multimedia databases or CAD software).
4. NoSQL Database Design:
- Definition: Instead of relational tables, data is stored in flexible formats like key-value pairs, documents, or graphs.
- When Used:
- For large-scale, unstructured, or semi-structured data (e.g., social media, e-commerce).
- Example:
- A product database might store each product as a document:
json
{ "ProductID": 101, "Name": "Laptop", "Price": 50000 }
Conclusion:
- Normalization using FD, MVD, and JD is critical to ensuring the database is efficient, free from redundancy, and maintains data integrity.
- Alternative approaches like denormalization, ER modeling, or NoSQL offer flexibility and performance benefits for specific use cases.
No comments:
Post a Comment