Data Base Design & Normalization

Functional Dependencies (FDs)

- Definition: A functional dependency is a relationship between two sets of data (or attributes) in a table. It tells us how one piece of data (attribute) can determine another.

- Example: In a table of students, if the Roll Number uniquely determines the Name of the student, then:

- Roll Number → Name

- This means knowing the Roll Number will help us find the Name.

- Key Terms:

- Determinant: The attribute on the left side (e.g., Roll Number in the example).

- Dependent: The attribute on the right side (e.g., Name in the example).

- Why Important: Functional dependencies help us design tables in such a way that there is no unnecessary repetition of data.

Normal Forms (NF)

Normalization is a process used to organize data in a database to reduce redundancy (repeated data) and improve data integrity (accuracy and consistency). The steps of normalization are called normal forms (NF). Each step solves specific problems.

Normal Forms in Detail:

First Normal Form (1NF):

- Goal: Ensure each column has atomic (indivisible) values, and there are no duplicate rows.

- Rule:

1. Each cell contains only a single value.

2. No repeating groups (a column cannot have multiple values in one cell).

- Example:

- Bad Table:

Roll Number	Subjects
1	Math, Science

This violates 1NF because "Subjects" contains multiple values.

- Fixed Table:

Roll Number	Subject
1	Math
1	Science

Second Normal Form (2NF):

- Goal: Eliminate partial dependencies (where a non-key column depends on part of a composite primary key).

- Rule:

1. The table must be in 1NF.

2. Every non-key column must depend on the whole primary key, not just part of it.

- Example:

- If a table has a composite primary key (e.g., Roll Number + Subject), but a column like "Student Name" depends only on "Roll Number," this violates 2NF.

- Fix: Split the table into two smaller tables.

Third Normal Form (3NF):

- Goal: Remove transitive dependencies (when a non-key column depends on another non-key column instead of the primary key).

- Rule:

1. The table must be in 2NF.

2. No non-key column should depend on another non-key column.

- Example:

- A table with "Roll Number → Student Name" and "Student Name → Class" has a transitive dependency.

- Fix: Split into two tables:

1. Roll Number → Student Name

2. Student Name → Class

Boyce-Codd Normal Form (BCNF)

- Definition: BCNF is an advanced version of the Third Normal Form (3NF). It ensures that there are no anomalies or dependencies in a database. A table is in BCNF if:

1. It is already in 3NF.

2. For every functional dependency (FD) in the table, the determinant (left side of the FD) is a candidate key.

- Why Important: BCNF helps avoid redundancy and inconsistencies caused by certain types of dependencies that 3NF cannot handle.

- Key Concept:

- A candidate key is a column (or a set of columns) that can uniquely identify rows in a table.

- Example:

Bad Table (Not in BCNF):

Teacher Name	Subject	Department
Mr. Smith	Math	Science
Ms. Brown	English	Arts

- Here, "Teacher Name → Department" is a dependency, but "Teacher Name" is not a candidate key because the table can have multiple teachers teaching the same subject. This violates BCNF.

Fixed Table (In BCNF):

Split into two tables:

1. Teacher - Department Relation

Teacher Name	Department
Mr. Smith	Science
Ms. Brown	Arts

2. Subject - Teacher Relation

Subject	Teacher Name
Math	Mr. Smith
English	Ms. Brown

- When Is BCNF Needed: BCNF is used for complex relationships where a normal 3NF table still has overlapping data or anomalies.

Inclusion Dependence

- Definition: Inclusion dependence is a type of dependency in databases where one column (or a set of columns) in one table refers to a column (or columns) in another table. It ensures that the data in one table matches valid data in another.

- Why Important: Inclusion dependence enforces referential integrity, which means that relationships between tables are valid.

- Key Concept:

- It is similar to the concept of foreign keys, where one table references another to maintain a relationship.

- Example:

1. Consider two tables:

Student Table:

Roll Number	Name	Class
1	John	10
2	Alice	12

Class Table:

Class	Class Teacher
10	Mr. Adams
12	Ms. Clarke

2. The column "Class" in the Student Table must match the values in the "Class" column of the Class Table. This is an inclusion dependence because the values in one table depend on the valid data in another table.

- Purpose: Inclusion dependence ensures that:

- Students cannot be assigned to a class that doesn’t exist in the "Class Table."

- All data remains accurate and consistent across related tables.

Key Difference:

- BCNF: Ensures that the table is free of certain dependency-related problems within itself.

- Inclusion Dependence: Ensures that tables referring to each other have consistent and valid data.

Lossless Join Decomposition

- Definition:

In a database, Lossless Join Decomposition ensures that when a table (relation) is broken into smaller tables (decomposed) to remove redundancy or solve anomalies, the original table can be perfectly reconstructed by joining the smaller tables without losing any data.

- Why Is It Important?

When a table is decomposed to achieve normalization (like moving to 2NF, 3NF, or BCNF), it is crucial to ensure that no information is lost during this process. Lossless decomposition guarantees data integrity.

Key Concept:

- A decomposition is called lossless if:

When you join the smaller tables back together, you get the exact original table, without any extra rows (duplicates) or missing rows.

Example:

1. Original Table:

Student ID	Name	Course	Teacher
101	John	Math	Mr. Smith
102	Alice	Science	Ms. Brown

2. Decomposition into Two Tables:

Table 1 (Student-Course):

Student ID	Course
101	Math
102	Science

Table 2 (Course-Teacher):

Course	Teacher
Math	Mr. Smith
Science	Ms. Brown

3. Rejoining Tables:

If you join Table 1 and Table 2 on the "Course" column, you get:

Student ID	Name	Course	Teacher
101	John	Math	Mr. Smith
102	Alice	Science	Ms. Brown

- This matches the original table exactly. Hence, the decomposition is lossless.

How to Check for Lossless Decomposition:

To ensure decomposition is lossless, we can check:

1. There should be at least one common column (attribute) between the smaller tables.

2. The common column must be a candidate key for at least one of the smaller tables.

Non-Lossless Decomposition (Counterexample):

1. Original Table:

Student ID	Name	Course
101	John	Math
102	Alice	Science

2. Decomposed into Two Tables (Incorrect):

Table 1:

Student ID	Name
101	John
102	Alice

Table 2:

Name	Course
John	Math
Alice	Science

3. Rejoining Tables:

Joining these tables on "Name" might result in duplicate rows or mismatched data, which is different from the original table. This makes it a lossy decomposition.

Conclusion:

- Lossless Join Decomposition is a key property in database normalization to ensure no data is lost during table splitting.

- It is crucial for maintaining data integrity and consistency when designing relational databases.

Normalization Using FD, MVD, and JD

Functional Dependencies (FDs):

- Definition: A Functional Dependency (FD) is a rule that describes the relationship between columns in a table. It shows how one column determines another.

- Example: In a Student table, Roll Number → Name. Knowing the Roll Number helps find the Name.

- Role in Normalization:

- FDs are used to eliminate redundancy and organize data into normal forms (1NF, 2NF, 3NF, BCNF).

- For example, in BCNF, every FD’s determinant must be a candidate key to ensure no redundancy.

Multivalued Dependencies (MVDs):

- Definition: A Multivalued Dependency (MVD) occurs when one column determines multiple independent values of another column.

- Example: In a table with Student, Course, and Hobby:

- A student named "John" can take multiple courses (Math, Science) and have multiple hobbies (Cricket, Reading).

- This creates an MVD: Student →→ Course and Student →→ Hobby.

- Role in Normalization:

- MVDs are handled in the Fourth Normal Form (4NF).

- A table is in 4NF if it is in BCNF and has no multivalued dependencies unless they are trivial (e.g., every column depends on a key).

- Fixing MVDs often involves splitting the table into smaller tables.

Example:

Original Table:

Student	Course	Hobby
John	Math	Cricket
John	Science	Reading

Split into two tables to remove MVD:

Table 1 (Student-Course):

Student	Course
John	Math
John	Science

Table 2 (Student-Hobby):

Student	Hobby
John	Cricket
John	Reading

Join Dependencies (JDs):

- Definition: A Join Dependency (JD) specifies how a table can be split into multiple smaller tables and then rejoined to recreate the original table without losing data.

- Example: A table with columns Student, Course, Teacher can be split into two tables:

1. Student, Course

2. Course, Teacher

- Role in Normalization:

- JDs are addressed in the Fifth Normal Form (5NF), also called Project-Join Normal Form (PJNF).

- A table is in 5NF if it is in 4NF and has no join dependencies that cause redundancy.

- Fixing JDs ensures that data remains consistent across complex relationships.

Alternative Approaches to Database Design

Normalization is one approach to designing a database, but there are alternative methods too:

1. Denormalization:

- Definition: This is the opposite of normalization. It combines tables to reduce the number of joins and improve performance, even if it introduces some redundancy.

- When Used:

- When the database needs to process queries quickly (e.g., in large-scale applications).

- Example:

- Instead of splitting Student and Course into separate tables, keep them in one table:

Student	Course	Teacher
John	Math	Mr. Smith

2. Entity-Relationship (ER) Modeling:

- Definition: ER modeling focuses on designing the database visually using entities (like tables), their attributes (columns), and relationships (connections between tables).

- Purpose:

- It simplifies the design process and ensures all real-world relationships are captured.

- Example:

- Entities: Student, Course.

- Relationship: A student enrolls in courses.

3. Object-Oriented Database Design:

- Definition: This approach organizes data as objects, similar to object-oriented programming.

- When Used:

- In applications where data has complex relationships or behaviors (e.g., multimedia databases or CAD software).

4. NoSQL Database Design:

- Definition: Instead of relational tables, data is stored in flexible formats like key-value pairs, documents, or graphs.

- When Used:

- For large-scale, unstructured, or semi-structured data (e.g., social media, e-commerce).

- Example:

- A product database might store each product as a document:

json

{ "ProductID": 101, "Name": "Laptop", "Price": 50000 }

Conclusion:

- Normalization using FD, MVD, and JD is critical to ensuring the database is efficient, free from redundancy, and maintains data integrity.

- Alternative approaches like denormalization, ER modeling, or NoSQL offer flexibility and performance benefits for specific use cases.

Unit 3 | DBMS Notes | AKTU Notes

Data Base Design & Normalization

Functional Dependencies (FDs)

Normal Forms (NF)

First Normal Form (1NF):

Second Normal Form (2NF):

Third Normal Form (3NF):

Boyce-Codd Normal Form (BCNF)

Inclusion Dependence

Lossless Join Decomposition

Normalization Using FD, MVD, and JD

Alternative Approaches to Database Design

No comments:

Post a Comment

Advertisement

SEARCH

LATEST

FOLLOW ME

SECCIONS

ABOUT

Popular

Latest courses

Categories

Quick Links

Comments

About

Top Links Menu

Unit 3 | DBMS Notes | AKTU Notes

Data Base Design & Normalization

Functional Dependencies (FDs)

Normal Forms (NF)

First Normal Form (1NF):

Second Normal Form (2NF):

Third Normal Form (3NF):

Boyce-Codd Normal Form (BCNF)

Inclusion Dependence

Lossless Join Decomposition

Normalization Using FD, MVD, and JD

Alternative Approaches to Database Design

No comments:

Post a Comment

Advertisement

SEARCH

LATEST

FOLLOW ME

SECCIONS

ABOUT

Popular

Latest courses

Categories

Quick Links

Comments

About