Unit 4 | DBMS Notes | AKTU Notes


Transaction Processing Concept

Unit 4 | DBMS Notes | AKTU Notes

    Transaction System

    Definition:  

    A Transaction System in a database is a way of ensuring that a series of actions (called a transaction) is executed as a single, complete unit. Either all actions in the transaction happen successfully, or none of them happens at all.

    Key Properties (ACID):

    A good transaction system follows the ACID properties:

    1. Atomicity: A transaction is all-or-nothing. If any part fails, the entire transaction is rolled back.  

       - Example: If money is deducted from one account but not added to another, the entire process is canceled.

    2. Consistency: The database stays in a valid state before and after the transaction.  

       - Example: After transferring money, the total amount in all accounts should remain the same.

    3. Isolation: Transactions happening at the same time don’t interfere with each other.  

       - Example: If two people are booking the last train seat, only one transaction will succeed.

    4. Durability: Once a transaction is completed, the changes are saved permanently, even if there is a system failure.  

       - Example: After transferring money, the new balances are stored securely.

    Example:

    1. A bank transaction to transfer money:  

       - Step 1: Deduct ₹500 from Account A.  

       - Step 2: Add ₹500 to Account B.  


       If Step 2 fails, Step 1 is also canceled to maintain consistency.


    2. A shopping website order:

       - Step 1: Check if the item is in stock.  

       - Step 2: Deduct payment from the customer.  

       - Step 3: Reduce the stock count in the database.  


       If any step fails, the entire transaction is rolled back.


    Testing of Serializability

    Definition:  

    Serializability is a way to check if multiple transactions running at the same time (concurrent transactions) produce the same result as if they were executed one after another (serially). 

    Why Is It Important?

    When multiple users access the database simultaneously, their transactions should not interfere and lead to incorrect data. Testing for serializability ensures the database remains consistent.

    Types of Serializability:

    1. Conflict Serializability: Checks if transactions can be rearranged by swapping non-conflicting operations to achieve a serial order.  

       - Example: Two transactions:  

         - T1: Read data, then update data.  

         - T2: Read the same data.  

         If T1 and T2 don’t conflict, they can be swapped.


    2. View Serializability: Ensures that the order of transactions maintains the same final result, even if the intermediate steps are different.

    How to Test for Serializability:

    The testing process involves creating a precedence graph:

    1. Create a graph where:

       - Each transaction is a node.  

       - Draw a directed edge between transactions if one depends on the other.  


    2. Check the graph:

       - If there is a cycle (a loop), the schedule is not serializable.

       - If there is no cycle, the schedule is serializable.

    Example:

    1. Two Transactions:

       - T1: Write(X).  

       - T2: Read(X). 

       Schedule:  

       - T1 writes X → T2 reads X.

       Create a graph:

       - Node T1 → Node T2 (T2 depends on T1’s result).  

       No cycle means the schedule is serializable.

    Conclusion:

    - A Transaction System ensures reliability, consistency, and proper execution of operations using ACID properties.

    - Testing Serializability verifies that concurrent transactions maintain the database's consistency as if they were executed one at a time.


    Serializability of Schedules

    Definition:  

    A schedule is the sequence in which multiple database transactions are executed. Serializability ensures that the outcome of executing transactions concurrently (at the same time) is the same as if they were executed one after the other (serially).

    Why Is It Important?  

    - When many transactions are running at the same time, they may affect each other’s results. Serializability ensures the database remains consistent and produces correct results.

    Example:  

    Transactions T1 and T2:  

    - T1: Reads X, Updates X  

    - T2: Reads X  

    Schedule 1 (Concurrent):  

    - T1: Read X  

    - T2: Read X  

    - T1: Update X  

    If the final result of this concurrent schedule is the same as running T1 and T2 one after another, it is serializable.


    Conflict Serializable Schedule

    Definition:  

    A schedule is Conflict Serializable if it can be rearranged (by swapping non-conflicting operations) to achieve the same result as a serial schedule.

    What Is a Conflict?  

    A conflict occurs when two transactions access the same data and at least one of them is a write operation.  

    - Types of Conflicts:

      1. Read-Write Conflict: One transaction reads data while another writes to it.

      2. Write-Read Conflict: One transaction writes data while another reads it.

      3. Write-Write Conflict: Two transactions write to the same data.

    How to Test for Conflict Serializability:

    - Create a precedence graph:

      1. Each transaction is a node.  

      2. Draw a directed edge from one transaction to another if there is a conflict (e.g., T1 writes, and T2 reads the same data).  

      3. If the graph has a cycle, the schedule is not conflict serializable. If there’s no cycle, it is conflict serializable.

    Example:  

    Transactions T1 and T2:  

    - T1: Read(X), Write(X)  

    - T2: Write(X)  

    Schedule:  

    - T1: Read(X)  

    - T2: Write(X)  

    - T1: Write(X)  

    Precedence Graph:  

    - T1 → T2 (T2 writes X after T1 reads X).  

    - T2 → T1 (T1 writes X after T2 writes X).  

    This graph has a cycle, so the schedule is not conflict serializable.


    View Serializable Schedule

    Definition:  

    A schedule is View Serializable if it produces the same final result as a serial schedule, even if the intermediate steps are different.

    Difference Between Conflict and View Serializability:

    - Conflict Serializability: Focuses on swapping non-conflicting operations to match a serial order.  

    - View Serializability: Focuses on the final output of the transactions, ignoring intermediate steps.

    Steps to Check View Serializability:

    A schedule is view serializable if:  

    1. Initial Reads Are the Same: Transactions read the same initial values as in a serial schedule.  

    2. Writes Are Consistent: The data written by transactions in the schedule matches what would be written in a serial schedule.  

    3. Final Writes Are the Same: The final result of the database is identical to that of a serial schedule.

    Example:  

    Transactions T1 and T2:  

    - T1: Write(A)  

    - T2: Read(A), Write(A)  

    Schedule 1 (Concurrent):  

    - T1: Write(A)  

    - T2: Read(A)  

    - T2: Write(A)  

    This schedule produces the same final result as running T1 before T2. Hence, it is view serializable.

    Conclusion:

    - Serializability of Schedules ensures that concurrent transactions produce correct results like serial transactions.  

    - Conflict Serializable Schedules focus on rearranging operations to avoid conflicts.  

    - View Serializable Schedules ensure the final outcome matches a serial schedule, even if intermediate steps differ.


    Recoverability

    Definition:  

    In a database, recoverability ensures that if a transaction fails or is rolled back, the database can still maintain consistency by undoing or recovering affected changes.

    Key Concepts in Recoverability:

    1. Commit: A transaction is said to be committed when all its changes are permanently saved to the database.  

    2. Rollback: If a transaction encounters an error or fails, all its changes are undone to maintain consistency.  

    3. Dependency Between Transactions:  

       - Sometimes, one transaction depends on another. For example, if Transaction T2 reads data modified by Transaction T1, T1 must commit first for the database to remain consistent.

    Types of Schedules Based on Recoverability:

    1. Recoverable Schedule:

       - A schedule where a transaction only commits if all the transactions it depends on have also committed.  

       - Example:  

         - T1: Write(A)  

         - T2: Read(A)  

         - T1 commits → T2 commits (Recoverable).

    2. Non-Recoverable Schedule:

       - A schedule where a transaction commits even if the transactions it depends on fail or roll back.  

       - Example:  

         - T1: Write(A)  

         - T2: Read(A) → T2 commits  

         - T1 fails (Not Recoverable).

    3. Cascading Rollbacks:

       - When one transaction’s failure causes a chain reaction, rolling back multiple dependent transactions.  

       - Example:  

         - T1: Write(A)  

         - T2: Read(A)  

         - T3: Read(A)  

         - If T1 fails, both T2 and T3 must roll back.

    4. Cascadeless Schedule:

       - A schedule where transactions do not read uncommitted changes from other transactions. This prevents cascading rollbacks.

    Importance of Recoverability:  

    - Ensures the database remains consistent and reliable, even during transaction failures.


    Recovery from Transaction Failures

    Definition:  

    When a transaction fails (due to system crashes, power failures, or logical errors), the database must recover to a consistent state. This is achieved by undoing incomplete transactions or redoing committed ones.

    Types of Failures:

    1. System Failure: Hardware or software crashes that stop transactions abruptly.  

    2. Transaction Failure: Errors like invalid input or deadlocks that stop a single transaction.  

    3. Disk Failure: Physical damage to the storage where the database is stored.

    Techniques for Recovery:

    1. Log-Based Recovery:

       - A log is a record of all changes made by transactions. It helps in undoing or redoing changes during recovery.  

       - Steps:

         - Undo: Revert changes of incomplete transactions using the log.  

         - Redo: Reapply changes of committed transactions to ensure they are not lost.

    2. Deferred Update:

       - Changes made by a transaction are stored in a temporary area and applied to the database only after the transaction commits.  

       - If the transaction fails, no changes are made to the database.

    3. Immediate Update:

       - Changes are applied to the database immediately but are logged so they can be undone if the transaction fails.

    4. Checkpoints:

       - A checkpoint is a saved state of the database at a particular point in time. During recovery, the system starts from the last checkpoint, reducing the recovery time.

    Example of Recovery:  

    1. Transaction T1:  

       - Step 1: Write(A).  

       - Step 2: Write(B).  

       - System crashes before T1 commits.  

       Recovery Process:

       - Undo: Remove the changes made to A and B because T1 didn’t commit.

    2. Transaction T2:  

       - Step 1: Write(C).  

       - T2 commits.  

       - System crashes.

       Recovery Process:

       - Redo: Apply the changes to C because T2 had committed.

    Importance of Recovery:

    - Prevents data loss during failures.  

    - Maintains the reliability and consistency of the database.

    Conclusion:  

    - Recoverability ensures the database can handle failures without losing consistency.  

    - Recovery from Transaction Failures involves techniques like logging, checkpoints, and undo/redo to restore the database to a consistent state.


    Log-Based Recovery

    Definition:  

    In a database, a log is a record that keeps track of all the actions performed by transactions. Log-Based Recovery is a process that uses these logs to recover the database to a consistent state after a failure.

    Key Concepts:

    1. Logs:  

       - Logs store information like:

         - Transaction ID  

         - Data item being modified  

         - Old value (before change)  

         - New value (after change)

    2. Undo and Redo:

       - Undo: If a transaction failed or was incomplete, reverse its changes.  

       - Redo: If a transaction was committed before the failure, apply its changes again to ensure they are not lost.

    Types of Log-Based Recovery:

    1. Deferred Update:  

       - Changes are recorded in the log but applied to the database only after the transaction commits.  

       - Advantage: No need to undo changes since uncommitted changes are not written to the database.

    2. Immediate Update:  

       - Changes are written to the database immediately and recorded in the log.  

       - Requires both undo and redo operations during recovery.

    Example:

    - Transaction T1:

      1. Log: "T1 starts."  

      2. Log: "T1 changes A from 10 to 20."  

      3. Log: "T1 commits.

    - If the system crashes after step 2:  

      - Undo changes (restore A to 10).  

    - If the system crashes after step 3:  

      - Redo changes (ensure A is 20).

    Why is Log-Based Recovery Important?  

    - Ensures no data is lost during a crash.  

    - Maintains the reliability and consistency of the database.


    Checkpoints

    Definition:  

    A checkpoint is a saved state of the database at a specific point in time. It helps reduce the time and effort required for recovery by providing a point to start from instead of scanning all previous logs.

    How Checkpoints Work:

    1. Periodically, the system saves a snapshot of the database and records it in the log.  

    2. During recovery, the system starts from the last checkpoint instead of the beginning of the log.

    Example:

    - Without Checkpoints:  

      If a system crash occurs, the recovery process must check all transactions from the start of the log.

    - With Checkpoints:  

      If a checkpoint is created every 10 minutes, recovery only checks logs after the last checkpoint, saving time.

    Why Are Checkpoints Important?

    - Speeds up recovery after a failure.  

    - Reduces the number of transactions to analyze.


    Deadlock Handling

    Definition:  

    A deadlock occurs when two or more transactions wait for each other to release resources, and none of them can proceed.

    Example of a Deadlock:

    - Transaction T1 locks data item A and needs data item B.  

    - Transaction T2 locks data item B and needs data item A.  

    - T1 waits for T2, and T2 waits for T1. Both are stuck, causing a deadlock.

    How to Handle Deadlocks:

    1. Prevention:

       - Avoid deadlocks by controlling how resources are allocated.

       - Example: Transactions must lock all required resources at the start.

    2. Detection and Resolution:

       - The system periodically checks for deadlocks.  

       - If a deadlock is found, one transaction is aborted (rolled back) to break the cycle.

    3. Timeout:

       - If a transaction waits too long, it is rolled back and restarted.

    Why Is Deadlock Handling Important?  

    - Prevents transactions from getting stuck forever.  

    - Ensures smooth execution of concurrent transactions.

    Conclusion:

    - Log-Based Recovery uses logs to undo or redo transactions after a failure.  

    - Checkpoints save the database state periodically to speed up recovery.  

    - Deadlock Handling ensures that transactions don’t block each other permanently, maintaining smooth operation of the database.


    Distributed Database

    Definition:  
    A Distributed Database is a database that is not stored in one location but is spread across multiple computers (called nodes) connected through a network. The goal is to provide efficient access to data even if it is stored in different locations.

    Key Features:
    1. Data Distribution: Data is stored across multiple locations, but it is accessed as if it were stored in one place.
    2. Autonomy: Each node (computer) can operate independently.
    3. Scalability: New nodes can be added easily to increase storage and processing capacity.
    4. Reliability: If one node fails, others can continue working, improving system reliability.

    Example:
    Think of a bank with branches in different cities. Each branch stores its local data but can also access data from other branches.

    Distributed Data Storage

    Definition:  
    Distributed Data Storage refers to how data is physically divided and stored across multiple locations or nodes in a distributed database.

    Ways Data Can Be Stored:
    1. Fragmentation:
       - Data is divided into smaller pieces (fragments) and stored in different locations.
       - Example: A customer database may store customer details in one city and account details in another.

    2. Replication:
       - Copies of the same data are stored in multiple locations.
       - Advantage: Improves reliability and faster data access.
       - Example: A shopping website may keep the same product data on servers in different countries.

    3. Combination (Hybrid):
       - Uses both fragmentation and replication to store data efficiently.

    Why Distributed Data Storage is Important:
    - Reduces data access time by storing data closer to where it is needed.  
    - Increases fault tolerance (if one node fails, the data can still be accessed from another).

    Concurrency Control

    Definition:  
    Concurrency Control ensures that multiple users can access and update the distributed database at the same time without causing conflicts or errors.

    Why is it needed?
    In a distributed database, many users or applications may try to:
    - Read the same data.
    - Update the same data simultaneously.
    This can lead to inconsistencies if not managed properly.

    Techniques for Concurrency Control:
    1. Locking Mechanism:
       - When one user is updating data, it is locked, so others cannot update it simultaneously.
       - Example: If User A is booking a movie ticket, User B cannot book the same seat until User A finishes.

    2. Timestamp Ordering:
       - Each transaction is given a timestamp, and transactions are executed in the order of their timestamps.
       - Ensures fairness and avoids conflicts.

    3. Distributed Commit Protocol:
       - Ensures that updates made by a transaction are applied to all nodes or none at all to maintain consistency.

    Example:
    If one student is updating their marks in a school database, others trying to access the same data should either see the old marks or wait until the update is complete.

    Directory System

    Definition:  
    A Directory System in a distributed database is like a map that keeps track of where data is stored across different locations. It helps users and applications quickly find and access the required data.

    Types of Directories:
    1. Global Directory:
       - Contains information about all data stored across all nodes.
       - Used when users need access to the entire database.

    2. Local Directory:
       - Contains information about data stored in a specific node.
       - Used when users access data only from a specific location.

    Why is a Directory System Important?:
    - Makes it easier to locate and access data quickly in a distributed database.
    - Reduces the complexity of managing large amounts of distributed data.

    Example:
    Imagine a library system with branches in different cities. The directory system tells you which branch has the book you are looking for and helps you access it.

    Conclusion:
    - A Distributed Database spreads data across multiple locations for better performance and reliability.  
    - Distributed Data Storage organizes data using fragmentation, replication, or both.  
    - Concurrency Control ensures smooth and error-free access to data by multiple users at the same time.  
    - A Directory System acts as a guide to help users find where the data is stored in the distributed system.

    No comments:

    Post a Comment