UNIT 1 | Data Warehousing and Data Mining Notes | AKTU Notes



    1. Overview of Data Warehousing

    Data warehousing is a system used for collecting, storing, and managing large amounts of data from different sources so that organizations can analyze it and make better decisions.

    A data warehouse is mainly used for business intelligence (BI), reporting, and analysis rather than daily transactions.

    Main Purposes

    • Integrate data from multiple sources
    • Store historical data
    • Support decision making
    • Improve business analysis

    Simple Explanation

    A data warehouse is a central storage system in which data from different sources is collected and stored for analysis.


    2. Definition of Data Warehouse

    “A Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data used to support management decision making.”

    Key Characteristics

    Subject Oriented

    Data is organized around specific subjects such as sales, customers, and products.

    Integrated

    Data from different sources is converted into a common format.
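
    A minimal sketch of what integration can look like, assuming two hypothetical source systems (a CRM and an ERP) that encode the same customer attributes differently. The field names, codes, and date formats are invented for illustration.

```python
from datetime import datetime

# Hypothetical records from two source systems with inconsistent encodings.
crm_rows = [{"cust_id": "C1", "gender": "M", "joined": "2023-01-15"}]
erp_rows = [{"customer": "C2", "sex": "female", "join_date": "15/02/2023"}]

# One agreed code per value in the warehouse.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Map a CRM record to the common warehouse format."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_MAP[row["gender"].lower()],
        "joined_on": datetime.strptime(row["joined"], "%Y-%m-%d").date().isoformat(),
    }


def from_erp(row):
    """Map an ERP record to the same common format."""
    return {
        "customer_id": row["customer"],
        "gender": GENDER_MAP[row["sex"].lower()],
        "joined_on": datetime.strptime(row["join_date"], "%d/%m/%Y").date().isoformat(),
    }


integrated = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
print(integrated)
```

    After conversion, both records share the same field names, gender codes, and ISO date format, so they can be queried together.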

    Time Variant

    Historical data is stored, so the data can be analyzed across time periods.

    Non-Volatile

    Data is mostly read-only and is rarely updated.


    3. Components of Data Warehouse

    1. Data Sources

    • Databases
    • Files
    • ERP systems
    • CRM systems

    2. ETL Process

    • Extract – Read data from the source systems
    • Transform – Clean and convert the data
    • Load – Store the data in the data warehouse
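
    The three ETL steps can be sketched as follows. This is a minimal illustration only: the CSV content, the `sales` table, and the cleaning rules are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export with messy values.
SOURCE_CSV = """order_id,product,amount
1, Laptop ,1200.50
2,phone,
3,Phone,300
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, trim and normalize product names."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete rows
        cleaned.append(
            (int(row["order_id"]), row["product"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, product, amount FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

    Order 2 has no amount, so the transform step drops it. Only the two cleaned rows reach the warehouse table.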

    3. Data Warehouse Storage

    The central repository where processed data is stored.

    4. Metadata

    Metadata means data about data, such as table structure and data source information.
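
    As a small illustration of both kinds of metadata, the sketch below reads table structure from SQLite's catalog and keeps source information in a plain dictionary. The `sales` table and the `lineage` entries are hypothetical; real warehouses typically keep this in a dedicated metadata repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
)

# Structural metadata: PRAGMA table_info returns one row per column
# as (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(sales)")
]

# Source metadata (hypothetical values): where the table's data came from.
lineage = {"sales": {"source": "orders.csv", "loaded_by": "nightly ETL job"}}

print(columns)
print(lineage["sales"]["source"])
```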

    5. Front End Tools

    • Reporting tools
    • Data mining tools
    • OLAP tools

    4. Building a Data Warehouse

    1. Requirement Analysis
    2. Data Source Identification
    3. Data Cleaning
    4. Data Transformation
    5. Data Loading
    6. Data Analysis Tools Integration

    5. Warehouse Database

    A warehouse database is a large database system designed specifically for analysis and reporting.

    Features

    • Huge historical data
    • Optimized for queries
    • Supports OLAP operations

    6. Mapping Data Warehouse to Multiprocessor Architecture

    Large data warehouses are handled with multiprocessor systems, in which multiple CPUs process the data.

    Types

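    • SMP (Symmetric Multiprocessing) – multiple processors share one memory and one operating system
    • MPP (Massively Parallel Processing) – many nodes, each with its own processor and memory, work on separate portions of the data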
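
    The idea behind MPP-style processing can be sketched as partition, aggregate locally, then merge. In this illustration, threads stand in for nodes and the rows are invented. A real MPP system would place each partition on a separate machine.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (region, amount) sales rows.
rows = [("north", 100), ("south", 50), ("north", 25), ("east", 70), ("south", 30)]
NUM_PARTITIONS = 3


def partition(rows, n):
    """Assign each row to a partition by hashing its key."""
    parts = [[] for _ in range(n)]
    for region, amount in rows:
        parts[zlib.crc32(region.encode()) % n].append((region, amount))
    return parts


def local_aggregate(part):
    """Each worker sums only the rows in its own partition."""
    totals = Counter()
    for region, amount in part:
        totals[region] += amount
    return totals


with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    partials = list(pool.map(local_aggregate, partition(rows, NUM_PARTITIONS)))

# The coordinator merges the partial results into the final answer.
totals = sum(partials, Counter())
print(dict(totals))
```

    Because rows with the same key always land in the same partition, each worker can aggregate independently and the merge step only has to combine the partial totals.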

    7. Difference Between Database System and Data Warehouse

    Feature    Database System          Data Warehouse
    Purpose    Transaction processing   Data analysis
    Data       Current data             Historical data
    Updates    Frequent updates         Rare updates
    Users      Operational staff        Managers and analysts
    Queries    Simple queries           Complex queries
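
    The difference in query style can be illustrated with a small sketch. The `orders` table and its rows are invented, and a single in-memory SQLite table is used for both queries only to keep the example short. In practice the two workloads run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, year INTEGER, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "north", 100.0),
        (2, 2022, "north", 50.0),
        (3, 2023, "north", 150.0),
        (4, 2023, "south", 60.0),
    ],
)

# Database-system style: a simple lookup of one current record by key.
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# Data-warehouse style: an aggregate over historical data across years and regions.
yearly = conn.execute(
    "SELECT year, region, SUM(amount) FROM orders "
    "GROUP BY year, region ORDER BY year, region"
).fetchall()

print(one_order)
print(yearly)
```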

    8. Multidimensional Data Model

    The multidimensional model represents data along multiple dimensions for analysis.

    Example Dimensions

    • Time
    • Location
    • Product

    Components

    • Fact Table – Numerical measures (e.g., sales amount) plus keys that reference the dimension tables
    • Dimension Table – Descriptive attributes (e.g., time, product)
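
    A minimal sketch of how fact and dimension tables relate, using plain Python dictionaries. All names and values are hypothetical. Each fact row stores keys into the dimensions plus a numeric measure, and descriptive attributes are looked up from the dimension when the data is analyzed.

```python
from collections import defaultdict

# Dimension tables: descriptive attributes keyed by an identifier.
dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Desk", "category": "Furniture"},
}
dim_time = {
    10: {"year": 2023, "quarter": "Q1"},
    11: {"year": 2023, "quarter": "Q2"},
}

# Fact table: keys into the dimensions plus a numeric measure.
fact_sales = [
    {"product_id": 1, "time_id": 10, "sales_amount": 1200.0},
    {"product_id": 1, "time_id": 11, "sales_amount": 800.0},
    {"product_id": 2, "time_id": 10, "sales_amount": 300.0},
]

# Analyze the measure by a descriptive attribute taken from a dimension.
sales_by_category = defaultdict(float)
for fact in fact_sales:
    category = dim_product[fact["product_id"]]["category"]
    sales_by_category[category] += fact["sales_amount"]

print(dict(sales_by_category))
```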


    9. Data Cubes

    A data cube is a multidimensional structure that represents data along multiple dimensions.

    Example Dimensions

    • Product
    • Time
    • Location

    Operations

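    • Slice – Select a single value on one dimension
    • Dice – Select a range or set of values on two or more dimensions
    • Roll-up – Aggregate data to a higher, more summarized level
    • Drill-down – Move from summarized data to a more detailed level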
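
    These operations can be sketched on a tiny cube stored as a dictionary. The cell values and helper names (`slice_cube`, `dice_cube`, `roll_up`) are assumptions for illustration. Roll-up is shown here as dropping dimensions, and drill-down as returning to a finer set of dimensions.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (product, quarter, city) -> sales amount.
cube = {
    ("Laptop", "Q1", "Delhi"): 100,
    ("Laptop", "Q2", "Delhi"): 120,
    ("Laptop", "Q1", "Mumbai"): 80,
    ("Phone", "Q1", "Delhi"): 60,
    ("Phone", "Q2", "Mumbai"): 90,
}
DIMS = ("product", "quarter", "city")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose values fall in the given sets on several dimensions."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def roll_up(cube, keep):
    """Roll-up: sum away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)


q1 = slice_cube(cube, "quarter", "Q1")
delhi_laptops = dice_cube(cube, product={"Laptop"}, city={"Delhi"})
by_product = roll_up(cube, ["product"])
# Drill-down: go back from per-product totals to the finer product-by-city level.
by_product_city = roll_up(cube, ["product", "city"])

print(by_product)
print(by_product_city)
```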

    10. Schema Concepts

    Star Schema

    • One fact table
    • Multiple dimension tables
    • Simple design
    • Fast query performance
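
    A star schema can be sketched with one fact table joined directly to each dimension table. The table names, columns, and rows below are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO dim_location VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO fact_sales VALUES (1, 1, 1, 1000), (1, 2, 2, 500), (2, 1, 1, 200);
""")

# Every dimension is exactly one join away from the fact table.
star_result = conn.execute("""
    SELECT p.category, t.quarter, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time t ON t.time_id = f.time_id
    GROUP BY p.category, t.quarter
    ORDER BY p.category, t.quarter
""").fetchall()
print(star_result)
```

    Note that `category` is stored directly in `dim_product`. This denormalization keeps joins shallow, at the cost of repeating the category name for every product in it.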

    Snowflake Schema

    • Dimension tables normalized
    • More complex structure
    • Less redundancy
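
    In a snowflake schema, a dimension is split into further tables. The sketch below, with hypothetical tables and rows, moves product categories into their own `dim_category` table, so each category name is stored once but queries need an extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
INSERT INTO fact_sales VALUES (1, 1000), (2, 400), (3, 200);
""")

# Reaching the category now takes two joins: fact -> product -> category.
snowflake_result = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_result)
```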

    Fact Constellation Schema

    • Multiple fact tables
    • Shared dimension tables
    • Also called Galaxy Schema
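
    A fact constellation can be sketched with two fact tables that share one dimension. The `fact_sales` and `fact_shipping` tables and their rows are hypothetical. Each fact table is aggregated separately and the results are combined through the shared `dim_product` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    sales_amount REAL
);
CREATE TABLE fact_shipping (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER
);
INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Desk');
INSERT INTO fact_sales VALUES (1, 1000), (1, 500), (2, 200);
INSERT INTO fact_shipping VALUES (1, 3), (2, 1), (2, 4);
""")

# Aggregate each fact table on its own, then line the results up per product.
galaxy_result = conn.execute("""
    SELECT p.name,
           (SELECT SUM(s.sales_amount) FROM fact_sales s
             WHERE s.product_id = p.product_id),
           (SELECT SUM(h.units_shipped) FROM fact_shipping h
             WHERE h.product_id = p.product_id)
    FROM dim_product p
    ORDER BY p.name
""").fetchall()
print(galaxy_result)
```

    Aggregating the fact tables separately avoids the double counting that can occur when two fact tables are joined to each other directly.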

    Conclusion

    Data warehousing helps organizations store and analyze large amounts of data from multiple sources. It supports decision making through multidimensional models, data cubes, and schema designs like star, snowflake, and fact constellation.
