UNIT 2 | Data Warehousing and Data Mining Notes | AKTU Notes



    1. Warehousing Strategy

    A data warehousing strategy is a plan that defines how an organization will design, build, and maintain its data warehouse. It ensures that the warehouse supports business goals and provides useful information for decision making.

    A good warehousing strategy focuses on:

    • Identifying business requirements
    • Selecting data sources
    • Designing the warehouse architecture
    • Planning data integration and analysis tools

    Types of Warehousing Strategy

    • Top-Down Approach: The enterprise-wide data warehouse is built first, and data marts are then derived from it.
    • Bottom-Up Approach: Data marts are built first for individual business areas and are later integrated into a full data warehouse.

    Simple Explanation: A warehousing strategy is a planning process that decides how the data warehouse will be built and how it will be used.


    2. Warehouse Management and Support Processes

    Warehouse management processes are activities required to operate and maintain the data warehouse system efficiently.

    Main processes include:

    • Data Extraction: Data is collected from different sources.
    • Data Cleaning: Errors and duplicate records are removed.
    • Data Transformation: Data is converted into a standard format.
    • Data Loading: The processed data is stored in the warehouse.
    • Data Refreshing: Data is updated at regular intervals.
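
    The five processes above can be sketched as a small pipeline. This is only an illustrative sketch, not a production ETL design: the source rows, the `sales` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source data; real rows would come from operational
# databases, files, or APIs.
SOURCE_A = [
    {"id": 1, "name": " Alice ", "amount": "100.50"},
    {"id": 2, "name": "Bob", "amount": "200"},
]
SOURCE_B = [
    {"id": 2, "name": "Bob", "amount": "200"},     # duplicate of a row in SOURCE_A
    {"id": 3, "name": "Carol", "amount": "oops"},  # invalid amount
    {"id": 4, "name": "dave", "amount": "75.25"},
]


def extract(*sources):
    """Extraction: collect rows from every source into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Cleaning: drop rows with invalid amounts and rows with duplicate ids."""
    seen, result = set(), []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        result.append(row)
    return result


def transform(rows):
    """Transformation: convert every row to one standard format."""
    return [(r["id"], r["name"].strip().title(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Loading: store rows in the warehouse; re-running it refreshes existing ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, *sources):
    """Run extract -> clean -> transform -> load in order."""
    load(conn, transform(clean(extract(*sources))))


warehouse = sqlite3.connect(":memory:")
run_etl(warehouse, SOURCE_A, SOURCE_B)
loaded_rows = warehouse.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded_rows)
```

    Calling `run_etl` again with newer source rows acts as a refresh, because rows with an existing `id` are replaced rather than duplicated.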

    3. Warehouse Planning and Implementation

    Planning and implementation is the process of designing and developing a data warehouse system step by step.

    Steps in planning:

    1. Requirement Analysis
    2. Data Source Identification
    3. Data Modeling
    4. Hardware and Software Selection
    5. Implementation and Deployment

    Simple Explanation: Warehouse planning means preparing a proper plan for building the data warehouse, and implementation means actually building the system.


    4. Hardware and Operating Systems for Data Warehousing

    Data warehouses store very large volumes of data, so powerful hardware and efficient operating systems are required.

    Hardware Requirements

    • High storage capacity
    • High processing power
    • Large memory (RAM)

    Operating System Requirements

    • Support parallel processing
    • High reliability
    • High performance

    Examples: Linux, UNIX, Windows Server


    5. Client / Server Computing Model and Data Warehousing

    Client-server architecture is a computing model where tasks are divided between clients and servers.

    • Client: Provides the user interface and sends requests.
    • Server: Processes the data and returns the results.

    Advantages:

    • Better performance
    • Scalability
    • Efficient resource usage
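
    The division of work between client and server can be sketched as follows. This is a minimal illustration under stated assumptions: the `SALES_BY_REGION` data and all function names are hypothetical, and a connected socket pair stands in for a real network connection between two machines.

```python
import socket
import threading

# Hypothetical warehouse data that only the server can see.
SALES_BY_REGION = {"north": 1200, "south": 800}


def handle_request(request: str) -> str:
    """Server-side processing: look up the total for the requested region."""
    region = request.strip().lower()
    if region in SALES_BY_REGION:
        return str(SALES_BY_REGION[region])
    return "unknown region"


def serve_once(conn: socket.socket) -> None:
    """Server: read one request, process it, and send back the result."""
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(handle_request(request).encode())


def query_region(region: str) -> str:
    """Client: send a request and wait for the server's reply."""
    client_sock, server_sock = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    with client_sock:
        client_sock.sendall(region.encode())
        reply = client_sock.recv(1024).decode()
    worker.join()
    return reply


print(query_region("north"))
```

    The client never touches the data directly; it only sends a request and displays the reply, which is what allows many clients to share one powerful server.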

    6. Parallel Processors and Cluster Systems

    Parallel processing means multiple processors working together to process large data simultaneously.

    Types

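    • Shared Memory Architecture: All processors share a common main memory and the same disks.
    • Shared Disk Architecture: Each processor has its own memory, but all processors share the disks.
    • Shared Nothing Architecture: Each processor has its own memory and disks, and processors communicate over a network.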
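
    The shared-nothing idea (each worker processes only its own partition, and partial results are combined at the end) can be sketched as below. This is a simplified illustration: the function names and sample data are hypothetical, and threads stand in for independent nodes, whereas a real system would use separate processors or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows into n_workers disjoint partitions (round-robin)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def local_total(rows):
    """Work done by one node, using only the rows in its own partition."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, n_workers=3):
    """Compute partial totals concurrently, then combine them."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_total, parts))
    return sum(partials)


# Hypothetical (sale_id, amount) rows.
sales = [(i, i * 10) for i in range(1, 11)]
total = parallel_total(sales)
print(total)
```

    The result is the same as a single sequential sum; the benefit in a real system is that each partition can be scanned at the same time.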

    Cluster systems are groups of computers connected together that work as a single system. They provide:

    • High performance
    • Fault tolerance
    • Scalability

    7. Distributed DBMS Implementations

    A Distributed Database Management System (DDBMS) manages databases that are distributed across multiple locations.

    Benefits:

    • Faster data access
    • Improved reliability
    • Better scalability

    Simple Explanation: In a distributed DBMS, data is stored at different locations, but the user sees it as a single database.
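
    The idea that data lives at several sites while the user sees one database can be sketched as below. This is a toy illustration, not how a real DDBMS is implemented: the `DistributedCustomers` class, the site names, and the modulo placement rule are all hypothetical, and each in-memory SQLite database stands in for one location.

```python
import sqlite3


class DistributedCustomers:
    """Hypothetical facade that hides which site stores each row."""

    def __init__(self, site_names):
        self._names = list(site_names)
        self.sites = {}
        for name in self._names:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
            self.sites[name] = conn

    def _site_for(self, customer_id):
        # Assumed placement rule: customer id modulo the number of sites.
        return self.sites[self._names[customer_id % len(self._names)]]

    def insert(self, customer_id, name):
        self._site_for(customer_id).execute(
            "INSERT INTO customers VALUES (?, ?)", (customer_id, name)
        )

    def get(self, customer_id):
        row = self._site_for(customer_id).execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None

    def all_customers(self):
        """Gather rows from every site and present them as one result."""
        rows = []
        for conn in self.sites.values():
            rows.extend(conn.execute("SELECT id, name FROM customers").fetchall())
        return sorted(rows)


db = DistributedCustomers(["delhi", "mumbai"])
for cid, cname in [(1, "Asha"), (2, "Ravi"), (3, "Carol"), (4, "Dev")]:
    db.insert(cid, cname)

print(db.get(3))
print(db.all_customers())
```

    The caller uses only `insert`, `get`, and `all_customers` and never names a site, which is the location transparency the explanation above describes.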


    8. Warehousing Software

    Warehousing software tools are used to build, manage, and analyze data warehouses.

    ETL Tools

    • Informatica
    • Talend
    • Microsoft SSIS

    OLAP Tools

    • Microsoft Analysis Services
    • Oracle OLAP

    Reporting Tools

    • Power BI
    • Tableau

    9. Warehouse Schema Design

    Schema design defines how data is organized inside the data warehouse.

    Star Schema

    • One central fact table
    • Multiple dimension tables
    • Simple design
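
    A star schema can be sketched with one fact table and two dimension tables. The table names, columns, and sample rows below are hypothetical and chosen only for illustration; SQLite is used because it ships with Python.

```python
import sqlite3

star_db = sqlite3.connect(":memory:")
star_db.executescript("""
-- Dimension tables: descriptive attributes
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Central fact table: measures plus a key into each dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales VALUES (1, 1, 2, 2000.0), (1, 2, 1, 1000.0), (2, 1, 3, 450.0);
""")

# One join from the fact table to a dimension is enough to group by category.
revenue_by_category = star_db.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

    Every dimension is a single join away from the fact table, which is why the star schema is considered the simplest design to query.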

    Snowflake Schema

    • Dimension tables are normalized into multiple related tables
    • More complex structure, so queries need more joins
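
    A snowflake schema can be sketched by splitting a dimension into related tables. In this hypothetical example the product category is moved out of the product dimension into its own table; all names and rows are made up for illustration.

```python
import sqlite3

snow_db = sqlite3.connect(":memory:")
snow_db.executescript("""
-- The category attribute is normalized into its own table
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);

INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 150.0), (1, 1000.0);
""")

# Grouping by category now takes two joins: fact -> product -> category.
snowflake_revenue = snow_db.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_revenue)
```

    Each category name is stored only once, which reduces redundancy, but the extra join is the price of the more complex structure.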

    Fact Constellation Schema

    • Multiple fact tables
    • Shared dimension tables
    • Also called Galaxy Schema
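
    A fact constellation can be sketched as two fact tables that share one dimension table. The sales and shipments facts, the product dimension, and the sample rows below are hypothetical.

```python
import sqlite3

galaxy_db = sqlite3.connect(":memory:")
galaxy_db.executescript("""
-- Shared dimension table
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);

-- Two fact tables that both reference the shared dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);
CREATE TABLE fact_shipments (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER
);

INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Desk');
INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 2);
INSERT INTO fact_shipments VALUES (1, 6), (2, 2), (2, 1);
""")

# Each fact table is aggregated separately per product, so rows from one
# fact table do not multiply rows from the other.
sold_vs_shipped = galaxy_db.execute("""
    SELECT p.name,
           (SELECT COALESCE(SUM(s.units_sold), 0)
              FROM fact_sales AS s WHERE s.product_id = p.product_id),
           (SELECT COALESCE(SUM(h.units_shipped), 0)
              FROM fact_shipments AS h WHERE h.product_id = p.product_id)
    FROM dim_product AS p
    ORDER BY p.name
""").fetchall()
print(sold_vs_shipped)
```

    Because both fact tables use the same `dim_product` keys, measures from different business processes can be compared product by product.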

    Conclusion

    Data warehouse process and technology involve planning, designing, implementing, and maintaining data warehouses using advanced hardware, software, and computing models. Technologies such as parallel processing, distributed DBMS, and client-server architecture help handle large volumes of data efficiently.
