UNIT 2 | Data Warehousing and Data Mining Notes

1. Warehousing Strategy

A data warehousing strategy is a plan that defines how an organization will design, build, and maintain its data warehouse. It ensures that the warehouse supports business goals and provides useful information for decision making.

A good warehousing strategy focuses on:

Identifying business requirements
Selecting data sources
Designing the warehouse architecture
Planning data integration and analysis tools

Types of Warehousing Strategy

Top-Down Approach: The entire enterprise data warehouse is built first, and data marts are then created from it.
Bottom-Up Approach: Data marts are created first and later integrated into a full data warehouse.

Simple Explanation: A warehousing strategy is a planning process that decides how the data warehouse will be built and how it will be used.

2. Warehouse Management and Support Processes

Warehouse management processes are activities required to operate and maintain the data warehouse system efficiently.

Main processes include:

Data Extraction: Data is collected from different sources.
Data Cleaning: Errors and duplicate records are removed.
Data Transformation: Data is converted into a standard format.
Data Loading: Processed data is stored in the warehouse.
Data Refreshing: Data is updated at regular intervals.
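
The five processes above can be sketched as a small pipeline. This is only an illustration: the source rows, the sales table, and the function names are made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from two source systems.
SOURCE_A = [
    {"id": 1, "name": " Asha ", "amount": "100.50"},
    {"id": 2, "name": "Ravi", "amount": "abc"},      # invalid amount
]
SOURCE_B = [
    {"id": 1, "name": "Asha", "amount": "100.50"},   # duplicate of id 1
    {"id": 3, "name": "meena", "amount": "75"},
]


def extract():
    """Extraction: collect rows from every source."""
    return SOURCE_A + SOURCE_B


def clean(rows):
    """Cleaning: drop rows with invalid amounts and duplicate ids."""
    seen, kept = set(), []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        kept.append(row)
    return kept


def transform(rows):
    """Transformation: trim and capitalise names, store amounts as numbers."""
    return [(r["id"], r["name"].strip().title(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Loading: store the processed rows in the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def refresh(conn):
    """Refreshing: rerun the whole pipeline; safe to call repeatedly."""
    load(conn, transform(clean(extract())))


warehouse = sqlite3.connect(":memory:")
refresh(warehouse)
refresh(warehouse)  # a second refresh does not create duplicates
print(warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall())
# [(1, 'Asha', 100.5), (3, 'Meena', 75.0)]
```

In a real warehouse the refresh step would be run by a scheduler at fixed intervals instead of being called by hand.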

3. Warehouse Planning and Implementation

Planning and implementation is the process of designing and developing a data warehouse system step by step.

Steps in planning:

Requirement Analysis
Data Source Identification
Data Modeling
Hardware and Software Selection
Implementation and Deployment

Simple Explanation: Warehouse planning means preparing a proper plan for building the data warehouse, and implementation means actually building the system.

4. Hardware and Operating Systems for Data Warehousing

Data warehouses store very large volumes of data, so powerful hardware and efficient operating systems are required.

Hardware Requirements

High storage capacity
High processing power
Large memory (RAM)

Operating System Requirements

Support parallel processing
High reliability
High performance

Examples: Linux, UNIX, Windows Server

5. Client / Server Computing Model and Data Warehousing

Client-server architecture is a computing model where tasks are divided between clients and servers.

Client: Provides the user interface and sends requests.
Server: Processes the data and returns the results.

Advantages:

Better performance
Scalability
Efficient resource usage
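
The division of work between client and server can be sketched with Python's standard socketserver module. This is a minimal sketch, not a real warehouse server: the SALES_BY_REGION data, the one-line text protocol, and the function names are assumptions made for the example.

```python
import socket
import socketserver
import threading

# Hypothetical summary data that only the server holds.
SALES_BY_REGION = {"north": 1200, "south": 950}


class QueryHandler(socketserver.StreamRequestHandler):
    """Server side: read one region name, process it, return the result."""

    def handle(self):
        region = self.rfile.readline().decode().strip()
        total = SALES_BY_REGION.get(region)
        reply = "unknown region" if total is None else str(total)
        self.wfile.write((reply + "\n").encode())


def start_server():
    """Start the server on a free local port in a background thread."""
    server = socketserver.TCPServer(("127.0.0.1", 0), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(address, region):
    """Client side: send a request and wait for the server's answer."""
    with socket.create_connection(address) as conn:
        conn.sendall((region + "\n").encode())
        return conn.makefile().readline().strip()
```

A client would call start_server() once, then query(server.server_address, "north") to receive "1200". The client never touches the data itself; it only sends a request and displays the reply.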

6. Parallel Processors and Cluster Systems

Parallel processing means multiple processors work on different parts of a large data set at the same time.

Types

Shared Memory Architecture
Shared Disk Architecture
Shared Nothing Architecture
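
The data flow of a shared nothing design can be sketched as follows. This is a simulation under stated assumptions: Python threads stand in for independent nodes (real nodes would have their own memory and disks), and the rows, function names, and node count are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Give each row to exactly one node, chosen by its store id."""
    parts = [[] for _ in range(n_nodes)]
    for store_id, amount in rows:
        parts[store_id % n_nodes].append((store_id, amount))
    return parts


def local_totals(rows):
    """Work done by one node, using only its own partition."""
    totals = {}
    for store_id, amount in rows:
        totals[store_id] = totals.get(store_id, 0) + amount
    return totals


def parallel_totals(rows, n_nodes=3):
    """Run every node at the same time, then merge the partial results."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_totals, parts))
    merged = {}
    for partial in partials:
        merged.update(partial)  # store ids never overlap between nodes
    return merged


rows = [(1, 10), (2, 5), (1, 7), (3, 1), (4, 2)]
print(parallel_totals(rows))
```

Because all rows for a given store land on the same node, no node needs another node's data, which is the key idea behind shared nothing systems.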

Cluster systems are groups of connected computers that work together as a single system.

Advantages:

High performance
Fault tolerance
Scalability

7. Distributed DBMS Implementations

A Distributed Database Management System (DDBMS) manages databases that are distributed across multiple locations.

Benefits:

Faster data access
Improved reliability
Better scalability

Simple Explanation: In a distributed DBMS, data is stored at different locations, but the user sees it as a single database.
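
The idea of many locations appearing as one database can be sketched with a small wrapper class. This is a toy illustration: the DistributedSales class, its methods, and the city names are hypothetical, and separate in-memory SQLite databases stand in for remote sites.

```python
import sqlite3


class DistributedSales:
    """Hypothetical facade that presents several site databases as one."""

    def __init__(self, cities):
        # One independent database per city, standing in for a remote site.
        self.sites = {}
        for city in cities:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
            self.sites[city] = conn

    def insert(self, city, amount):
        """Route the row to the site responsible for that city."""
        self.sites[city].execute(
            "INSERT INTO sales VALUES (?, ?)", (city, amount)
        )

    def total_sales(self):
        """Ask every site for its subtotal and combine the answers."""
        total = 0.0
        for conn in self.sites.values():
            (subtotal,) = conn.execute(
                "SELECT COALESCE(SUM(amount), 0) FROM sales"
            ).fetchone()
            total += subtotal
        return total


db = DistributedSales(["delhi", "mumbai"])
db.insert("delhi", 100.0)
db.insert("mumbai", 250.0)
print(db.total_sales())  # 350.0
```

The caller only uses insert and total_sales and never needs to know which site holds which row.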

8. Warehousing Software

Warehousing software tools are used to build, manage, and analyze data warehouses.

ETL Tools

Informatica
Talend
Microsoft SSIS

OLAP Tools

Microsoft Analysis Services
Oracle OLAP

Reporting Tools

Power BI
Tableau

9. Warehouse Schema Design

Schema design defines how data is organized inside the data warehouse.

Star Schema

One central fact table
Multiple dimension tables
Simple design
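
A star schema can be sketched with SQLite as below. The table names, columns, and sample rows are invented for the example; the point is only the shape, with one fact table referencing each dimension table directly.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Central fact table: measures plus one key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (1, 11, 3, 30.0), (2, 10, 1, 40.0);
""")

# Any dimension attribute is one join away from the fact table.
revenue_by_category = star.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)  # [('Home', 40.0), ('Stationery', 80.0)]
```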

Snowflake Schema

Dimension tables normalized
More complex structure
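
In a snowflake schema the same kind of data could be laid out as in the sketch below. The tables and rows are made up for the example; here the product dimension is normalized by moving the category into its own table, so the query needs an extra join.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
-- The category is split out of the product dimension (normalization).
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);

INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 50.0), (1, 30.0), (2, 40.0);
""")

# Two joins are needed to reach the category name from the fact table.
snowflake_revenue = snow.execute("""
    SELECT c.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(snowflake_revenue)  # [('Home', 40.0), ('Stationery', 80.0)]
```

Each category name is stored only once, which saves space, at the cost of more joins per query.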

Fact Constellation Schema

Multiple fact tables
Shared dimension tables
Also called Galaxy Schema
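
A fact constellation can be sketched in the same way. In this made-up example, two fact tables (sales and shipments) share a single product dimension; all names and rows are assumptions for illustration.

```python
import sqlite3

galaxy = sqlite3.connect(":memory:")
galaxy.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);
CREATE TABLE fact_shipments (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO fact_sales VALUES (1, 50.0), (1, 30.0), (2, 40.0);
INSERT INTO fact_shipments VALUES (1, 8), (2, 1), (2, 2);
""")

# Each fact table is summarised separately, then lined up by product.
per_product = galaxy.execute("""
    SELECT p.name,
           (SELECT SUM(s.revenue) FROM fact_sales AS s
             WHERE s.product_id = p.product_id),
           (SELECT SUM(h.units_shipped) FROM fact_shipments AS h
             WHERE h.product_id = p.product_id)
    FROM dim_product AS p
    ORDER BY p.name
""").fetchall()
print(per_product)  # [('Lamp', 40.0, 3), ('Pen', 80.0, 8)]
```

Summarising each fact table in its own subquery avoids double counting, which can happen if two fact tables are joined to each other directly.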

Conclusion

Data warehouse process and technology involve planning, designing, implementing, and maintaining data warehouses using advanced hardware, software, and computing models. Technologies such as parallel processing, distributed DBMS, and client-server architecture help handle large volumes of data efficiently.
