1. Overview of Data Warehousing
Data warehousing is a system used for collecting, storing, and managing large amounts of data from different sources so that organizations can analyze it and make better decisions.
A data warehouse is mainly used for business intelligence (BI), reporting, and analysis rather than daily transactions.
Main Purposes
- Integrate data from multiple sources
- Store historical data
- Support decision making
- Improve business analysis
Simple Explanation
In simple terms, a data warehouse is a central storage system where data from different sources is collected and stored for analysis.
2. Definition of Data Warehouse
“A Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data used to support management decision making.”
Key Characteristics
Subject Oriented
Data is organized around specific subjects (sales, customer, product).
Integrated
Data from different sources is converted into a common format.
Time Variant
Data is stored with a time reference, so historical data is kept and can be compared across periods.
Non-Volatile
Data is mostly read-only and is rarely updated once it has been loaded.
3. Components of Data Warehouse
1. Data Sources
- Databases
- Files
- ERP systems
- CRM systems
2. ETL Process
- Extract – Read data from the sources
- Transform – Clean the data and convert it to a common format
- Load – Store the data in the data warehouse
3. Data Warehouse Storage
The central repository where the processed data is stored.
4. Metadata
Metadata means data about data (for example, table structure and data source information).
5. Front End Tools
- Reporting tools
- Data mining tools
- OLAP tools
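The ETL step listed in the components above can be sketched in a few lines. This is only an illustration under stated assumptions: the two source lists (`crm_rows`, `erp_rows`), their field names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse storage.

```python
import sqlite3

# Hypothetical raw rows from two source systems with inconsistent layouts.
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"},
    {"customer": "BOB", "amount": "80", "date": "2024-01-06"},
]
erp_rows = [
    {"cust_name": "alice", "total": 45.0, "day": "2024-01-07"},
    {"cust_name": "", "total": 10.0, "day": "2024-01-08"},  # missing name
]


def extract():
    """Extract: read rows from each source, tagging where they came from."""
    for row in crm_rows:
        yield "crm", row
    for row in erp_rows:
        yield "erp", row


def transform(source, row):
    """Transform: clean values and convert both layouts to one common format."""
    if source == "crm":
        name, amount, date = row["customer"], row["amount"], row["date"]
    else:
        name, amount, date = row["cust_name"], row["total"], row["day"]
    name = name.strip().title()
    if not name:
        return None  # reject rows that fail a basic quality check
    return (name, float(amount), date, source)


def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return how many rows were loaded."""
    cleaned = [t for t in (transform(s, r) for s, r in extract()) if t is not None]
    load(conn, cleaned)
    return len(cleaned)


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
totals = dict(conn.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer"))
print(loaded, totals)
```

After the load, rows that arrived as " Alice " and "alice" from different systems are stored under one customer name, which is what the Integrated characteristic asks for.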
4. Building a Data Warehouse
- Requirement Analysis
- Data Source Identification
- Data Cleaning
- Data Transformation
- Data Loading
- Data Analysis Tools Integration
5. Warehouse Database
A warehouse database is a large database system designed specifically for analysis and reporting.
Features
- Huge historical data
- Optimized for queries
- Supports OLAP operations
6. Mapping Data Warehouse to Multiprocessor Architecture
Large data warehouses are handled with multiprocessor systems, in which multiple CPUs process the data in parallel.
Types
- SMP (Symmetric Multiprocessing)
- MPP (Massively Parallel Processing)
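In SMP, the processors share one memory and one operating system, while in MPP each node has its own memory and works on its own portion of the data. The sketch below illustrates only the MPP-style idea of partitioning the data, aggregating each partition independently, and merging the partial results. It is an assumption-laden toy: the `rows` data is made up, and Python threads stand in for separate nodes, so it shows the structure of the computation rather than real hardware parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (region, amount)
rows = [
    ("north", 10), ("south", 20), ("north", 5),
    ("east", 7), ("south", 3), ("east", 1),
]


def partition(rows, n_nodes):
    """Split rows round-robin so each node owns its own slice of the data."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts


def local_totals(part):
    """Work done independently on one node: aggregate only its own partition."""
    totals = Counter()
    for region, amount in part:
        totals[region] += amount
    return totals


def parallel_totals(rows, n_nodes=3):
    """Aggregate each partition in a separate worker, then merge the partials."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_totals, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return dict(merged)


result = parallel_totals(rows)
print(result)
```

The merged totals are the same as a single-pass aggregation would give, because a sum can be computed per partition and combined afterwards.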
7. Difference Between Database System and Data Warehouse
| Feature | Database System | Data Warehouse |
|---|---|---|
| Purpose | Transaction processing | Data analysis |
| Data | Current data | Historical data |
| Updates | Frequent updates | Rare updates |
| Users | Operational staff | Managers and analysts |
| Queries | Simple queries | Complex queries |
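The contrast in the table can be made concrete with two kinds of statements. The following is a minimal sketch, assuming a hypothetical `orders` table; a single in-memory SQLite database is used for both workloads only to keep the example short, whereas in practice they would usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 2023, 100.0),
        (2, "Bob", 2023, 50.0),
        (3, "Alice", 2024, 70.0),
        (4, "Bob", 2024, 30.0),
    ],
)

# Transaction-style work: update and read one current record by its key.
conn.execute("UPDATE orders SET amount = 60.0 WHERE order_id = 2")
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = 2"
).fetchone()

# Analysis-style work: scan and aggregate history across many rows.
yearly_sales = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()

print(single_order)
print(yearly_sales)
```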
8. Multidimensional Data Model
The multidimensional model represents data along multiple dimensions for analysis.
Example Dimensions
- Time
- Location
- Product
Components
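- Fact Table – Stores numerical measures (e.g., sales amount) along with keys that reference the dimension tables
- Dimension Table – Stores descriptive attributes (e.g., time, product) used to filter and group the facts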
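To show how a fact table and dimension tables fit together, here is a small sketch using SQLite. The table names (`fact_sales`, `dim_product`, `dim_time`), columns, and rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_time VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 1000.0), (1, 2, 1500.0), (2, 1, 300.0), (2, 2, 200.0);
""")

# The measure comes from the fact table; the grouping attribute comes from a dimension.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Grouping by `dim_time.quarter` instead would answer a different question from the same facts, which is the point of keeping measures and descriptive attributes apart.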
9. Data Cubes
A data cube is a multidimensional structure that represents data along multiple dimensions.
Example Dimensions
- Product
- Time
- Location
Operations
- Slice
- Dice
- Roll-up
- Drill-down
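The four operations can be tried out on a tiny cube held in a dictionary. This is a simplified sketch, not how an OLAP engine stores data: the cube contents and the helper names (`slice_cube`, `dice_cube`, `roll_up`) are hypothetical, and roll-up is shown as removing dimensions rather than climbing a hierarchy such as city to country.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (product, quarter, city) -> sales amount.
cube = {
    ("Laptop", "Q1", "Delhi"): 100,
    ("Laptop", "Q1", "Mumbai"): 150,
    ("Laptop", "Q2", "Delhi"): 120,
    ("Phone", "Q1", "Delhi"): 80,
    ("Phone", "Q2", "Mumbai"): 90,
}
DIMS = ("product", "quarter", "city")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose values are in the allowed sets of several dimensions."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def roll_up(cube, keep):
    """Roll-up: sum away every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)


q1 = slice_cube(cube, "quarter", "Q1")
delhi_laptops = dice_cube(cube, product={"Laptop"}, city={"Delhi"})
by_product = roll_up(cube, keep=("product",))
# Drill-down is the reverse of roll-up: return to a finer level of detail.
by_product_quarter = roll_up(cube, keep=("product", "quarter"))
print(by_product, by_product_quarter)
```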
10. Schema Concepts
Star Schema
- One fact table
- Multiple dimension tables
- Simple design
- Fast query performance
Snowflake Schema
- Dimension tables normalized
- More complex structure
- Less redundancy
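The difference between the star and snowflake designs above can be seen by modelling the same product dimension both ways. The sketch below uses SQLite with hypothetical table and column names; it only illustrates the trade-off and is not a recommended layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is repeated in every product row.
CREATE TABLE star_dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category_name TEXT);
INSERT INTO star_dim_product VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');

-- Snowflake: the category moves to its own table and is referenced by key.
CREATE TABLE snow_dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES snow_dim_category(category_id)
);
INSERT INTO snow_dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
INSERT INTO snow_dim_product VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);

CREATE TABLE fact_sales (product_id INTEGER, sales_amount REAL);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 600.0), (3, 300.0);
""")

# Star schema: one join from the fact table to the dimension.
star_query = """
    SELECT p.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN star_dim_product p ON p.product_id = f.product_id
    GROUP BY p.category_name ORDER BY p.category_name
"""
# Snowflake schema: an extra join through the normalized category table.
snowflake_query = """
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN snow_dim_product p ON p.product_id = f.product_id
    JOIN snow_dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name ORDER BY c.category_name
"""
star_result = conn.execute(star_query).fetchall()
snowflake_result = conn.execute(snowflake_query).fetchall()
print(star_result, snowflake_result)
```

Both queries return the same totals. The star version needs one join but stores 'Electronics' twice, while the snowflake version stores each category name once at the cost of an extra join.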
Fact Constellation Schema
- Multiple fact tables
- Shared dimension tables
- Also called Galaxy Schema
Conclusion
Data warehousing helps organizations store and analyze large amounts of data from multiple sources. It supports decision making through multidimensional models, data cubes, and schema designs like star, snowflake, and fact constellation.
