UNIT 4 | Data Warehousing and Data Mining Notes

1. Classification

Classification is a data mining technique used to assign data into predefined categories or classes based on their attributes. In classification, a model is trained using known data (training dataset) and then used to classify new data.

Example: Email filtering system – Spam and Not Spam.

In simple terms: classification means sorting data into predefined categories.

2. Data Generalization

Data generalization is the process of summarizing detailed, low-level data into higher-level concepts, typically by moving up a concept hierarchy.

Example: City → State → Country

In simple terms: data generalization converts detailed data into higher-level concepts.
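The City → State → Country roll-up can be sketched in a few lines. This is only an illustration: the mapping tables `CITY_TO_STATE` and `STATE_TO_COUNTRY`, the function `generalize`, and the sample cities are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> state -> country.
CITY_TO_STATE = {
    "Lucknow": "Uttar Pradesh",
    "Kanpur": "Uttar Pradesh",
    "Pune": "Maharashtra",
}
STATE_TO_COUNTRY = {"Uttar Pradesh": "India", "Maharashtra": "India"}


def generalize(cities, level):
    """Replace each city by its state or country and count the records."""
    if level == "state":
        values = [CITY_TO_STATE[c] for c in cities]
    elif level == "country":
        values = [STATE_TO_COUNTRY[CITY_TO_STATE[c]] for c in cities]
    else:
        raise ValueError("level must be 'state' or 'country'")
    return Counter(values)


# Illustrative city-level records (one entry per sale).
sales_cities = ["Lucknow", "Kanpur", "Pune", "Lucknow"]
by_state = generalize(sales_cities, "state")
by_country = generalize(sales_cities, "country")
print(by_state)    # counts per state
print(by_country)  # counts per country
```

Each step up the hierarchy produces fewer, coarser groups, which is the summarizing effect that generalization aims for.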

3. Analytical Characterization

Analytical characterization describes the general characteristics of a target class of data. It summarizes the important features of the data.

Example: Analyzing characteristics of high-profit customers.

4. Analysis of Attribute Relevance

Attribute relevance analysis identifies which attributes are most important for classification or prediction. Some attributes may be irrelevant or redundant.

Example attributes whose relevance might be evaluated:

Age
Income
Location
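One common way to score attribute relevance is information gain, the reduction in class entropy after splitting on an attribute. The sketch below assumes categorical attributes; the tiny dataset, the `buys` target column, and the function names are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical customer records.
rows = [
    {"age": "young", "income": "high", "location": "urban", "buys": "yes"},
    {"age": "young", "income": "low", "location": "rural", "buys": "no"},
    {"age": "old", "income": "high", "location": "rural", "buys": "yes"},
    {"age": "old", "income": "low", "location": "urban", "buys": "no"},
]
gains = {a: information_gain(rows, a, "buys") for a in ("age", "income", "location")}
print(gains)
```

In this toy data income separates the classes perfectly, so it gets the highest gain, while age and location carry no information about the target and would be candidates for removal.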

5. Mining Class Comparisons

Class comparison compares different classes of data to identify differences between them.

Example: Comparing customers who buy a product and customers who do not buy a product.

6. Statistical Measures in Large Databases

Statistical measures summarize the central tendency and spread of large datasets.

Mean: Average value of data
Median: Middle value
Mode: Most frequent value
Variance: Average squared deviation from the mean
Standard Deviation: Square root of the variance, expressed in the same units as the data
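These measures can be computed with Python's standard `statistics` module. The sample values are arbitrary, and the population forms (`pvariance`, `pstdev`) are used here as an assumption; the sample forms `variance` and `stdev` divide by n - 1 instead of n.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary sample data

mean = statistics.mean(values)           # 5
median = statistics.median(values)       # 4.5
mode = statistics.mode(values)           # 4
variance = statistics.pvariance(values)  # 4
std_dev = statistics.pstdev(values)      # 2.0

print(mean, median, mode, variance, std_dev)
```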

7. Statistical-Based Algorithms

These algorithms use statistical models to classify data.

Naive Bayes Classifier
Logistic Regression
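As a rough illustration of the statistical approach, here is a minimal Naive Bayes text classifier for the spam example. It assumes word-count features and add-one (Laplace) smoothing; the training messages and function names are invented for this sketch.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(samples):
    """samples: list of (list_of_words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in samples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing avoids zero probability for unseen words.
            score += log((word_counts[label][w] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
training = [
    (["win", "money", "now"], "spam"),
    (["free", "money", "offer"], "spam"),
    (["meeting", "at", "noon"], "not spam"),
    (["project", "meeting", "notes"], "not spam"),
]
model = train_naive_bayes(training)
print(predict(model, ["free", "money"]))     # spam
print(predict(model, ["meeting", "notes"]))  # not spam
```

The "naive" part is the assumption that words are independent given the class, which is what lets the per-word probabilities simply be multiplied (added in log space).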

8. Distance-Based Algorithms

Distance-based algorithms classify data based on distance between data points.

Euclidean Distance
Manhattan Distance

Example Algorithm: K-Nearest Neighbor (KNN)
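A minimal KNN sketch follows, assuming numeric feature tuples and majority voting among the k closest training points. The training points, labels, and function names are illustrative only; `math.dist` (Python 3.8+) provides the Euclidean distance.

```python
from collections import Counter
from math import dist  # Euclidean distance


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(training, query, k=3, distance=dist):
    """Classify query by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
print(dist((0, 0), (3, 4)))       # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(knn_predict(training, (2, 2)))                      # A
print(knn_predict(training, (6, 5), distance=manhattan))  # B
```

Passing the distance function as a parameter makes it easy to compare how the choice of measure affects the result.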

9. Decision Tree-Based Algorithms

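A decision tree is a tree structure used for classification and prediction. Each internal node tests an attribute, each branch corresponds to an outcome of that test, and each leaf holds a class label.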

Root Node
Decision Node
Leaf Node

Algorithms: ID3, C4.5, CART
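To show how the node types fit together, the sketch below classifies a record by walking a small hand-built tree. The tree itself is hypothetical and not learned from data; algorithms such as ID3, C4.5, and CART would construct a structure like this automatically.

```python
# Internal nodes test one attribute; leaves hold a class label.
tree = {
    "attribute": "income",              # root node
    "branches": {
        "high": {"label": "buys"},      # leaf node
        "low": {
            "attribute": "age",         # decision (internal) node
            "branches": {
                "young": {"label": "buys"},          # leaf node
                "old": {"label": "does not buy"},    # leaf node
            },
        },
    },
}


def classify(node, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    while "label" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["label"]


print(classify(tree, {"income": "high", "age": "old"}))  # buys
print(classify(tree, {"income": "low", "age": "old"}))   # does not buy
```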

10. Introduction to Clustering

Clustering is the process of grouping similar data objects into clusters. Objects in the same cluster are similar while objects in different clusters are dissimilar.

11. Similarity and Distance Measures

These measures determine how similar or different data objects are.

Euclidean Distance
Manhattan Distance
Cosine Similarity
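Unlike the two distances, cosine similarity compares the direction of two vectors and ignores their length. A minimal sketch, with the function name and the sample vectors chosen only for illustration:

```python
from math import sqrt


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity((1, 2), (2, 4))  # close to 1: parallel vectors
orthogonal = cosine_similarity((1, 0), (0, 1))      # 0: perpendicular vectors
print(same_direction, orthogonal)
```

This is why cosine similarity is often preferred for documents represented as word-count vectors: a long and a short document on the same topic point in roughly the same direction.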

12. Hierarchical Clustering Algorithms

Hierarchical clustering builds a hierarchy of clusters represented as a tree called a dendrogram.

Agglomerative: Bottom-up approach
Divisive: Top-down approach
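The bottom-up approach can be sketched as follows: start with every point in its own cluster and repeatedly merge the two closest clusters. This version assumes one-dimensional points and single linkage (cluster distance is the smallest distance between any two members); both choices, and the sample data, are assumptions for the example.

```python
def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


result = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3)
print(result)  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

Recording the order and distance of each merge, instead of stopping at a fixed number of clusters, would give the information needed to draw the dendrogram.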

13. Partitional Algorithms

Partitional clustering divides data into k non-overlapping clusters, where k is chosen in advance.

Example Algorithm: K-Means
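A minimal K-Means sketch is given below. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its cluster. For reproducibility it takes the first k points as initial centroids, which is an assumption of this example; practical implementations usually pick random or more careful starting points.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Return (centroids, clusters) after at most `iterations` rounds."""
    centroids = list(points[:k])  # simplistic deterministic initialization
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, 2)
print(centroids)
print(clusters)
```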

14. Hierarchical Clustering Methods

CURE (Clustering Using Representatives)

Uses representative points for clusters
Handles large datasets
Handles outliers

CHAMELEON

Considers interconnectivity
Considers closeness between clusters

15. Density-Based Methods

Density-based clustering identifies clusters based on dense regions in the dataset.

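DBSCAN: Grows clusters from core points that have at least a minimum number of neighbors within a given radius; points in sparse regions are treated as noise
OPTICS: Orders points by reachability distance so that clusters of varying density can be extracted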
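The following is a simplified DBSCAN sketch, not a reference implementation. It assumes Euclidean distance, a brute-force neighbor search, and the conventional parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself, for a core point). The sample points are invented.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters is not given in advance; it follows from the density parameters, and the isolated point is reported as noise rather than forced into a cluster.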

16. Grid-Based Methods

Grid-based methods divide the data space into a grid structure.

STING (Statistical Information Grid)
CLIQUE (Clustering In QUEst)
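The core idea shared by grid-based methods, working on cells instead of individual points, can be illustrated with a generic sketch. This is neither STING nor CLIQUE; it only bins two-dimensional points into square cells and keeps the cells that hold enough points. The function name, parameters, and data are assumptions for this example.

```python
from collections import Counter


def dense_cells(points, cell_size, min_points):
    """Map each point to a grid cell and return the cells with >= min_points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_points}


points = [(0.5, 0.5), (0.7, 0.2), (0.1, 0.9), (3.2, 3.8), (3.5, 3.1), (7.0, 1.0)]
dense = dense_cells(points, cell_size=1.0, min_points=2)
print(dense)  # {(0, 0): 3, (3, 3): 2}
```

Because later processing depends on the number of cells rather than the number of points, this style of method tends to scale well to large datasets.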

17. Model-Based Methods

Model-based clustering assumes that data is generated by a statistical model.

Example: Gaussian Mixture Models

18. Association Rules Introduction

Association rule mining discovers relationships between items in large datasets. The strength of a rule is usually measured by its support and confidence.

Example: Customers who buy bread also buy butter.
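For the bread and butter example, support is the fraction of transactions containing all items of the rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent. The transactions below are made up for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "butter"}, transactions)         # 3 of 5 transactions
rule_confidence = confidence({"bread"}, {"butter"}, transactions)  # 3 of the 4 bread transactions
print(rule_support, rule_confidence)
```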

19. Large Item Sets

Large item sets (also called frequent item sets) are groups of items that appear together in transactions at least as often as a chosen minimum support threshold.

Example: {Milk, Bread}

20. Basic Algorithms for Association Rules

Apriori Algorithm Steps:

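Generate candidate item sets of size k from the frequent item sets of size k-1
Count the support of each candidate and remove the infrequent ones (those below the minimum support)
Keep the remaining candidates as the frequent item sets and repeat with size k+1 until no candidates remain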
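The Apriori steps above can be sketched as a level-wise loop. This is a compact illustration rather than an optimized implementation; it assumes transactions are Python sets and that `min_support` is an absolute count. The data and names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all item sets with count >= min_support."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]  # candidate 1-item sets
    frequent = {}
    k = 1
    while current:
        # Count support and drop infrequent candidates.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent sets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this data {milk, butter} appears only twice, so it is dropped, and the prune step then rules out {milk, bread, butter} without counting it. That pruning is what keeps the number of candidates manageable.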

21. Parallel and Distributed Algorithms

Parallel and distributed algorithms process data across multiple processors or machines to improve performance.
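One simple pattern is to split the transactions into partitions, count items in each partition independently, and then merge the partial counts. The sketch below shows only that structure, using threads from the standard library as a stand-in for separate processors or machines; in CPython, threads do not speed up CPU-bound counting, so this is an illustration of the data flow, not of the performance gain. All names and data are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Local step: count item occurrences within one partition."""
    counts = Counter()
    for transaction in partition:
        counts.update(transaction)
    return counts


def parallel_item_counts(transactions, workers=2):
    """Split the data, count each partition concurrently, merge the results."""
    if not transactions:
        return Counter()
    size = -(-len(transactions) // workers)  # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_items, partitions)
        return sum(partial_counts, Counter())  # global merge step


transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk"},
]
totals = parallel_item_counts(transactions)
print(totals)
```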

22. Neural Network Approach

Neural networks are machine learning models inspired by the human brain. They are used for classification, prediction, and pattern recognition.
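The smallest building block of a neural network is a single artificial neuron. As a sketch, the perceptron below learns the logical AND function: it computes a weighted sum of its inputs, outputs 1 if the sum is positive, and adjusts its weights whenever the output is wrong. The learning rate, number of epochs, and function names are arbitrary choices for this example.

```python
def predict(weights, bias, inputs):
    """Step activation: fire (1) if the weighted sum is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=0.1):
    """Perceptron learning rule: nudge weights in proportion to the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table of logical AND as training data.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
for inputs, target in and_samples:
    print(inputs, predict(weights, bias, inputs), "expected", target)
```

A single perceptron can only separate classes with a straight line; practical networks stack many such units in layers and train them with gradient-based methods.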

Conclusion

Classification, clustering, and association rule mining are important techniques in data mining. These methods help discover hidden patterns and support intelligent decision making in many domains.
