1. Classification
Classification is a data mining technique used to assign data into predefined categories or classes based on their attributes. In classification, a model is trained using known data (training dataset) and then used to classify new data.
Example: Email filtering system – Spam and Not Spam.
In simple terms: Classification means dividing data into predefined categories.
2. Data Generalization
Data generalization is the process of summarizing detailed, low-level data into higher-level concepts, usually by moving up a concept hierarchy.
Example: City → State → Country
In simple terms: Data generalization converts detailed data into higher-level concepts.
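A minimal sketch of rolling records up the City → State → Country hierarchy. The city names, the two lookup tables, and the `generalize` helper are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> state -> country.
CITY_TO_STATE = {"Mumbai": "Maharashtra", "Pune": "Maharashtra", "Jaipur": "Rajasthan"}
STATE_TO_COUNTRY = {"Maharashtra": "India", "Rajasthan": "India"}

def generalize(cities, level):
    """Roll city-level records up to 'state' or 'country' and count them."""
    counts = Counter()
    for city in cities:
        state = CITY_TO_STATE[city]
        key = state if level == "state" else STATE_TO_COUNTRY[state]
        counts[key] += 1
    return dict(counts)

# Four city-level records summarized at two higher levels.
sales_cities = ["Mumbai", "Pune", "Jaipur", "Mumbai"]
by_state = generalize(sales_cities, "state")
by_country = generalize(sales_cities, "country")
```

Each step up the hierarchy gives fewer, broader groups, which is the summarizing effect described above.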
3. Analytical Characterization
Analytical characterization describes the general characteristics of a target class of data. It summarizes the important features of the data.
Example: Analyzing characteristics of high-profit customers.
4. Analysis of Attribute Relevance
Attribute relevance analysis identifies which attributes are most important for classification or prediction. Some attributes may be irrelevant or redundant and can be dropped before mining.
Example attributes to evaluate:
- Age
- Income
- Location
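One common way to score attribute relevance is information gain, which measures how much knowing an attribute reduces uncertainty about the class. The sketch below assumes a tiny made-up customer table and a hypothetical target column named `buys`.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical records: only income separates buyers from non-buyers here.
customers = [
    {"age": "young", "income": "high", "location": "north", "buys": "yes"},
    {"age": "young", "income": "low", "location": "south", "buys": "no"},
    {"age": "old", "income": "high", "location": "south", "buys": "yes"},
    {"age": "old", "income": "low", "location": "north", "buys": "no"},
]
gains = {a: information_gain(customers, a, "buys") for a in ("age", "income", "location")}
```

In this toy table income gets the highest gain, while age and location get a gain of zero, so they would be candidates for removal.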
5. Mining Class Comparisons
Class comparison compares different classes of data to identify differences between them.
Example: Comparing customers who buy a product and customers who do not buy a product.
6. Statistical Measures in Large Databases
Statistical measures summarize the central tendency and spread of large datasets.
- Mean: Average value of data
- Median: Middle value
- Mode: Most frequent value
- Variance: Average squared deviation from the mean
- Standard Deviation: Square root of the variance, in the same units as the data
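These measures can be computed with Python's standard `statistics` module. The sample values are made up, and the population versions (`pvariance`, `pstdev`) are used here as an assumption; `variance` and `stdev` give the sample versions instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data

mean = statistics.mean(values)            # average value
median = statistics.median(values)        # middle value of the sorted data
mode = statistics.mode(values)            # most frequent value
variance = statistics.pvariance(values)   # average squared deviation from the mean
std_dev = statistics.pstdev(values)       # square root of the variance
```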
7. Statistical-Based Algorithms
These algorithms use statistical models to classify data.
- Naive Bayes Classifier
- Logistic Regression
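A minimal sketch of a Naive Bayes classifier for the spam example, assuming a bag-of-words representation and add-one (Laplace) smoothing. The four training messages and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (words, label) pairs; returns the counts needed to predict."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, words):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing so unseen words do not give zero probability.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages.
model = train_naive_bayes([
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "tomorrow"], "not spam"),
    (["project", "meeting", "notes"], "not spam"),
])
```

The "naive" part is the assumption that words are independent given the class, which is why the per-word log probabilities can simply be added.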
8. Distance-Based Algorithms
Distance-based algorithms classify a data point according to its distance from already labeled data points.
- Euclidean Distance
- Manhattan Distance
Example Algorithm: K-Nearest Neighbor (KNN)
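A minimal KNN sketch using Euclidean distance (`math.dist`, available in Python 3.8+) and a majority vote. The training points, the labels "A" and "B", and the default `k=3` are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points: (coordinates, class label).
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 7.0), "B"),
]
```

KNN has no training phase; all the work happens at prediction time, when distances to the stored points are computed.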
9. Decision Tree-Based Algorithms
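A decision tree is a tree structure used for classification and prediction. Each internal node tests an attribute, each branch is an outcome of that test, and each leaf holds a class label.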
- Root Node
- Decision Node
- Leaf Node
Algorithms: ID3, C4.5, CART
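The node types above can be illustrated with a hand-built tree stored as nested dictionaries. This is only a sketch: the attributes (`outlook`, `humidity`) and labels are hypothetical, and a real algorithm such as ID3, C4.5, or CART would learn the tree from data instead.

```python
# Hypothetical hand-built tree, not one learned from data.
tree = {
    "attribute": "outlook",                 # root node
    "branches": {
        "sunny": {                          # decision node
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},  # leaf nodes
        },
        "overcast": "yes",                  # leaf node
        "rainy": "no",                      # leaf node
    },
}

def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node
```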
10. Introduction to Clustering
Clustering is the process of grouping similar data objects into clusters. Objects in the same cluster are similar while objects in different clusters are dissimilar.
11. Similarity and Distance Measures
These measures determine how similar or different data objects are.
- Euclidean Distance
- Manhattan Distance
- Cosine Similarity
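The three measures listed above can be written in a few lines each. This sketch assumes points are given as equal-length sequences of numbers and that vectors passed to `cosine_similarity` are non-zero.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Note that the first two are distances (smaller means more similar), while cosine similarity is a similarity (larger means more similar) and ignores vector length.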
12. Hierarchical Clustering Algorithms
Hierarchical clustering builds a hierarchy of clusters represented as a tree called a dendrogram.
- Agglomerative: Bottom-up approach
- Divisive: Top-down approach
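A minimal sketch of the agglomerative (bottom-up) approach on one-dimensional points. It assumes single linkage (the distance between two clusters is the distance between their closest members) and stops at a requested number of clusters instead of building the full dendrogram; both choices are assumptions for illustration.

```python
def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering on 1-D points."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

result = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3)
```

Recording the order and distance of each merge would give the information needed to draw the dendrogram.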
13. Partitional Algorithms
Partitional clustering divides data into k non-overlapping clusters, where k is chosen in advance.
Example Algorithm: K-Means
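A minimal K-Means sketch: assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until nothing changes. The initial centroids are passed in explicitly here to keep the example deterministic; real implementations usually pick them randomly or with a seeding scheme. The data points are made up.

```python
import math

def k_means(points, centroids, max_iterations=100):
    """Return (final centroids, clusters) for points given as tuples of numbers."""
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```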
14. Hierarchical Clustering Methods
CURE (Clustering Using Representatives)
- Uses representative points for clusters
- Handles large datasets
- Handles outliers
CHAMELEON
- Considers interconnectivity
- Considers closeness between clusters
15. Density-Based Methods
Density-based clustering identifies clusters based on dense regions in the dataset.
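- DBSCAN: Grows clusters from core points that have at least a minimum number of neighbors within a given radius; points in sparse regions are treated as noise
- OPTICS: Orders points by reachability so that clusters of varying density can be extracted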
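A compact DBSCAN sketch, assuming Euclidean distance and a brute-force neighbor search (fine for a handful of points, too slow for large data). The parameter names `eps` and `min_pts` and the sample points are assumptions for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels

points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.0),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Unlike K-Means, the number of clusters is not given in advance; it follows from the density parameters.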
16. Grid-Based Methods
Grid-based methods divide the data space into a grid structure.
- STING
- CLIQUE
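The shared idea behind grid-based methods can be sketched as: map each point to a grid cell, count points per cell, and keep the dense cells. This is not an implementation of STING or CLIQUE, which add hierarchical statistics and subspace search respectively; the cell size, threshold, and points are assumptions.

```python
from collections import Counter

def dense_cells(points, cell_size, min_points):
    """Count 2-D points per grid cell and return cells with at least min_points."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell: n for cell, n in counts.items() if n >= min_points}

points = [(0.2, 0.3), (0.8, 0.9), (0.5, 0.1), (3.1, 3.2), (3.7, 3.9), (7.5, 0.2)]
cells = dense_cells(points, cell_size=1.0, min_points=2)
```

Because later steps work on cells rather than individual points, the cost depends mainly on the number of cells, which is why these methods scale well.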
17. Model-Based Methods
Model-based clustering assumes that data is generated by a statistical model.
Example: Gaussian Mixture Models
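A minimal sketch of fitting a one-dimensional Gaussian mixture with the EM algorithm. The starting means, variances, and weights are supplied by the caller, the iteration count is fixed, and a small variance floor is used to avoid division by zero; all of these are simplifying assumptions, and the data is made up.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def fit_gmm_1d(data, means, variances, weights, iterations=50):
    """Run EM for a 1-D Gaussian mixture and return (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor keeps the variance positive
            weights[j] = nj / len(data)
    return means, variances, weights

data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
means, variances, weights = fit_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Each point ends up with a probability of belonging to each cluster, rather than a hard assignment as in K-Means.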
18. Association Rules Introduction
Association rule mining discovers relationships between items in large datasets.
Example: Customers who buy bread often also buy butter, written as the rule {Bread} → {Butter}.
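The strength of such a rule is usually measured with support (how often the items occur together) and confidence (how often the rule holds when its left side occurs). A small sketch on made-up transactions:

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of both sides together divided by support of the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

Here bread and butter appear together in 3 of 5 transactions, and 3 of the 4 transactions containing bread also contain butter.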
19. Large Item Sets
Large item sets (also called frequent item sets) are groups of items that appear together in at least a minimum fraction of transactions, known as the minimum support.
Example: {Milk, Bread}
20. Basic Algorithms for Association Rules
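Apriori Algorithm Steps:
- Generate candidate item sets of size k (starting with single items)
- Count the support of each candidate and remove the infrequent ones (below minimum support)
- Keep the remaining frequent item sets and join them to form candidates of size k + 1
- Repeat until no new frequent item sets are found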
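A compact sketch of the Apriori loop, assuming transactions are Python sets and support is expressed as a fraction. The transactions and the 0.5 threshold are made up, and no effort is made to count support efficiently.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent item set (frozenset) to its support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count support and keep only candidates that meet the threshold.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count / n >= min_support:
                level[c] = count / n
        frequent.update(level)
        # Join step: combine frequent k-item sets into (k + 1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k - 1)-item subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
```

The prune step relies on the Apriori property: if an item set is infrequent, every superset of it is infrequent too, so it never needs to be counted.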
21. Parallel and Distributed Algorithms
Parallel and distributed algorithms process data across multiple processors or machines to improve performance.
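One simple pattern is to split the transactions into partitions, count items in each partition independently, and then merge the partial counts. The sketch below uses a thread pool from the standard library only to show the structure; in CPython, threads do not speed up CPU-bound counting, so a real system would use separate processes or machines. The function names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Count item occurrences in one partition of the transactions."""
    counts = Counter()
    for transaction in partition:
        counts.update(transaction)
    return counts

def parallel_item_counts(transactions, workers=2):
    """Count items per partition concurrently, then merge the partial counts."""
    size = max(1, -(-len(transactions) // workers))  # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

transactions = [["milk", "bread"], ["bread", "butter"], ["milk"], ["bread"]]
totals = parallel_item_counts(transactions)
```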
22. Neural Network Approach
Neural networks are machine learning models inspired by the human brain. They are used for classification, prediction, and pattern recognition.
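The simplest building block is a single artificial neuron (a perceptron). The sketch below trains one on the logical AND function using the perceptron learning rule; the step activation, the learning rate of 1.0, and the 20 epochs are assumptions for illustration, and real neural networks stack many such units in layers.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train a two-input perceptron with a step activation; returns (weights, bias)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = target - output
            # Nudge the weights and bias in the direction that reduces the error.
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias

def perceptron_predict(weights, bias, x1, x2):
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0

# Truth table of logical AND used as training data.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
```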
Conclusion
Classification, clustering, and association rule mining are important techniques in data mining. These methods help discover hidden patterns and support intelligent decision making in many domains.
