Created: Sep 1, 2025 10:05 PM
Supervised Learning Models: Regression Models
Supervised Learning Models: Classification Models
Unsupervised Learning Models
Clustering Models (For grouping similar data points)
- K-Means Clustering
- K-Medoids (PAM - Partition Around Medoids)
- Hierarchical Clustering
- Density-Based Clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Handles noise and clusters with arbitrary shapes.
- OPTICS (Ordering Points to Identify Clustering Structure): Extends DBSCAN to detect clusters of varying densities.
- Model-Based Clustering
- Gaussian Mixture Models (GMM): Probabilistic model assuming data is generated from a mixture of Gaussian distributions.
- Expectation-Maximization (EM): Iterative approach to optimize Gaussian Mixture Models.
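As a quick illustration of the three clustering families above (partition-based, density-based, and model-based), here is a minimal sketch using scikit-learn. The synthetic data and parameter values are illustrative choices, not part of these notes:

```python
# Hypothetical comparison of three clustering families on synthetic data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# Three well-separated Gaussian blobs as toy data
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# Partition-based: K-Means assigns every point to the nearest centroid
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Density-based: DBSCAN grows clusters from dense regions; label -1 marks noise
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Model-based: a Gaussian Mixture fit by Expectation-Maximization
gmm_labels = GaussianMixture(n_components=3, random_state=42).fit_predict(X)
```

Note the practical difference: K-Means and GMM require the number of clusters up front, while DBSCAN infers it from density (`eps`, `min_samples`) and can leave points unassigned as noise.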
Dimensionality Reduction Models (For reducing features while retaining essential information)
- Principal Component Analysis (PCA)
- Kernel PCA
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Uniform Manifold Approximation and Projection (UMAP)
- Factor Analysis
- Independent Component Analysis (ICA)
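A minimal PCA sketch of the "retain essential information" idea: the toy data below has four features, but two of them are exact linear combinations of the others, so two principal components recover nearly all of the variance. The data construction is an assumption for illustration:

```python
# PCA sketch: project redundant 4-D synthetic data onto 2 components
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))               # two "true" underlying features
X = np.hstack([base, base @ rng.normal(size=(2, 2))])  # two redundant features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Since the data has rank 2, two components capture ~100% of the variance
total_variance = pca.explained_variance_ratio_.sum()
```

In practice, plotting `explained_variance_ratio_` across components (a scree plot) is a common way to choose how many components to keep.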
Association Rule Mining (For finding relationships between items in large datasets)
- Apriori Algorithm
- Identifies frequent itemsets and generates association rules.
- Example: "If a customer buys bread, they are likely to buy butter."
- Eclat Algorithm
- Efficient alternative to Apriori using depth-first search for frequent itemsets.
- FP-Growth (Frequent Pattern Growth)
- Creates a compressed representation of the dataset for efficient mining of frequent patterns.
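To make the support idea behind these algorithms concrete, here is a small frequent-itemset counter in plain Python on the bread-and-butter example above. It brute-forces all candidate itemsets rather than using Apriori's candidate pruning, so it is a conceptual sketch only; the basket data is invented:

```python
# Minimal frequent-itemset counter (conceptual sketch, not full Apriori:
# real Apriori prunes candidates using the downward-closure property).
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            # support = fraction of transactions containing the itemset
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                frequent[combo] = support
    return frequent

result = frequent_itemsets(transactions, min_support=0.6)
# ("bread", "butter") is frequent: it appears in 3 of 5 baskets (support 0.6)
```

From the frequent itemsets, association rules like "bread → butter" are then scored by confidence (support of the pair divided by support of the antecedent).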
Anomaly Detection Models (For identifying rare or abnormal data points)
- Isolation Forest
- Randomly splits data to isolate anomalies efficiently.
- One-Class SVM
- SVM tailored for identifying outliers in high-dimensional spaces.
- Elliptic Envelope
- Fits a multivariate Gaussian distribution to detect outliers.
- Autoencoders
- Neural network-based method for reconstructing data; anomalies are identified by high reconstruction error.
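An Isolation Forest sketch of the anomaly-detection idea: an injected extreme point should be isolated in few random splits and flagged as an outlier. The data and `contamination` value are illustrative assumptions:

```python
# Isolation Forest sketch: flag an obvious outlier in 2-D synthetic data
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_outlier = np.array([[8.0, 8.0]])          # far from the bulk of the data
X = np.vstack([X_inliers, X_outlier])

# contamination = expected fraction of anomalies in the data
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = clf.predict(X)                      # +1 = inlier, -1 = anomaly
```

`contamination` directly sets the decision threshold, so it is the key parameter to tune; `score_samples` gives the raw anomaly scores if you prefer to threshold them yourself.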
Latent Variable Models (For discovering hidden structures in data)
- Latent Dirichlet Allocation (LDA)
- Topic modeling in text data, identifying topics as latent variables.
- Restricted Boltzmann Machines (RBM)
- Used for feature learning and collaborative filtering.
- Autoencoders (Dimensionality Reduction and Anomaly Detection)
- Encodes data into a compressed representation and reconstructs it.
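A minimal LDA sketch of the latent-variable idea: each document is modeled as a mixture over hidden topics. The toy corpus below (sports vs. finance vocabulary) is an invented example:

```python
# LDA sketch: infer per-document topic mixtures from a toy corpus
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "goal match team player score",
    "team player match win goal",
    "stock market price trade invest",
    "market trade invest price profit",
]

counts = CountVectorizer().fit_transform(docs)  # word-count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)          # rows are topic mixtures summing to ~1
```

The latent variables here are the topics themselves: `lda.components_` holds per-topic word weights, which is what you inspect to label the discovered topics.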
Other Unsupervised Models (For exploratory analysis and data generation)
- Self-Organizing Maps (SOM)
- Neural network model for mapping high-dimensional data into 2D grids.
- Generative Models (Also used in unsupervised learning tasks)
- Variational Autoencoders (VAEs): Learn latent space representations of data.
- Generative Adversarial Networks (GANs): Learn to generate new data points similar to the original dataset.
3. Ensemble Learning
I want to learn everything about [MODEL NAME] as a data scientist. Include both theory and practical aspects. Give me a complete module-style course structure with sections on:
1. Basic concept and use cases
2. Theoretical foundations and math (if any)
3. Parameters and tuning
4. Preprocessing requirements
5. Implementation from scratch (optional)
6. Implementation using Scikit-learn (or other libraries)
7. Evaluation techniques and metrics
8. Pros and cons
9. Best practices and common pitfalls
10. Real-world projects or case studies
11. Comparison with other models
12. Additional resources (articles, videos, papers)
Structure the content in modules for easy self-paced learning.