Machine_Learning_Guide
Created: Sep 1, 2025 10:05 PM
Date: December 7, 2024

📘 Machine Learning Study Guide

Learning machine learning isn’t just about running models — it’s about following a structured workflow that ensures accuracy, reliability, and actionable insights.

To make my approach clear, I’ve organised this study guide into five key stages that I apply across projects:

🔹 Part 1: Data Processing

The foundation of every project. Here, I prepare the dataset by cleaning, transforming, and encoding features so the data is ready for analysis.

🔹 Part 2: Exploratory Data Analysis (EDA)

Before modelling, I take a deep dive into the data: uncovering patterns, correlations, and outliers that shape how I design the model.

🔹 Part 3: Setting Up the Validation Framework

I establish a robust framework to split the data into training, validation, and test sets, ensuring models are fair, generalisable, and not overfitted.

🔹 Part 4: Building and Training the Model

This is where algorithms come into play. From regression to ensemble methods, I train models on the processed data while fine-tuning hyperparameters.

🔹 Part 5: Evaluating the Model

Finally, I measure performance using metrics such as R² and RMSE for regression, or accuracy and precision for classification, selecting the best model not only for its scores but also for interpretability and business value.

Part 1: DATA PROCESSING

Data processing is essential to ensure that the dataset is clean, well-structured, and optimized for model building. This includes loading data, cleaning, handling missing values, engineering features, and setting up train-test splits.

1.1 Loading the Data

1.2 Data Overview

1.3 Initial Feature Engineering

1.4 Encoding Categorical Data
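
As a rough illustration of steps 1.1 to 1.4, the sketch below loads a tiny CSV, fills a missing numeric value with the column median, and one-hot encodes a categorical column. It uses only the Python standard library so that it runs anywhere; the dataset, the column names (`age`, `city`, `price`), and the helper functions are all made up for this example, and a real project would more likely rely on a dataframe library.

```python
import csv
import io
import statistics

# Hypothetical toy dataset; an empty cell marks a missing value.
RAW_CSV = """age,city,price
34,Paris,210
,London,180
29,Paris,195
41,Berlin,250
"""

def load_rows(text):
    """Parse CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def impute_median(rows, column):
    """Convert a numeric column to float, filling blanks with the column median."""
    observed = [float(r[column]) for r in rows if r[column] != ""]
    median = statistics.median(observed)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else median
    return median

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for cat in categories:
            r[f"{column}_{cat}"] = 1 if value == cat else 0
    return categories

rows = load_rows(RAW_CSV)                 # 1.1 loading
age_median = impute_median(rows, "age")   # cleaning: fill the missing age
impute_median(rows, "price")              # no blanks here, just converts to float
city_categories = one_hot(rows, "city")   # 1.4 encoding
```

In a leakage-sensitive setup, the median and the category list would be learned from the training split only and then reused on the validation and test splits.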

Part 2: EXPLORATORY DATA ANALYSIS

The purpose of EDA is to gain insights into the data, identify patterns, detect anomalies, and guide feature engineering choices that may enhance model performance. The exploration is divided into two tracks, numerical data and categorical data, followed by target variable analysis and feature selection.

2.1 EDA Focused on Numerical Data

2.2 EDA Focused on Categorical Data

2.3 Target Variable Analysis

2.4 Feature Selection Techniques in Machine Learning

2.5 Feature Selection
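
To make the EDA checks concrete, here is a minimal sketch of three of them: a Pearson correlation between a numeric feature and the target, an outlier check based on the interquartile range (Tukey's rule), and a frequency count for a categorical feature. The data and variable names (`size`, `price`, `income`, `city`) are invented for illustration, and the threshold `k=1.5` is a common convention rather than something this guide prescribes.

```python
import math
import statistics
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical numerical data.
size = [50, 60, 70, 80, 90, 100]
price = [150, 180, 210, 240, 270, 300]   # exactly 3 * size
income = [30, 32, 31, 29, 33, 30, 31, 120]

# Hypothetical categorical data.
city = ["Paris", "London", "Paris", "Berlin", "Paris"]

r = pearson(size, price)                  # strength of the linear relationship
outliers = iqr_outliers(income)           # candidate anomalies to inspect
city_counts = Counter(city).most_common() # category frequencies, most frequent first
```

A flagged outlier is a prompt for inspection, not automatically a value to delete.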

Part 3: SETTING UP THE VALIDATION FRAMEWORK

Validation is a crucial step in machine learning that ensures the model performs well on unseen data. A structured validation framework prevents overfitting and helps tune the model effectively by splitting the dataset into training, validation, and test sets.

Another key preprocessing step before training is feature scaling, which standardizes numerical features for better model performance. However, scaling must be applied correctly to avoid data leakage.

💡

Note: split the data before applying feature scaling, so that the scaling parameters are learned from the training set only.
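
The ordering in the note above can be sketched as follows: split first, fit the scaler on the training portion only, then apply the same fitted parameters to the validation and test portions. The 60/20/20 proportions, the seed, and the small `StandardScaler1D` helper are assumptions made for this example (the helper is a hand-written stand-in, not a library class).

```python
import random
import statistics

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into train/validation/test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

class StandardScaler1D:
    """Standardise one numeric feature as (x - mean) / std, with both learned in fit()."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against a constant feature
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

values = [float(v) for v in range(1, 21)]    # hypothetical single feature
train, val, test = train_val_test_split(values)

scaler = StandardScaler1D().fit(train)       # statistics come from train only
train_scaled = scaler.transform(train)
val_scaled = scaler.transform(val)           # reuse the train mean and std
test_scaled = scaler.transform(test)
```

Fitting the scaler on the full dataset would let information from the validation and test rows influence the training features, which is the leakage the note warns about.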

3.1 Method 1: Splitting the Data

3.1 Method 2: Splitting the Data

3.2 Feature Scaling in Machine Learning

3.3 Cross-Validation
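
As a small sketch of how k-fold cross-validation works, the code below builds k contiguous folds and scores a deliberately trivial baseline (predict the training mean) on each held-out fold. The function names and the baseline are assumptions chosen to keep the example short; any model with a fit and predict step could take the baseline's place.

```python
import statistics

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_val_scores(ys, k):
    """Validation MSE per fold for a baseline that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        prediction = statistics.fmean([ys[i] for i in train_idx])
        mse = statistics.fmean([(ys[i] - prediction) ** 2 for i in val_idx])
        scores.append(mse)
    return scores

folds = list(k_fold_indices(10, 3))            # fold sizes 4, 3, 3
ys = [float(v) for v in range(1, 11)]          # hypothetical target values
scores = cross_val_scores(ys, k=5)
mean_score = statistics.fmean(scores)          # single summary across folds
```

Because the folds are contiguous, ordered data should be shuffled before calling `k_fold_indices`; time-series data is the exception, where order must be preserved and a different scheme is needed.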

Part 4: BUILDING AND TRAINING THE MODEL

This section focuses on selecting, initializing, training, and validating the model with an appropriate validation technique.

4.1 Choosing a Model

4.2 Initializing the Model

4.3 Training the Model

4.4 Hyperparameter Tuning

4.5 Making Predictions (Inference)

4.6 Saving and Loading Trained Models

4.7 Deployment and Inference

4.8 Comparing Predictions with Targets (y_pred vs y_train and y_pred vs y_test)
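
The steps in this part can be sketched end to end with a one-feature linear regression: train it, predict, save and reload it with `pickle`, and compare the error on the training data with the error on the test data. The data, the function names, and the choice to store the model as a plain dictionary of parameters are assumptions made for this example; it is not the specific model used in the sub-pages.

```python
import math
import os
import pickle
import statistics
import tempfile

def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def predict(model, xs):
    """Apply the fitted parameters to new inputs."""
    return [model["slope"] * x + model["intercept"] for x in xs]

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))

# Hypothetical data, roughly y = 2x + 1 with small noise.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
x_test = [7.0, 8.0]
y_test = [15.2, 16.8]

model = fit_linear(x_train, y_train)                 # 4.3 training

# 4.6 saving and loading (only unpickle files from a trusted source)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# 4.5 and 4.8: predict with the restored model and compare both errors
train_rmse = rmse(y_train, predict(restored, x_train))
test_rmse = rmse(y_test, predict(restored, x_test))
```

A test error far above the training error is the usual sign of overfitting; errors that are both high suggest underfitting.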

Part 5: EVALUATING THE MODEL

Evaluation metrics are essential for assessing the performance of machine learning models. Each metric provides a unique perspective on the model’s accuracy, reliability, and generalizability. Here’s a breakdown:

5.1 Regression Evaluation Metrics

5.2 Classification Evaluation Metrics

5.3 Clustering Evaluation Metrics

5.4 Time-Series Evaluation Metrics
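
For reference, here is a minimal sketch of the four metrics named in the overview: RMSE and R² for regression, accuracy and precision for classification. The function names and sample values are illustrative assumptions; libraries offer tested implementations of the same formulas.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in target units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical regression targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
reg_rmse = rmse(y_true, y_pred)
reg_r2 = r_squared(y_true, y_pred)

# Hypothetical binary labels and predictions.
labels_true = [1, 0, 1, 1, 0, 0]
labels_pred = [1, 0, 0, 1, 1, 0]
clf_accuracy = accuracy(labels_true, labels_pred)
clf_precision = precision(labels_true, labels_pred)
```

Accuracy can look good on imbalanced classes while precision stays poor, which is one reason to report more than one metric.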

💡

Normalization: scaling data values to a specific range, typically [0, 1]. This puts all features on a comparable scale, so that features with larger raw values do not dominate the model.
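
A minimal sketch of min-max normalization, assuming the minimum and maximum are taken from the training values and then reused for new data (the helper names and numbers are made up for this example):

```python
def min_max_fit(values):
    """Learn the scaling range from the training values."""
    return min(values), max(values)

def min_max_transform(values, lo, hi):
    """Map values to (v - lo) / (hi - lo); training values land in [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / span for v in values]

train = [10.0, 20.0, 30.0, 50.0]
lo, hi = min_max_fit(train)
train_scaled = min_max_transform(train, lo, hi)
new_scaled = min_max_transform([40.0, 60.0], lo, hi)
```

New values outside the training range fall outside [0, 1] (here 60.0 maps to 1.25), which is expected when the range is learned from the training set only.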

Transforming Skewed Distributions

💡

Skewed features can be transformed with a logarithmic or similar transformation (such as a square root) to reduce the skew, which can improve model performance and interpretability.
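
As a small sketch, the code below applies `log1p` (the logarithm of 1 + x, which is safe at zero) to a right-skewed feature and measures the skewness before and after. The sample values and the `skewness` helper are assumptions for illustration; the transformation only applies as written to non-negative data.

```python
import math
import statistics

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical right-skewed feature with a long upper tail.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

transformed = [math.log1p(v) for v in raw]
skew_before = skewness(raw)
skew_after = skewness(transformed)

# The transformation is invertible, so predictions can be mapped back.
recovered = [math.expm1(v) for v in transformed]
```

If the target variable is transformed this way, predictions need to be converted back with `expm1` before they are reported in the original units.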

Books

  • Data Science from Scratch
Comprehensive Guide: Saving a Model Using pickle in Python