API Reference: Scaling and Standardization

What is Feature Scaling?

Feature scaling is the process of normalizing or standardizing the range of independent variables (features) in your dataset. This is done to ensure that all features contribute equally to the model's learning process, especially when they are on different scales.

Why is Feature Scaling Important?

  • Improves Convergence Speed: Algorithms that use gradient descent (e.g., linear regression, neural networks) converge faster when features are scaled.
  • Prevents Dominance of Features: Features with larger scales can dominate the model's learning process, leading to biased results.
  • Enhances Performance: Some algorithms (e.g., k-Nearest Neighbors, Support Vector Machines) are sensitive to the scale of the data, and scaling can improve their performance.
  • Ensures Fair Comparisons: Distance-based algorithms rely on the magnitude of features, so scaling ensures fair comparisons.

When to Apply Feature Scaling

The need for feature scaling depends on the type of model you're using (a short sketch after this list shows how a scaler is typically paired with a scale-sensitive model):

  1. Models That Require Feature Scaling
    • Gradient Descent-Based Models: Linear Regression, Logistic Regression, Neural Networks.
    • Distance-Based Models: k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), k-Means Clustering.
    • Regularized Models: Ridge Regression, Lasso Regression (to ensure regularization is applied uniformly).
    • Principal Component Analysis (PCA): Scaling ensures that principal components are not dominated by features with larger variances.
  2. Models That Do Not Require Feature Scaling
    • Tree-Based Models: Decision Trees, Random Forests, Gradient Boosting Machines (e.g., XGBoost, LightGBM, CatBoost). These models are not sensitive to the scale of the data because they split data based on feature thresholds.
    • Naive Bayes: Assumes independence between features and is not affected by feature scales.
    • Rule-Based Models: Models like RuleFit are also scale-invariant.
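
For example, a minimal sketch of this distinction, assuming scikit-learn (the estimator choices here are only illustrative): a scale-sensitive model is wrapped together with a scaler in a pipeline, while a tree-based model is fit on the raw features.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Scale-sensitive model: scaler and estimator combined in one pipeline,
# so scaling parameters are learned only from the training data
svm_model = make_pipeline(StandardScaler(), SVC())

# Tree-based model: splits on feature thresholds, so no scaler is needed
rf_model = RandomForestClassifier(random_state=42)

# Both expose the same fit/predict interface, e.g.:
# svm_model.fit(X_train, y_train)
# rf_model.fit(X_train, y_train)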

How to Choose the Right Scaling Method

  • Standardization: Use when data is normally distributed or when using algorithms that assume normality.
  • Min-Max Scaling: Use when data is not normally distributed or when you need bounded input.
  • Robust Scaling: Use when data contains outliers.
  • Normalization: Use for text data or when using distance metrics like cosine similarity.
  • Log Transformation: Use for highly skewed data (see the sketch after this list).
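
A rough mapping of these choices onto scikit-learn transformers, offered as a sketch rather than a rule; the log transformation has no dedicated scaler class, so a FunctionTransformer around NumPy's log1p is one common way to apply it:

import numpy as np
from sklearn.preprocessing import (StandardScaler, MinMaxScaler, RobustScaler,
                                   Normalizer, FunctionTransformer)

scaling_options = {
    "standardization": StandardScaler(),        # roughly normal data
    "min_max": MinMaxScaler(),                  # bounded input, e.g. [0, 1]
    "robust": RobustScaler(),                   # data with outliers
    "normalization": Normalizer(norm="l2"),     # row-wise unit norm (e.g. text vectors)
    "log": FunctionTransformer(np.log1p),       # highly skewed, non-negative data
}

# Example usage: pick a transformer by name and apply it to a feature matrix X
# X_scaled = scaling_options["robust"].fit_transform(X)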

Real-World Applications in Finance

  • Portfolio Optimization: Normalize stock prices or returns before calculating optimal weights.
  • Credit Scoring: Scale customer financial metrics (income, debt ratio) for ML models.
  • Risk Analysis: Standardize risk metrics (beta, volatility) for consistent comparisons.
  • Fraud Detection: Use robust scaling to account for outliers in transaction amounts (see the sketch below).
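
To illustrate the last point, a minimal sketch with made-up transaction amounts (not real data), showing how RobustScaler, which centers on the median and scales by the interquartile range, keeps typical transactions on a comparable scale even when an extreme outlier is present:

import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical transaction amounts with one extreme outlier
amounts = np.array([[25.0], [40.0], [32.0], [58.0], [45.0], [10000.0]])

robust_scaled = RobustScaler().fit_transform(amounts)      # median / IQR based
standard_scaled = StandardScaler().fit_transform(amounts)  # mean / std, pulled by the outlier

print("Robust scaled:\n", robust_scaled.ravel())
print("Standard scaled:\n", standard_scaled.ravel())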

Types of Feature Scaling Methods

| Method | Class Name | Purpose |
| --- | --- | --- |
| Standardization (Z-score Normalization) | StandardScaler | Standardizes features by removing the mean and scaling to unit variance. |
| Min-Max Normalization | MinMaxScaler | Scales features to a specified range (default: [0, 1]). |
| Max Absolute Scaling | MaxAbsScaler | Scales features by their maximum absolute value. |
| Robust Scaling | RobustScaler | Scales features using statistics that are robust to outliers. |
| L2 (Unit Norm) Scaling | Normalizer | Normalizes samples individually to unit norm. |
| Quantile Transformation | QuantileTransformer | Transforms features to follow a uniform or normal distribution. |
| Power Transformation | PowerTransformer | Applies a power transformation to make data more Gaussian-like. |
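
The last two transformers in this table follow the same fit/transform pattern as the others; a minimal sketch on skewed, strictly positive toy data (values generated purely for illustration):

import numpy as np
from sklearn.preprocessing import QuantileTransformer, PowerTransformer

# Skewed, strictly positive toy data
rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Rank-based mapping to an approximately normal distribution
qt = QuantileTransformer(output_distribution="normal", n_quantiles=100)
X_quantile = qt.fit_transform(X)

# Box-Cox power transform (requires strictly positive values);
# the default Yeo-Johnson method also handles zeros and negatives
pt = PowerTransformer(method="box-cox")
X_power = pt.fit_transform(X)

print("Quantile-transformed mean/std:", X_quantile.mean(), X_quantile.std())
print("Power-transformed mean/std:", X_power.mean(), X_power.std())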

Practical Steps to Apply Feature Scaling

  1. Split Your Data: Always split your data into training and test sets before applying scaling to avoid data leakage.
  2. Fit the Scaler on Training Data: Use the training data to compute scaling parameters (e.g., mean, standard deviation, min, max).
  3. Transform Both Training and Test Data: Apply the computed parameters to scale both the training and test sets.
  4. Inverse Transform (if needed): For interpretation, you can inverse transform the scaled data back to its original scale (a short sketch follows the summary below).

Example in Python (using Scikit-Learn):

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize scaler
scaler = StandardScaler()  # or MinMaxScaler()

# Fit and transform training data
X_train_scaled = scaler.fit_transform(X_train)

# Transform test data
X_test_scaled = scaler.transform(X_test)
💡

Note that X_train is both fitted and transformed with scaler.fit_transform(X_train): the fit step computes the scaling parameters (e.g., mean, standard deviation, min, max) from the training data, and those same parameters are then reused to scale X_test with scaler.transform(X_test). The key point is that X_train is both fitted and transformed, while X_test is only transformed.

Summary:

  • X_train: Both fitted and transformed using scaler.fit_transform(X_train).
  • X_test: Only transformed using scaler.transform(X_test).

This approach ensures that the scaling process is robust, avoids data leakage, and maintains consistency between the training and test datasets.
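
For step 4 of the practical steps above, the same fitted scaler can map scaled values back to the original units (this continues the earlier example, so scaler, X_train_scaled, and X_test_scaled are assumed to already exist):

# Recover the original feature values from the scaled arrays
X_train_original = scaler.inverse_transform(X_train_scaled)
X_test_original = scaler.inverse_transform(X_test_scaled)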

Relationship Between Data Encoding and Feature Scaling

When data is encoded using different methods, not all encoded features need to be scaled since some encoding methods (like one-hot encoding) already produce binary values of 0 and 1. Feature scaling is essential for machine learning models that are sensitive to feature magnitudes, but the need for scaling varies depending on the encoding technique used. Here's a guide to help determine when feature scaling is necessary for different encoding methods.

| Encoding Method | Feature Scaling Needed? | Reason |
| --- | --- | --- |
| 1. Label Encoding | ✅ Yes | Produces ordinal values (0, 1, 2, …), which can create an artificial magnitude. Scaling ensures they don't dominate numerical features. |
| 2. One-Hot Encoding | ❌ No | Outputs binary values (0, 1), which do not require scaling. |
| 3. Binary Encoding | ✅ Yes | Converts categories into multiple binary columns, which can lead to uneven value distributions. Scaling helps normalize them. |
| 4. Frequency Encoding | ✅ Yes | Produces numerical values based on category frequency, which may have large differences. Scaling is recommended. |
| 5. Target Encoding (Mean Encoding) | ✅ Yes | Converts categories into their mean target value, making them numerical. Scaling helps if values vary widely. |
| 6. K-Fold Target Encoding | ✅ Yes | Similar to target encoding, it creates numerical values that can be on different scales. |
| 7. Leave-One-Out Encoding | ✅ Yes | Another variation of target encoding that results in numerical values, requiring scaling. |
| 8. Hash Encoding | ✅ Yes | Hashing generates multiple numerical features, which can have large variations. Scaling ensures stability. |
| 9. Dummy Encoding (One-Hot Encoding with Drop) | ❌ No | Functions the same as one-hot encoding, with only one category dropped. Scaling is not needed. |
| 10. WoE (Weight of Evidence) Encoding | ✅ Yes | Produces numerical values (logarithmic odds), which may vary significantly and benefit from scaling. |
| 11. Contrast Coding (e.g., Helmert, Deviation, Difference) | ✅ Yes | Creates numerical transformations that can have large differences in scale. Scaling ensures balance. |

EXAMPLE

import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# Sample dataset with categorical and numerical features
df = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A'],
    'Region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South'],
    'Value': [1000, 2000, 3000, 1500, 2500, 3500, 1800, 2200, 2700, 3200],
    'Revenue': [50000, 60000, 55000, 52000, 59000, 62000, 51000, 58000, 61000, 63000],
    'Sales': [200, 250, 300, 220, 280, 310, 210, 270, 290, 320]
})

# One-hot encoding categorical features
df_encoded = pd.get_dummies(df, columns=['Category', 'Region'], dtype=int)

# Define feature matrix (X) and target variable (y)
X = df_encoded.drop(columns=['Revenue'])  # Assume 'Revenue' is the target variable
y = df_encoded['Revenue']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize scalers
standard_scaler = StandardScaler()
minmax_scaler = MinMaxScaler()

# Apply Standard Scaling to numerical features
X_train_scaled_standard = X_train.copy()
X_test_scaled_standard = X_test.copy()
X_train_scaled_standard[['Value', 'Sales']] = standard_scaler.fit_transform(X_train[['Value', 'Sales']])
X_test_scaled_standard[['Value', 'Sales']] = standard_scaler.transform(X_test[['Value', 'Sales']])

# Apply MinMax Scaling to numerical features
X_train_scaled_minmax = X_train.copy()
X_test_scaled_minmax = X_test.copy()
X_train_scaled_minmax[['Value', 'Sales']] = minmax_scaler.fit_transform(X_train[['Value', 'Sales']])
X_test_scaled_minmax[['Value', 'Sales']] = minmax_scaler.transform(X_test[['Value', 'Sales']])

# Display scaled datasets
print("Standard Scaled Training Data:\n", X_train_scaled_standard)
print("\nMinMax Scaled Training Data:\n", X_train_scaled_minmax)

Code Template

  • Min-Max Normalization
  • Max Absolute Scaling
  • L2 (Unit Norm) Scaling
  • Standardization (Z-score Normalization)
  • Robust Scaling
  • Decimal Scaling
  • QuantileTransformer
  • PowerTransformer

PRACTICAL EXAMPLE OF EACH SCALE

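A minimal sketch applying each method listed under Code Template to the same toy matrix (values chosen only for illustration; Decimal Scaling has no scikit-learn class, so it is computed directly with NumPy):

import numpy as np
from sklearn.preprocessing import (MinMaxScaler, MaxAbsScaler, Normalizer,
                                   StandardScaler, RobustScaler,
                                   QuantileTransformer, PowerTransformer)

# Toy feature matrix: two columns on very different scales
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 800.0], [4.0, 1600.0]])

X_minmax   = MinMaxScaler().fit_transform(X)                       # Min-Max Normalization -> [0, 1]
X_maxabs   = MaxAbsScaler().fit_transform(X)                       # Max Absolute Scaling -> [-1, 1]
X_l2       = Normalizer(norm="l2").fit_transform(X)                # L2 (Unit Norm) Scaling, per row
X_standard = StandardScaler().fit_transform(X)                     # Standardization (Z-score)
X_robust   = RobustScaler().fit_transform(X)                       # Robust Scaling (median / IQR)
X_quantile = QuantileTransformer(n_quantiles=4).fit_transform(X)   # rank-based mapping to [0, 1]
X_power    = PowerTransformer().fit_transform(X)                   # Yeo-Johnson by default

# Decimal Scaling: divide each column by 10**j, where j is the smallest
# integer such that max(|x|) / 10**j < 1 (assumes non-zero columns)
j = np.floor(np.log10(np.abs(X).max(axis=0))) + 1
X_decimal = X / (10.0 ** j)

print("Min-Max scaled:\n", X_minmax)
print("Decimal scaled:\n", X_decimal)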

MODELS AND FEATURE SCALING