These notes provide an in‑depth look at several regression methods used in machine learning. They are divided into the following sections:
- Overview of Penalized Regression Techniques
- Ridge Regression
- LASSO Regression
- Elastic Net Regression
- Comparative Notes: Ridge vs. LASSO vs. Elastic Net
- Multiple Linear Regression Fundamentals
- Additional Considerations
Overview of Penalized Regression Techniques
What’s the main idea?
In machine learning, when you have datasets with many variables (features), especially if these variables are closely related, your model can become overly complicated. This complexity can lead to a common issue known as overfitting. Overfitting happens when your model learns the training data too well—including its noise and randomness—which results in poor performance on new, unseen data.
To solve this, we use Penalized Regression. Think of penalized regression as gently guiding your model to become simpler by adding a "penalty" that discourages it from relying too heavily on too many variables.
How does Penalized Regression work?
We start from a basic concept called the Mean Squared Error (MSE), which measures how much your model's predictions differ from the actual data. Penalized regression adds an extra term (the penalty) to this MSE.
Imagine you’re shopping with a limited budget; you want good quality (low errors) without spending too much (penalty for complexity). This penalty helps balance accuracy with simplicity.
Here's how it looks in math (explained simply):
- MSE (Mean Squared Error): the average squared difference between your predictions and the actual values; it measures how wrong your predictions are.
- Penalty: measures how complicated your model is (the more complicated, the larger the penalty).
Together, these become your new goal for optimization:
Loss = MSE + λ × Penalty
where λ (lambda) is a tuning knob: the larger it is, the stronger the penalty and the simpler the resulting model.
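As a concrete sketch, here is how that objective could be computed with NumPy. The function name `penalized_loss` and the `lam` argument are hypothetical (not from any library), and the squared-coefficient penalty shown is just one common choice:

```python
import numpy as np

def penalized_loss(X, y, beta, lam):
    """Hypothetical helper: MSE plus a weighted penalty on coefficient size."""
    mse = np.mean((y - X @ beta) ** 2)   # how wrong the predictions are
    penalty = np.sum(beta ** 2)          # how "big" the coefficients are
    return mse + lam * penalty           # lam balances accuracy vs. simplicity

# Tiny usage example with made-up numbers:
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, 0.0])
print(penalized_loss(X, y, beta, lam=0.1))
```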

Why standardize features?
Before using these methods, it's important to make sure all your features are measured on the same scale (called standardization). For example, you wouldn't fairly compare weights measured in kilograms to heights measured in centimeters without standardizing them first.
Standardization ensures that the penalty impacts each feature fairly and equally, preventing the model from favoring certain features just because of their units.
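For instance, here is a minimal sketch using scikit-learn's StandardScaler; the weight/height columns and all the numbers are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(70, 15, size=100),    # weights in kilograms
    rng.normal(170, 10, size=100),   # heights in centimeters
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)      # each column now has mean 0 and std 1

print(X_std.mean(axis=0).round(2))   # means ≈ 0
print(X_std.std(axis=0).round(2))    # stds ≈ 1
```

After scaling, the penalty treats a one-unit change in either column the same way, regardless of the original units.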
Common Penalized Regression Techniques
- Ridge Regression (L2 penalty):
  - Imagine gently pushing all coefficients towards zero, shrinking them but never eliminating any exactly. This keeps the model from relying too heavily on any single feature.
  - Useful when you believe many features are important, but you want each to contribute moderately.
- LASSO Regression (L1 penalty):
  - Imagine pushing the less important coefficients all the way to zero, effectively removing them from the model.
  - Acts like automatic feature selection: it simplifies your model by keeping only the most critical features.
- Elastic Net Regression (combined L1 and L2 penalties):
  - A middle ground that combines Ridge and LASSO.
  - It reduces some coefficients to zero (like LASSO) and gently shrinks the rest (like Ridge).
  - Especially useful when your dataset has many correlated features, giving you both simplicity and flexibility. (A short code sketch comparing all three follows this list.)
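To see these three behaviors side by side, here is a minimal scikit-learn sketch on synthetic data; the alpha and l1_ratio values are arbitrary illustrative choices, not tuned recommendations:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 5 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

for name, model in [("Ridge", Ridge(alpha=1.0)),
                    ("LASSO", Lasso(alpha=1.0)),
                    ("Elastic Net", ElasticNet(alpha=1.0, l1_ratio=0.5))]:
    pipe = make_pipeline(StandardScaler(), model)   # standardize, then fit
    pipe.fit(X, y)
    coefs = pipe[-1].coef_
    n_zero = np.sum(coefs == 0)
    print(f"{name}: {n_zero} of {len(coefs)} coefficients are exactly zero")
```

In a typical run, Ridge keeps every coefficient nonzero (just shrunken), while LASSO and Elastic Net set many of the uninformative coefficients exactly to zero.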