📘 1. Foundations: Understanding Metrics and Scorers
Scikit-learn provides many built-in metrics, but to use them effectively in model selection (e.g., GridSearchCV, cross_val_score), you need to understand how scorers work.
A scorer is a wrapper around a metric function that scikit-learn can use consistently across models.
🔑 Key Functions
1. check_scoring
- Validates a scoring argument for a given estimator and returns a usable scorer object.
- If no scoring is passed, it falls back to the estimator's default score method.
- Useful when automating model evaluation.
from sklearn.metrics import check_scoring
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
print(check_scoring(model))  # Default scorer: wraps the estimator's own score method (accuracy for classifiers)
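If you pass a scoring string instead, check_scoring validates it against the estimator and returns the corresponding scorer object. Continuing from the snippet above (the choice of "f1" here is just illustrative):
f1_scorer = check_scoring(model, scoring="f1")
print(f1_scorer)  # a scorer object wrapping f1_score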
2. get_scorer
- Returns a scorer object from a string name (e.g., "accuracy", "neg_mean_squared_error").
- Scorers always follow the greater-is-better convention, so loss metrics are returned as negative values.
from sklearn.metrics import get_scorer
scorer = get_scorer("accuracy")
print(scorer)  # make_scorer(accuracy_score, ...)
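A scorer object is a callable with the signature scorer(estimator, X, y), which is exactly how cross_val_score and GridSearchCV invoke it internally. A minimal sketch with a fitted classifier (the synthetic dataset is only for illustration):
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X_demo, y_demo)

accuracy = get_scorer("accuracy")
print(accuracy(clf, X_demo, y_demo))  # scorers take (estimator, X, y), not (y_true, y_pred)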
3. get_scorer_names
- Lists all available scorer names in scikit-learn.
- Very handy to see which metrics you can plug into GridSearchCV.
from sklearn.metrics import get_scorer_names
print(get_scorer_names())  # List of all available scorer name strings
Sample output (abridged; contents and order vary by scikit-learn version):
['accuracy', 'balanced_accuracy', 'f1', 'f1_macro',
'roc_auc', 'neg_mean_squared_error', 'r2', ...]
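Because get_scorer_names() returns a plain list of strings, you can filter it, for example to list every loss-style scorer (they all carry the neg_ prefix):
from sklearn.metrics import get_scorer_names

neg_scorers = [name for name in get_scorer_names() if name.startswith("neg_")]
print(neg_scorers)  # e.g. ['neg_brier_score', 'neg_log_loss', 'neg_mean_absolute_error', ...]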
4. make_scorer
- Converts a metric function (like mean_absolute_error) into a scorer object that can be used in cross-validation or grid search.
- Lets you specify whether greater is better via the greater_is_better parameter.
Example: Suppose you want to use Mean Absolute Error (MAE) for evaluating a financial forecasting model:
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# Sample dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.2, random_state=8)
model = LinearRegression()
# Create custom scorer (since lower MAE is better, set greater_is_better=False)
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
scores = cross_val_score(model, X, y, scoring=mae_scorer, cv=5)
print("MAE scores:", scores)
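Because greater_is_better=False makes the scorer negate the metric, flip the sign back when reporting the error (continuing from the scores array above):
print("Mean MAE:", -scores.mean())  # negate to recover the positive error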
🏦 Finance Example
Imagine you’re forecasting monthly expenses of a company.
- If you use r2_score, scikit-learn already provides a built-in scorer ("r2").
- If you prefer Mean Absolute Percentage Error (MAPE), you can wrap mean_absolute_percentage_error in make_scorer (recent scikit-learn versions also ship a built-in "neg_mean_absolute_percentage_error" scorer) and use it in grid search to tune your regression model.
🎯 Goal of This Section
By learning scorers:
- You understand how metrics are plugged into model selection.
- You can use predefined metrics quickly (get_scorer).
- You can create custom scorers for domain-specific needs (e.g., profit/loss, error tolerance in finance).
To see the difference clearly, the next part places mean_squared_error (as a metric) and "neg_mean_squared_error" (as a scorer) side by side.
📊 Metric vs Scorer in scikit-learn
Scikit-learn makes a distinction between:
- Metric functions → calculate a score directly (mean_squared_error, accuracy_score).
- Scorers → wrappers used inside model selection tools (cross_val_score, GridSearchCV).
🔎 Comparison Table
| Aspect | Metric Function | Scorer |
|---|---|---|
| Definition | A function that directly computes performance given y_true and y_pred. | A wrapper around a metric, compatible with model selection tools (GridSearchCV, cross_val_score). |
| Examples | accuracy_score(y_true, y_pred) → 0.87 | scoring="accuracy" or get_scorer("accuracy") |
| Return value | The raw metric value (higher or lower is better, depending on the metric). | Always a value where higher is better (losses are negated). |
| Usage context | Direct evaluation of model predictions. | Automated model evaluation (cross-validation, grid search). |
| Customizability | Use the metric directly in your code. | Create custom scorers with make_scorer. |
| Finance example | mean_squared_error(actual_sales, predicted_sales) → error in a sales forecast. | scoring="neg_mean_squared_error" in grid search (since scikit-learn maximizes scores). |
🖥 Code Examples
1. Metric Function (direct use)
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=8)
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
# Direct metric usage
mse = mean_squared_error(y, y_pred)
print("MSE (metric):", mse)
2. Scorer (used in cross-validation)
from sklearn.model_selection import cross_val_score
# Use scorer instead of direct metric
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
print("MSE (scorer, negated):", scores)
⚠️ Note: Scikit-learn negates error metrics so that higher is always better.
- That’s why the scoring string is "neg_mean_squared_error".
- To recover the actual MSE values, negate the scores (e.g., -scores).
3. Custom Scorer with make_scorer
from sklearn.metrics import mean_absolute_percentage_error, make_scorer
# Create custom scorer for MAPE
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)
scores = cross_val_score(model, X, y, scoring=mape_scorer, cv=5)
print("MAPE scores:", scores)
🎯 Key Takeaways
- Use metric functions (accuracy_score, r2_score, etc.) when you want to evaluate predictions manually.
- Use scorers (scoring="accuracy", scoring="neg_mean_squared_error") when working with cross-validation and hyperparameter tuning.
- Always remember: scorers are maximized, so error metrics are negated (hence the neg_ prefix).
🎯 Scikit-learn Metrics and Scorers: A Complete Beginner's Guide
🤔 What's the Difference? (The Simple Version)
Imagine you're a teacher grading tests:
- Metric Function = You manually grade one test and say "This student got 85%"
- Scorer = You set up an automatic grading system that can grade hundreds of tests consistently
In machine learning:
- Metrics = Functions that calculate how good your model is (like accuracy_score())
- Scorers = Automated versions that work with scikit-learn's tools like GridSearchCV
📚 Part 1: Understanding Metrics (The Manual Way)
Let's start with a simple example - predicting house prices:
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# Create fake house price data
X, y = make_regression(n_samples=1000, n_features=3, noise=50, random_state=42)
# X = features (size, bedrooms, location score)
# y = house prices
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Calculate metrics MANUALLY
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Manual MSE: {mse:.2f}")
print(f"Manual R²: {r2:.2f}")
Output:
Manual MSE: 2401.85
Manual R²: 0.95
This is straightforward - you call the metric function directly with actual vs predicted values.
🔧 Part 2: Understanding Scorers (The Automated Way)
Now let's use scorers with cross-validation (testing your model multiple times):
from sklearn.model_selection import cross_val_score
from sklearn.metrics import get_scorer
# Using scorers with cross-validation
model = LinearRegression()
# Method 1: Use string names
r2_scores = cross_val_score(model, X, y, scoring='r2', cv=5)
mse_scores = cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=5)
print(f"R² scores (5 folds): {r2_scores}")
print(f"MSE scores (5 folds): {mse_scores}")
print(f"Average R²: {r2_scores.mean():.3f}")
print(f"Average MSE: {-mse_scores.mean():.2f}") # Note the minus sign!
Output:
R² scores (5 folds): [0.951 0.948 0.953 0.949 0.952]
MSE scores (5 folds): [-2387.45 -2456.32 -2301.78 -2512.91 -2398.67]
Average R²: 0.951
Average MSE: 2411.43
🚨 Important Note: The "neg_" Prefix
Notice the neg_mean_squared_error scoring string? Here's why:
- Scikit-learn always tries to maximize scores
- Lower MSE = better model, but scikit-learn wants higher = better
- Solution: Make MSE negative! Now higher (less negative) = better
🛠️ Part 3: Key Functions Explained with Examples
1. get_scorer_names() - See All Available Scorers
from sklearn.metrics import get_scorer_names
# See what scorers are available
all_scorers = get_scorer_names()
print("First 10 scorers:", list(all_scorers)[:10])
Sample output (the exact names and order depend on your scikit-learn version):
First 10 scorers: ['accuracy', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_brier_score', 'neg_log_loss', 'neg_mean_absolute_error']
2. get_scorer() - Get a Specific Scorer
from sklearn.metrics import get_scorer
# Get a scorer object
accuracy_scorer = get_scorer('accuracy')
print(f"Scorer object: {accuracy_scorer}")
# You can use this scorer directly
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# Classification example
X_class, y_class = make_classification(n_samples=1000, n_features=4, random_state=42)
model_class = LogisticRegression()
model_class.fit(X_class, y_class)
# Use the scorer
score = accuracy_scorer(model_class, X_class, y_class)
print(f"Accuracy score: {score:.3f}")
3. make_scorer() - Create Your Own Custom Scorer
Sometimes you need a custom metric. Let's create a "percentage within 10%" scorer for house prices:
from sklearn.metrics import make_scorer
import numpy as np
def percentage_within_10_percent(y_true, y_pred):
    """Calculate what % of predictions are within 10% of the actual value."""
    percentage_error = np.abs((y_true - y_pred) / y_true) * 100
    within_10_percent = np.mean(percentage_error <= 10)
    return within_10_percent
# Create custom scorer
custom_scorer = make_scorer(percentage_within_10_percent, greater_is_better=True)
# Use it in cross-validation
scores = cross_val_score(model, X, y, scoring=custom_scorer, cv=5)
print(f"% predictions within 10%: {scores}")
print(f"Average: {scores.mean():.2f}")
Output:
% predictions within 10%: [0.92 0.91 0.93 0.90 0.92]
Average: 0.92
📋 Cheat Sheet: Common Metrics vs Scorers
| Task | Metric Function | Scorer String | What it Measures |
|---|---|---|---|
| Classification | | | |
| Basic accuracy | accuracy_score(y_true, y_pred) | 'accuracy' | % of correct predictions |
| Balanced accuracy | balanced_accuracy_score(y_true, y_pred) | 'balanced_accuracy' | Average recall across classes |
| F1 score | f1_score(y_true, y_pred) | 'f1' | Balance of precision and recall |
| ROC AUC | roc_auc_score(y_true, y_score) | 'roc_auc' | Area under the ROC curve (needs probability scores, not hard labels) |
| Regression | | | |
| Mean Squared Error | mean_squared_error(y_true, y_pred) | 'neg_mean_squared_error' | Average squared difference |
| Mean Absolute Error | mean_absolute_error(y_true, y_pred) | 'neg_mean_absolute_error' | Average absolute difference |
| R² Score | r2_score(y_true, y_pred) | 'r2' | Proportion of variance explained |
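A quick, self-contained sanity check of the "Return value" row: the metric function and the corresponding scorer report the same quantity, just with the sign flipped for losses (the synthetic dataset below is only for illustration):
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, get_scorer

X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=20, random_state=0)
reg = LinearRegression().fit(X_demo, y_demo)

mse_metric = mean_squared_error(y_demo, reg.predict(X_demo))            # raw error, lower is better
mse_scorer = get_scorer("neg_mean_squared_error")(reg, X_demo, y_demo)  # negated, higher is better
print(mse_metric, mse_scorer)  # same magnitude, opposite sign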
🏦 Real-World Example: Stock Price Prediction
Let's put it all together with a practical example:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, mean_absolute_percentage_error
# Simulate stock price features
np.random.seed(42)
n_samples = 1000
# Features: previous price, volume, market sentiment, economic indicator
X = np.random.randn(n_samples, 4)
# Target: next day's stock price
y = X[:, 0] * 100 + X[:, 1] * 50 + X[:, 2] * 30 + np.random.randn(n_samples) * 10 + 1000
# Create MAPE scorer (common in finance)
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)
# Set up model with hyperparameter tuning
model = RandomForestRegressor(random_state=42)
param_grid = {
'n_estimators': [50, 100],
'max_depth': [5, 10]
}
# Use GridSearchCV with our custom scorer
grid_search = GridSearchCV(
model,
param_grid,
scoring=mape_scorer, # Using our custom scorer!
cv=5,
verbose=1
)
grid_search.fit(X, y)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best MAPE score: {-grid_search.best_score_:.4f}") # Convert back from negative
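After the search finishes, grid_search.best_estimator_ holds a copy of the model refit on all of X and y with the best parameters (GridSearchCV refits by default), so you can use it directly for new predictions:
best_model = grid_search.best_estimator_
print(best_model.predict(X[:5]))  # predictions for the first five rows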
✅ Key Takeaways for Beginners
- Metrics = Direct calculation functions you call manually
- Scorers = Automated versions for use with scikit-learn tools
- Remember the "neg_" prefix for error metrics (they get negated so higher = better)
- Use get_scorer_names() to see all available options
- Use make_scorer() to create custom metrics for your specific domain
- Scorers work seamlessly with GridSearchCV, cross_val_score, and other model selection tools
🎯 Practice Exercise
Try modifying the house price example to:
- Use a different regression model (try RandomForestRegressor)
- Add a custom scorer that measures "predictions within $50,000"
- Use GridSearchCV to find the best hyperparameters (a starter sketch for the custom scorer follows below)
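A possible starting point for the custom scorer in the second bullet (a sketch, not a full solution; the threshold of 50_000 and the helper name within_50k are illustrative, and X, y are assumed to be the house-price data from Part 1):
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

def within_50k(y_true, y_pred):
    """Fraction of predictions within $50,000 of the actual price."""
    return np.mean(np.abs(y_true - y_pred) <= 50_000)

within_50k_scorer = make_scorer(within_50k, greater_is_better=True)

grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10]},
    scoring=within_50k_scorer,
    cv=5,
)
# grid.fit(X, y)  # X, y: the house-price data from Part 1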
This foundation will serve you well as you dive deeper into machine learning model evaluation and selection!