📘 1. Foundations: Understanding Metrics and Scorers

Scikit-learn provides many built-in metrics, but to use them effectively in model selection (e.g., GridSearchCV, cross_val_score), you need to understand how scorers work.

A scorer is a wrapper around a metric function that scikit-learn can use consistently across models.

🔑 Key Functions

1. check_scoring

  • Validates a scoring argument for a given estimator and returns the corresponding scorer.
  • Ensures that the metric you pass is compatible with the model.
  • Useful when automating model evaluation.
from sklearn.metrics import check_scoring
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
print(check_scoring(model))   # With scoring=None, returns the default scorer (the estimator's .score method, i.e. accuracy for classifiers)
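If you pass an explicit metric name, check_scoring validates it for the estimator and hands back the matching scorer. A minimal sketch, continuing from the block above:

# Resolve an explicit metric name into a scorer for this estimator
f1_scorer = check_scoring(model, scoring="f1")
print(f1_scorer)   # a scorer object wrapping f1_score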

2. get_scorer

  • Returns a scorer object from a string name (e.g., "accuracy", "neg_mean_squared_error").
  • Scorers always follow the greater-is-better convention (so losses are returned as negatives).
from sklearn.metrics import get_scorer

scorer = get_scorer("accuracy")
print(scorer)   # a scorer object wrapping accuracy_score

3. get_scorer_names

  • Lists all available scorers in scikit-learn.
  • Very handy to see what metrics you can plug into GridSearchCV.
from sklearn.metrics import get_scorer_names

print(get_scorer_names())  # List of all scorer name strings

Sample output:

['accuracy', 'balanced_accuracy', 'f1', 'f1_macro',
 'roc_auc', 'neg_mean_squared_error', 'r2', ...]
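Because the names come back as plain strings, you can filter them to find related scorers. A small sketch:

from sklearn.metrics import get_scorer_names

# List every F1 variant among the available scorers
f1_variants = [name for name in get_scorer_names() if name.startswith("f1")]
print(f1_variants)   # e.g. ['f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted']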

4. make_scorer

  • Converts a metric (like mean_absolute_error) into a scorer object that can be used in cross-validation or grid search.
  • You can specify whether a larger value is better via the greater_is_better flag.

Example: Suppose you want to use Mean Absolute Error (MAE) for evaluating a financial forecasting model:

from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Sample dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.2, random_state=8)

model = LinearRegression()

# Create custom scorer (since lower MAE is better, set greater_is_better=False)
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

scores = cross_val_score(model, X, y, scoring=mae_scorer, cv=5)
print("MAE scores:", scores)

🏦 Finance Example

Imagine you’re forecasting the monthly expenses of a company.

  • If you use r2_score, scikit-learn already provides a ready-made scorer ("r2").
  • If you prefer Mean Absolute Percentage Error (MAPE), you can wrap it in make_scorer and use it in grid search to tune your regression model, as sketched below.
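A minimal sketch of the MAPE option — the Ridge model, its alpha grid, and the synthetic expense data are illustrative assumptions, not a prescribed workflow:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_absolute_percentage_error
from sklearn.model_selection import GridSearchCV

# Toy "monthly expenses" data, shifted so targets are positive (MAPE divides by y_true)
X_exp, y_exp = make_regression(n_samples=200, n_features=3, noise=5, random_state=0)
y_exp = y_exp + 10_000

# Lower MAPE is better, so greater_is_better=False
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)

grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, scoring=mape_scorer, cv=5)
grid.fit(X_exp, y_exp)
print(grid.best_params_, -grid.best_score_)   # negate to recover the MAPE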

🎯 Goal of This Section

By learning scorers:

  • You understand how metrics are plugged into model selection.
  • You can use predefined metrics quickly (get_scorer).
  • You can create custom scorers for domain-specific needs (e.g., profit-loss weighting or error tolerance in finance); a sketch follows below.
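As one illustration of such a domain-specific scorer, here is a minimal sketch of an asymmetric error that penalizes under-forecasts more heavily than over-forecasts; the function name and the 2:1 weighting are illustrative assumptions, not a standard metric:

import numpy as np
from sklearn.metrics import make_scorer

def asymmetric_error(y_true, y_pred):
    """Mean penalty where under-forecasting costs twice as much as over-forecasting (illustrative weights)."""
    diff = y_pred - y_true
    penalty = np.where(diff < 0, 2.0 * np.abs(diff), diff)
    return penalty.mean()

# Lower penalty is better, so the scorer negates it
asymmetric_scorer = make_scorer(asymmetric_error, greater_is_better=False)

Like any other scorer, it can then be passed to cross_val_score or GridSearchCV via scoring=asymmetric_scorer.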

The side-by-side comparison below makes the difference between using a metric function and a scorer in scikit-learn explicit.

📊 Metric vs Scorer in scikit-learn

Scikit-learn makes a distinction between:

  1. Metric functions → calculate a score directly (mean_squared_error, accuracy_score).
  2. Scorers → wrappers used inside model selection tools (cross_val_score, GridSearchCV).

🔎 Comparison Table

| Aspect | Metric Function | Scorer |
| --- | --- | --- |
| Definition | A function that directly computes performance given y_true and y_pred. | A wrapper around a metric, compatible with model selection (GridSearchCV, cross_val_score). |
| Examples | accuracy_score(y_true, y_pred) → 0.87 | scoring="accuracy" or get_scorer("accuracy") |
| Return Value | Returns the raw metric (higher or lower depending on definition). | Always returns a value where higher is better (losses are negated). |
| Usage Context | Direct evaluation of model predictions. | Automated model evaluation (cross-validation, grid search). |
| Customizability | Use the metric directly in your code. | Create custom scorers with make_scorer. |
| Finance Example | mean_squared_error(actual_sales, predicted_sales) → error in sales forecast. | scoring="neg_mean_squared_error" in grid search (since scikit-learn maximizes scores). |

🖥 Code Examples

1. Metric Function (direct use)

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=8)

model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Direct metric usage
mse = mean_squared_error(y, y_pred)
print("MSE (metric):", mse)

2. Scorer (used in cross-validation)

from sklearn.model_selection import cross_val_score

# Use scorer instead of direct metric
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
print("MSE (scorer, negated):", scores)

⚠️ Note: Scikit-learn negates error metrics so that higher is always better.

  • That’s why the scoring string is "neg_mean_squared_error".
  • To get the actual MSE values, negate the scores again (i.e., use -scores).

3. Custom Scorer with make_scorer

from sklearn.metrics import mean_absolute_percentage_error, make_scorer

# Create custom scorer for MAPE
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)

scores = cross_val_score(model, X, y, scoring=mape_scorer, cv=5)
print("MAPE scores:", scores)

🎯 Key Takeaways

  • Use metric functions (accuracy_score, r2_score, etc.) when you want to evaluate predictions manually.
  • Use scorers (scoring="accuracy", scoring="neg_mean_squared_error") when working with cross-validation and hyperparameter tuning.
  • Always remember: scorers are maximized, so error metrics are negated.

🎯 Scikit-learn Metrics and Scorers: A Complete Beginner's Guide

🤔 What's the Difference? (The Simple Version)

Imagine you're a teacher grading tests:

  • Metric Function = You manually grade one test and say "This student got 85%"
  • Scorer = You set up an automatic grading system that can grade hundreds of tests consistently

In machine learning:

  • Metrics = Functions that calculate how good your model is (like accuracy_score())
  • Scorers = Automated versions that work with scikit-learn's tools like GridSearchCV

📚 Part 1: Understanding Metrics (The Manual Way)

Let's start with a simple example - predicting house prices:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Create fake house price data
X, y = make_regression(n_samples=1000, n_features=3, noise=50, random_state=42)
# X = features (size, bedrooms, location score)
# y = house prices

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate metrics MANUALLY
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Manual MSE: {mse:.2f}")
print(f"Manual R²: {r2:.2f}")

Output:

Manual MSE: 2401.85
Manual R²: 0.95

This is straightforward - you call the metric function directly with actual vs predicted values.

🔧 Part 2: Understanding Scorers (The Automated Way)

Now let's use scorers with cross-validation (testing your model multiple times):

from sklearn.model_selection import cross_val_score
from sklearn.metrics import get_scorer

# Using scorers with cross-validation
model = LinearRegression()

# Method 1: Use string names
r2_scores = cross_val_score(model, X, y, scoring='r2', cv=5)
mse_scores = cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=5)

print(f"R² scores (5 folds): {r2_scores}")
print(f"MSE scores (5 folds): {mse_scores}")
print(f"Average R²: {r2_scores.mean():.3f}")
print(f"Average MSE: {-mse_scores.mean():.2f}")  # Note the minus sign!

Output:

R² scores (5 folds): [0.951 0.948 0.953 0.949 0.952]
MSE scores (5 folds): [-2387.45 -2456.32 -2301.78 -2512.91 -2398.67]
Average R²: 0.951
Average MSE: 2411.43

🚨 Important Note: The "neg_" Prefix

Notice neg_mean_squared_error? Here's why:

  • Scikit-learn always tries to maximize scores
  • Lower MSE = better model, but scikit-learn wants higher = better
  • Solution: Make MSE negative! Now higher (less negative) = better — the small check below shows this concretely.
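To see the negation in action, here is a small check that refits the LinearRegression from Part 1 on its train/test split and compares the raw metric with the scorer output:

from sklearn.metrics import get_scorer, mean_squared_error

model.fit(X_train, y_train)
neg_mse_scorer = get_scorer("neg_mean_squared_error")

# The scorer calls model.predict internally and flips the sign of the error
print(mean_squared_error(y_test, model.predict(X_test)))   # positive MSE
print(neg_mse_scorer(model, X_test, y_test))               # same value, negative sign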

🛠️ Part 3: Key Functions Explained with Examples

1. get_scorer_names() - See All Available Scorers

from sklearn.metrics import get_scorer_names

# See what scorers are available
all_scorers = get_scorer_names()
print("First 10 scorers:", list(all_scorers)[:10])

Output:

First 10 scorers: ['accuracy', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_brier_score', 'neg_log_loss', 'neg_mean_absolute_error']

2. get_scorer() - Get a Specific Scorer

from sklearn.metrics import get_scorer

# Get a scorer object
accuracy_scorer = get_scorer('accuracy')
print(f"Scorer object: {accuracy_scorer}")

# You can use this scorer directly
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Classification example
X_class, y_class = make_classification(n_samples=1000, n_features=4, random_state=42)
model_class = LogisticRegression()
model_class.fit(X_class, y_class)

# Use the scorer
score = accuracy_scorer(model_class, X_class, y_class)
print(f"Accuracy score: {score:.3f}")

3. make_scorer() - Create Your Own Custom Scorer

Sometimes you need a custom metric. Let's create a "percentage within 10%" scorer for house prices:

from sklearn.metrics import make_scorer
import numpy as np

def percentage_within_10_percent(y_true, y_pred):
    """Calculate what % of predictions are within 10% of actual value"""
    percentage_error = np.abs((y_true - y_pred) / y_true) * 100
    within_10_percent = np.mean(percentage_error <= 10)
    return within_10_percent

# Create custom scorer
custom_scorer = make_scorer(percentage_within_10_percent, greater_is_better=True)

# Use it in cross-validation
scores = cross_val_score(model, X, y, scoring=custom_scorer, cv=5)
print(f"% predictions within 10%: {scores}")
print(f"Average: {scores.mean():.2f}")

Output:

% predictions within 10%: [0.92 0.91 0.93 0.90 0.92]
Average: 0.92

📋 Cheat Sheet: Common Metrics vs Scorers

| Task | Metric Function | Scorer String | What it Measures |
| --- | --- | --- | --- |
| **Classification** | | | |
| Basic accuracy | accuracy_score(y_true, y_pred) | 'accuracy' | % of correct predictions |
| Balanced accuracy | balanced_accuracy_score(y_true, y_pred) | 'balanced_accuracy' | Average recall for each class |
| F1 score | f1_score(y_true, y_pred) | 'f1' | Balance of precision and recall |
| ROC AUC | roc_auc_score(y_true, y_score) | 'roc_auc' | Area under ROC curve |
| **Regression** | | | |
| Mean Squared Error | mean_squared_error(y_true, y_pred) | 'neg_mean_squared_error' | Average squared differences |
| Mean Absolute Error | mean_absolute_error(y_true, y_pred) | 'neg_mean_absolute_error' | Average absolute differences |
| R² Score | r2_score(y_true, y_pred) | 'r2' | Proportion of variance explained |
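Several of these scorer strings can also be evaluated in one pass with cross_validate, which accepts a list (or dict) of scorers. A short sketch using the house price data and LinearRegression model from earlier:

from sklearn.model_selection import cross_validate

# Results come back in a dict keyed as "test_<scorer name>"
results = cross_validate(model, X, y, scoring=["r2", "neg_mean_absolute_error"], cv=5)
print(results["test_r2"].mean())
print(-results["test_neg_mean_absolute_error"].mean())   # negate to recover the MAE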

🏦 Real-World Example: Stock Price Prediction

Let's put it all together with a practical example:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, mean_absolute_percentage_error

# Simulate stock price features
np.random.seed(42)
n_samples = 1000

# Features: previous price, volume, market sentiment, economic indicator
X = np.random.randn(n_samples, 4)
# Target: next day's stock price
y = X[:, 0] * 100 + X[:, 1] * 50 + X[:, 2] * 30 + np.random.randn(n_samples) * 10 + 1000

# Create MAPE scorer (common in finance)
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)

# Set up model with hyperparameter tuning
model = RandomForestRegressor(random_state=42)
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10]
}

# Use GridSearchCV with our custom scorer
grid_search = GridSearchCV(
    model,
    param_grid,
    scoring=mape_scorer,  # Using our custom scorer!
    cv=5,
    verbose=1
)

grid_search.fit(X, y)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best MAPE score: {-grid_search.best_score_:.4f}")  # Convert back from negative

✅ Key Takeaways for Beginners

  1. Metrics = Direct calculation functions you call manually
  2. Scorers = Automated versions for use with scikit-learn tools
  3. Remember the "neg_" prefix for error metrics (they get negated so higher = better)
  4. Use get_scorer_names() to see all available options
  5. Use make_scorer() to create custom metrics for your specific domain
  6. Scorers work seamlessly with GridSearchCV, cross_val_score, and other model selection tools

🎯 Practice Exercise

Try modifying the house price example to:

  1. Use a different regression model (try RandomForestRegressor)
  2. Add a custom scorer that measures "predictions within $50,000" (a starter sketch follows this list)
  3. Use GridSearchCV to find the best hyperparameters
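A possible starting point for item 2 — the function name and structure below are just one way to write it:

import numpy as np
from sklearn.metrics import make_scorer

def within_50k(y_true, y_pred):
    """Fraction of predictions within $50,000 of the actual price."""
    return np.mean(np.abs(y_true - y_pred) <= 50_000)

# Higher fraction is better
within_50k_scorer = make_scorer(within_50k, greater_is_better=True)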

This foundation will serve you well as you dive deeper into machine learning model evaluation and selection!