📘 1. Foundations: Understanding Metrics and Scorers
Scikit-learn provides many built-in metrics, but to use them effectively in model selection (e.g., GridSearchCV, cross_val_score), you need to understand how scorers work.
A scorer is a wrapper around a metric function that scikit-learn can use consistently across models.
🔑 Key Functions
1. check_scoring
- Checks whether an estimator has a valid scoring method.
- Ensures that the metric you pass is compatible with the model.
- Useful when automating model evaluation.
from sklearn.metrics import check_scoring
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
print(check_scoring(model))  # Returns the model's default scorer (accuracy for classifiers)
2. get_scorer
- Returns a scorer object from a string (e.g., "accuracy", "neg_mean_squared_error").
- Scorers always follow the "greater is better" convention, so losses are returned as negated values.
from sklearn.metrics import get_scorer
scorer = get_scorer("accuracy")
print(scorer)  # make_scorer(accuracy_score)
3. get_scorer_names
- Lists all available scorers in scikit-learn.
- Very handy to see what metrics you can plug into
GridSearchCV.
from sklearn.metrics import get_scorer_names
print(get_scorer_names())  # Sorted list of all available scorer names
Sample output:
['accuracy', 'balanced_accuracy', 'f1', 'f1_macro',
'roc_auc', 'neg_mean_squared_error', 'r2', ...]
4. make_scorer
- Converts a metric (like mean_absolute_error) into a scorer object that can be used in cross-validation or grid search.
- You can set whether greater is better or not.
Example: Suppose you want to use Mean Absolute Error (MAE) for evaluating a financial forecasting model:
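A minimal sketch (the synthetic dataset and Ridge model here are stand-ins, not part of the original example):
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import cross_val_score
# Wrap MAE as a scorer; greater_is_better=False makes scikit-learn negate
# the values so that "higher is better" still holds
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
X, y = make_regression(n_samples=100, n_features=3, noise=5, random_state=0)
print(cross_val_score(Ridge(), X, y, scoring=mae_scorer, cv=5))  # negated MAE per fold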
🏦 Finance Example
Imagine you’re forecasting monthly expenses of a company.
- If you use r2_score, scikit-learn already provides a built-in scorer ("r2").
- If you prefer Mean Absolute Percentage Error (MAPE), you can wrap it in make_scorer and use it in grid search to tune your regression model, as the sketch below shows.
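A sketch of that grid-search workflow, assuming a synthetic dataset and a purely illustrative alpha grid for Ridge:
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error, make_scorer
from sklearn.model_selection import GridSearchCV
X, y = make_regression(n_samples=120, n_features=4, noise=10, random_state=1)
# MAPE is a loss, so it is negated to fit the "higher is better" rule
# (note: with targets near zero, MAPE can be unstable)
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, scoring=mape_scorer, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)  # best_score_ is negated MAPE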
🎯 Goal of This Section
By learning scorers:
- You understand how metrics are plugged into model selection.
- You can use predefined metrics quickly (get_scorer).
- You can create custom scorers for domain-specific needs (e.g., profit-loss, error tolerance in finance).
To make the difference concrete, here is a side-by-side comparison of using a metric function versus a scorer in scikit-learn.
📊 Metric vs Scorer in scikit-learn
Scikit-learn makes a distinction between:
- Metric functions → calculate a score directly (mean_squared_error, accuracy_score).
- Scorers → wrappers used inside model selection tools (cross_val_score, GridSearchCV).
🔎 Comparison Table
| Aspect | Metric Function | Scorer |
|---|---|---|
| Definition | A function that directly computes performance given y_true and y_pred. | A wrapper around a metric, compatible with model selection (GridSearchCV, cross_val_score). |
| Examples | accuracy_score(y_true, y_pred) → 0.87 | scoring="accuracy" or get_scorer("accuracy") |
| Return Value | Returns the raw metric (higher or lower depending on definition). | Always returns a value where higher is better (losses are negated). |
| Usage Context | Direct evaluation of model predictions. | Automated model evaluation (cross-validation, grid search). |
| Customizability | Use the metric directly in your code. | Create custom scorers with make_scorer. |
| Finance Example | mean_squared_error(actual_sales, predicted_sales) → error in sales forecast. | scoring="neg_mean_squared_error" in grid search (since scikit-learn maximizes scores). |
🖥 Code Examples
1. Metric Function (direct use)
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=8)
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
# Direct metric usage
mse = mean_squared_error(y, y_pred)
print("MSE (metric):", mse)
2. Scorer (used in cross-validation)
from sklearn.model_selection import cross_val_score
# Use scorer instead of direct metric
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
print("MSE (scorer, negated):", scores)
⚠️ Note: Scikit-learn negates error metrics so that higher is always better.
- That's why the scoring string is "neg_mean_squared_error".
- To recover the actual MSE values, negate the scores (e.g., -scores).
3. Custom Scorer with make_scorer
from sklearn.metrics import mean_absolute_percentage_error, make_scorer
# Create custom scorer for MAPE
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)
scores = cross_val_score(model, X, y, scoring=mape_scorer, cv=5)
print("MAPE scores:", scores)
🎯 Key Takeaways
- Use metric functions (accuracy_score, r2_score, etc.) when you want to evaluate predictions manually.
- Use scorers (scoring="accuracy", scoring="neg_mean_squared_error") when working with cross-validation and hyperparameter tuning.
- Always remember: scorers are maximized, so error metrics are negated.
🎯 Scikit-learn Metrics and Scorers: A Complete Beginner's Guide
🤔 What's the Difference? (The Simple Version)
Imagine you're a teacher grading tests:
- Metric Function = You manually grade one test and say "This student got 85%"
- Scorer = You set up an automatic grading system that can grade hundreds of tests consistently
In machine learning:
- Metrics = functions that calculate how good your model is (like accuracy_score())
- Scorers = automated versions that work with scikit-learn's tools like GridSearchCV
📚 Part 1: Understanding Metrics (The Manual Way)
Let's start with a simple example - predicting house prices:
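A minimal sketch on synthetic stand-in data (the exact numbers in the output below will differ from any given run):
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Synthetic stand-in for house-price data: X = features, y = price
X, y = make_regression(n_samples=100, n_features=3, noise=20, random_state=42)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)
# Direct, "manual" metric calls
print(f"Manual MSE: {mean_squared_error(y, y_pred):.2f}")
print(f"Manual R²: {r2_score(y, y_pred):.2f}")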
Output:
Manual MSE: 2401.85
Manual R²: 0.95
This is straightforward - you call the metric function directly with actual vs predicted values.
🔧 Part 2: Understanding Scorers (The Automated Way)
Now let's use scorers with cross-validation (testing your model multiple times):
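A sketch consistent with the output shown below, reusing model, X, y from the previous snippet:
from sklearn.model_selection import cross_val_score
r2_scores = cross_val_score(model, X, y, scoring="r2", cv=5)
mse_scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
print("R² scores (5 folds):", r2_scores)
print("MSE scores (5 folds):", mse_scores)  # negated values
print("Average R²:", r2_scores.mean())
print("Average MSE:", -mse_scores.mean())  # flip the sign back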
Output:
R² scores (5 folds): [0.951 0.948 0.953 0.949 0.952]
MSE scores (5 folds): [-2387.45 -2456.32 -2301.78 -2512.91 -2398.67]
Average R²: 0.951
Average MSE: 2411.43
🚨 Important Note: The "neg_" Prefix
Notice neg_mean_squared_error? Here's why:
- Scikit-learn always tries to maximize scores
- Lower MSE = better model, but scikit-learn wants higher = better
- Solution: Make MSE negative! Now higher (less negative) = better (see the snippet below)
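For instance, continuing from the cross-validation snippet above:
# mse_scores came back negated; flip the sign to report real MSE values
actual_mse = -mse_scores
print("Actual MSE per fold:", actual_mse)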
🛠️ Part 3: Key Functions Explained with Examples
1. get_scorer_names() - See All Available Scorers
from sklearn.metrics import get_scorer_names
# See what scorers are available
all_scorers = get_scorer_names()
print("First 10 scorers:", list(all_scorers)[:10])
Output:
First 10 scorers: ['accuracy', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_brier_score', 'neg_log_loss', 'neg_mean_absolute_error']
2. get_scorer() - Get a Specific Scorer
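A short sketch (using the iris dataset as a stand-in). The key point: a scorer object is called with (estimator, X, y), not with precomputed predictions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
X_iris, y_iris = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X_iris, y_iris)
# The scorer predicts internally, then applies the metric
accuracy_scorer = get_scorer("accuracy")
print("Accuracy via scorer:", accuracy_scorer(clf, X_iris, y_iris))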
3. make_scorer() - Create Your Own Custom Scorer
Sometimes you need a custom metric. Let's create a "percentage within 10%" scorer for house prices:
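A sketch of such a scorer, reusing model, X, y from Part 1 (with synthetic targets, the exact numbers below won't reproduce):
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
def within_10_percent(y_true, y_pred):
    # Fraction of predictions within 10% of the true value
    # (unstable when true values are near zero)
    return np.mean(np.abs(y_pred - y_true) <= 0.10 * np.abs(y_true))
within_10_scorer = make_scorer(within_10_percent, greater_is_better=True)
scores = cross_val_score(model, X, y, scoring=within_10_scorer, cv=5)
print("% predictions within 10%:", scores)
print("Average:", scores.mean())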
Output:
% predictions within 10%: [0.92 0.91 0.93 0.90 0.92]
Average: 0.92
📋 Cheat Sheet: Common Metrics vs Scorers
| Task | Metric Function | Scorer String | What it Measures |
|---|---|---|---|
| **Classification** | | | |
| Basic accuracy | accuracy_score(y_true, y_pred) | 'accuracy' | % of correct predictions |
| Balanced accuracy | balanced_accuracy_score(y_true, y_pred) | 'balanced_accuracy' | Average recall for each class |
| F1 score | f1_score(y_true, y_pred) | 'f1' | Balance of precision and recall |
| ROC AUC | roc_auc_score(y_true, y_score) | 'roc_auc' | Area under ROC curve |
| **Regression** | | | |
| Mean Squared Error | mean_squared_error(y_true, y_pred) | 'neg_mean_squared_error' | Average squared differences |
| Mean Absolute Error | mean_absolute_error(y_true, y_pred) | 'neg_mean_absolute_error' | Average absolute differences |
| R² Score | r2_score(y_true, y_pred) | 'r2' | Proportion of variance explained |
🏦 Real-World Example: Stock Price Prediction
Let's put it all together with a practical example:
✅ Key Takeaways for Beginners
- Metrics = Direct calculation functions you call manually
- Scorers = Automated versions for use with scikit-learn tools
- Remember the "neg_" prefix for error metrics (they get negated so higher = better)
- Use get_scorer_names() to see all available options
- Use make_scorer() to create custom metrics for your specific domain
- Scorers work seamlessly with GridSearchCV, cross_val_score, and other model selection tools
🎯 Practice Exercise
Try modifying the house price example to:
- Use a different regression model (try RandomForestRegressor)
- Add a custom scorer that measures "predictions within $50,000"
- Use GridSearchCV to find the best hyperparameters
This foundation will serve you well as you dive deeper into machine learning model evaluation and selection!