Metrics for evaluating models.

Module: sklearn.metrics

TABLE OF CONTENTS

1. Foundations: Understanding Metrics and Scorers

Before diving into individual metrics, get familiar with how scikit-learn handles scoring.

  • Topics to Cover:
    • check_scoring → resolve a scoring argument into a scorer callable for an estimator.
    • get_scorer / get_scorer_names → retrieving predefined scorers and listing the available names.
    • make_scorer → turning custom functions into scorers.
  • Goal: Understand how metrics connect with model selection (e.g., GridSearchCV, cross_val_score); see the sketch below.
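
A minimal sketch, using a synthetic dataset from make_classification, of how named and custom scorers plug into cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, get_scorer, get_scorer_names, make_scorer
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)

# Predefined scorers are registered by name.
assert "f1" in get_scorer_names()
f1_scorer = get_scorer("f1")

# make_scorer wraps a plain metric function into a scorer object.
custom_f1 = make_scorer(f1_score, average="binary")

# Either form plugs into cross_val_score / GridSearchCV via `scoring=`.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring=custom_f1)
print(scores.mean())
```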

2. Classification Metrics (Most Used in Practice)

Start with binary classification → multiclass → multilabel.

  • Core Metrics (binary/multiclass):
    • accuracy_score
    • confusion_matrix / ConfusionMatrixDisplay
    • precision_score / recall_score / f1_score / fbeta_score
    • classification_report
    • roc_auc_score / roc_curve / RocCurveDisplay
    • precision_recall_curve / PrecisionRecallDisplay
    • average_precision_score
  • Advanced / Less Common:
    • balanced_accuracy_score
    • brier_score_loss
    • log_loss
    • matthews_corrcoef
    • cohen_kappa_score
    • jaccard_score
    • zero_one_loss
    • det_curve / DetCurveDisplay
    • top_k_accuracy_score
  • Specialized:
    • class_likelihood_ratios (diagnostic testing context).
    • precision_recall_fscore_support (per-class breakdown).
    • multilabel_confusion_matrix.
  • Goal: Be able to evaluate churn prediction, fraud detection, and spam detection models, choosing metrics that cope with class imbalance; see the sketch below.
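
A minimal sketch on a synthetic ~95/5 dataset, showing why accuracy alone misleads under class imbalance:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             classification_report, roc_auc_score)
from sklearn.model_selection import train_test_split

# Roughly 95/5 class split, mimicking fraud/churn-style imbalance.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Accuracy can look high just by favoring the majority class...
print("accuracy:", accuracy_score(y_te, y_pred))
# ...so check imbalance-aware views as well.
print("balanced accuracy:", balanced_accuracy_score(y_te, y_pred))
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
print(classification_report(y_te, y_pred))
```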

3. Regression Metrics (Continuous Predictions)

  • Core Metrics:
    • mean_absolute_error (MAE)
    • mean_squared_error (MSE)
    • root_mean_squared_error (RMSE)
    • r2_score (coefficient of determination)
  • Advanced Metrics:
    • median_absolute_error
    • explained_variance_score
    • max_error
    • mean_absolute_percentage_error (MAPE)
  • Specialized Loss Functions (for GLMs, quantile regression, deviance):
    • d2_absolute_error_score / d2_pinball_score / d2_tweedie_score
    • mean_pinball_loss
    • mean_poisson_deviance
    • mean_gamma_deviance
    • mean_tweedie_deviance
    • mean_squared_log_error / root_mean_squared_log_error
  • Goal: Be comfortable choosing between MAE, RMSE, R², and others depending on the forecasting/business context; see the sketch below.
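
A minimal sketch on synthetic predictions with one injected outlier, contrasting how MAE, RMSE, and R² react (root_mean_squared_error needs scikit-learn ≥ 1.4; on older versions use mean_squared_error(..., squared=False)):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score, root_mean_squared_error

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)  # noisy but unbiased predictions
y_pred[0] += 10.0                                  # one large outlier

print("MAE :", mean_absolute_error(y_true, y_pred))      # robust to the outlier
print("RMSE:", root_mean_squared_error(y_true, y_pred))  # penalizes the outlier hard
print("R2  :", r2_score(y_true, y_pred))                 # fraction of variance explained
```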

4. Ranking & Information Retrieval Metrics

  • Use case: Recommendation systems, search relevance.
  • Metrics:
    • dcg_score (Discounted Cumulative Gain)
    • ndcg_score (Normalized DCG)
    • coverage_error
    • label_ranking_loss
    • label_ranking_average_precision_score
  • Goal: Learn how to measure the quality of an ordering rather than of raw predictions; see the sketch below.
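
A minimal sketch with hand-made relevance grades, showing that NDCG scores the induced ordering, not the raw scores:

```python
import numpy as np
from sklearn.metrics import dcg_score, ndcg_score

# One query: true graded relevances and the scores a hypothetical ranker produced.
true_relevance = np.asarray([[3, 2, 3, 0, 1]])
ranker_scores = np.asarray([[0.9, 0.8, 0.1, 0.2, 0.7]])

print("DCG :", dcg_score(true_relevance, ranker_scores))
print("NDCG:", ndcg_score(true_relevance, ranker_scores))  # 1.0 would be a perfect ordering
```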

5. Clustering Metrics

  • Supervised (with ground truth labels):
    • adjusted_rand_score / rand_score
    • mutual_info_score / adjusted_mutual_info_score / normalized_mutual_info_score
    • homogeneity_score / completeness_score / v_measure_score / homogeneity_completeness_v_measure
    • fowlkes_mallows_score
    • cluster.contingency_matrix / cluster.pair_confusion_matrix
  • Unsupervised (internal validation):
    • silhouette_score / silhouette_samples
    • calinski_harabasz_score
    • davies_bouldin_score
  • Goal: Understand how to judge the quality of KMeans/DBSCAN clusterings with and without ground truth; see the sketch below.
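
A minimal sketch on toy blobs, judging a KMeans clustering both with and without ground-truth labels:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# With labels: chance-adjusted agreement with the true partition.
print("ARI:", adjusted_rand_score(y_true, labels))
# Without labels: internal cohesion vs. separation, from the data alone.
print("silhouette:", silhouette_score(X, labels))
```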

6. Biclustering Metrics

  • Niche, but useful for gene expression data or matrix factorization; see the sketch below.
    • consensus_score
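
A minimal sketch, following the pattern of the scikit-learn biclustering example, comparing recovered biclusters to the generating ones:

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

data, rows, cols = make_biclusters(shape=(30, 30), n_clusters=3, random_state=0)
model = SpectralCoclustering(n_clusters=3, random_state=0).fit(data)

# Jaccard-based similarity between found and true biclusters (1.0 = perfect recovery).
print(consensus_score(model.biclusters_, (rows, cols)))
```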

7. Distance & Pairwise Metrics (Very Useful in ML)

  • Distances:
    • pairwise.euclidean_distances
    • pairwise.manhattan_distances
    • pairwise.cosine_distances / cosine_similarity
    • pairwise.nan_euclidean_distances
    • pairwise.haversine_distances
  • Kernels:
    • linear_kernel / rbf_kernel / polynomial_kernel / sigmoid_kernel
    • laplacian_kernel / chi2_kernel / additive_chi2_kernel
  • Utilities:
    • pairwise_distances / pairwise_distances_chunked
    • pairwise_distances_argmin / pairwise_distances_argmin_min
  • Goal: Learn which distance or kernel to use in KNN, SVM, and clustering; see the sketch below.
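
A minimal sketch on two tiny vectors, contrasting a distance, a similarity, and a kernel value:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances, rbf_kernel

X = np.array([[0.0, 1.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0]])

print(euclidean_distances(X, Y))    # straight-line distance, shape (2, 1)
print(cosine_similarity(X, Y))      # angle-based similarity in [-1, 1]
print(rbf_kernel(X, Y, gamma=1.0))  # Gaussian kernel, as used by SVMs
```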

8. Visualization Tools for Metrics

  • ConfusionMatrixDisplay
  • RocCurveDisplay
  • PrecisionRecallDisplay
  • DetCurveDisplay
  • PredictionErrorDisplay
  • Goal: Practice making evaluation plots alongside raw scores; see the sketch below.
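
A minimal sketch, assuming matplotlib is installed; each Display class provides from_estimator and from_predictions constructors:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Plot directly from the fitted estimator and held-out data.
ConfusionMatrixDisplay.from_estimator(clf, X_te, y_te)
RocCurveDisplay.from_estimator(clf, X_te, y_te)
plt.show()
```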