Main Concept

Binary classification models predict one of two possible outcomes (yes/no, fraud/not fraud, spam/not spam). Evaluation metrics measure how well the model performs — but different metrics capture different aspects of performance, and choosing the wrong one can be misleading.

Why Multiple Metrics Exist

A model that always predicts “no fraud” on a dataset with 99% legitimate transactions achieves 99% accuracy — but detects zero fraud cases. This is why accuracy alone is insufficient for imbalanced datasets, and why Precision, Recall, F1, and AUC-ROC exist.
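
The arithmetic behind this example can be checked with a short sketch. The dataset below is hypothetical (1,000 transactions, 10 of them fraud), chosen only to match the 99%/1% split described above.

```python
# Hypothetical dataset: 1,000 transactions, 1% fraud (label 1), 99% legitimate (label 0).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts "no fraud".
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Fraud cases the model actually caught (true label 1, predicted 1).
fraud_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy = {accuracy:.2%}")            # accuracy = 99.00%
print(f"fraud cases caught = {fraud_caught}")  # fraud cases caught = 0
```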

The Four Prediction Outcomes

Every binary classification prediction falls into one of four buckets:

               Predicted YES          Predicted NO
Actually YES   True Positive (TP)     False Negative (FN)
Actually NO    False Positive (FP)    True Negative (TN)

  • False Positive → model said “yes” but it was “no” (false alarm)
  • False Negative → model said “no” but it was “yes” (missed detection)
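
As a minimal sketch, the four buckets can be counted directly from paired labels. The function name confusion_counts, the 1 = YES / 0 = NO encoding, and the sample labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = YES and 0 = NO."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # said YES, was YES
        elif predicted == 1 and actual == 0:
            fp += 1  # said YES, was NO (false alarm)
        elif predicted == 0 and actual == 1:
            fn += 1  # said NO, was YES (missed detection)
        else:
            tn += 1  # said NO, was NO
    return tp, fp, fn, tn


# Hypothetical labels for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 2 1 1 4
```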

The Metrics

Accuracy

  • Percentage of total predictions that were correct.
  • Useful as a quick check on balanced data, but misleading on imbalanced datasets.
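
In terms of the four prediction outcomes, accuracy is (TP + TN) divided by all predictions. A small sketch, with hypothetical counts for a balanced test set:

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    # Assumed convention: report 0.0 when there are no predictions at all.
    return (tp + tn) / total if total else 0.0


# Hypothetical balanced test set: 50 actual YES, 50 actual NO.
balanced = accuracy(tp=45, fp=5, fn=5, tn=45)
print(f"accuracy = {balanced:.2f}")  # accuracy = 0.90
```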

Precision

  • Of all cases the model flagged as positive, how many were actually positive?
  • “When the model says YES, how often is it right?”
  • Relevant when false positives are costly.
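
Written as a formula, Precision = TP / (TP + FP). The sketch below uses a hypothetical spam filter that flagged 50 emails, of which 40 really were spam; returning 0.0 when nothing was flagged is an assumed convention, not a universal rule.

```python
def precision(tp, fp):
    """Of everything flagged as positive, the share that was truly positive."""
    flagged = tp + fp
    # Assumed convention: return 0.0 when the model flagged nothing.
    return tp / flagged if flagged else 0.0


# Hypothetical spam filter: 50 emails flagged, 40 of them actually spam.
spam_precision = precision(tp=40, fp=10)
print(f"precision = {spam_precision:.2f}")  # precision = 0.80
```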

Recall

  • Of all actual positive cases, how many did the model catch?
  • “Of all the real YESes, how many did the model find?”
  • Relevant when false negatives are costly.
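
Written as a formula, Recall = TP / (TP + FN). The sketch below uses a hypothetical fraud model that caught 7 of 10 real fraud cases; returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Of all actual positives, the share the model caught."""
    actual_positives = tp + fn
    # Assumed convention: return 0.0 when there are no actual positives.
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical fraud model: 10 real fraud cases, 7 detected, 3 missed.
fraud_recall = recall(tp=7, fn=3)
print(f"recall = {fraud_recall:.2f}")  # recall = 0.70
```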

F1 Score

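  • The harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall).
  • Best for imbalanced datasets where both error types matter.
  • Precision and Recall trade off against each other: lowering the decision threshold typically raises Recall and lowers Precision. F1 is high only when both are high.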
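
A short sketch shows how F1, computed as the harmonic mean of Precision and Recall, penalizes a lopsided model more than a plain average would. The input values are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # Assumed convention when both inputs are zero.
    return 2 * precision * recall / (precision + recall)


# Hypothetical lopsided model: perfect precision, very low recall.
lopsided = f1_score(1.0, 0.1)
# Hypothetical balanced model with the same arithmetic mean (0.55).
balanced = f1_score(0.55, 0.55)
arithmetic_mean = (1.0 + 0.1) / 2

print(f"lopsided F1     = {lopsided:.3f}")         # lopsided F1     = 0.182
print(f"balanced F1     = {balanced:.3f}")         # balanced F1     = 0.550
print(f"arithmetic mean = {arithmetic_mean:.3f}")  # arithmetic mean = 0.550
```

Both models average to 0.55, but only the balanced one keeps an F1 of 0.55; the lopsided one drops to about 0.18.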

AUC-ROC

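  • Measures how well the model separates the two classes across all decision thresholds. Value: 0.5 (no better than random guessing) to 1.0 (perfect); a value below 0.5 means the model ranks the classes worse than random.
  • Best for comparing models independently of any single decision threshold.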
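
One way to see why AUC-ROC does not depend on a threshold: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair. The function name auc_roc and the sample scores are assumptions for illustration; the pairwise loop is quadratic and meant for small examples only, and in practice a library routine such as scikit-learn's roc_auc_score would normally be used.

```python
def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive (0.4) is ranked below one negative (0.5).
y_true = [1, 1, 0, 0]
imperfect = auc_roc(y_true, [0.9, 0.4, 0.5, 0.1])
perfect = auc_roc(y_true, [0.9, 0.8, 0.2, 0.1])
uninformative = auc_roc(y_true, [0.5, 0.5, 0.5, 0.5])

print(imperfect, perfect, uninformative)  # 0.75 1.0 0.5
```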

When to Use Which

Scenario                                    Metric
Quick general check (balanced data)         Accuracy
False positives are costly                  Precision
False negatives are costly                  Recall
Both error types matter, imbalanced data    F1 Score
Comparing models objectively                AUC-ROC

Exam Domain

  • Domain 1, Task Statement 1.3: “Understand model performance metrics
    (for example, accuracy, AUC, F1 score).”