Binary Classification Metrics

Main Concept

Binary classification models predict one of two possible outcomes (yes/no, fraud/not fraud, spam/not spam). Evaluation metrics measure how well the model performs — but different metrics capture different aspects of performance, and choosing the wrong one can be misleading.

Why Multiple Metrics Exist

A model that always predicts “no fraud” on a dataset with 99% legitimate transactions achieves 99% accuracy — but detects zero fraud cases. This is why accuracy alone is insufficient for imbalanced datasets, and why Precision, Recall, F1, and AUC-ROC exist.
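The scenario above can be reproduced in a few lines. This is a minimal sketch on a made-up dataset (1,000 transactions, 1% fraud); the labels and the always-"no" model are assumptions for illustration, not real data.

```python
# Hypothetical dataset: 1,000 transactions, 10 fraud (label 1), 990 legitimate (label 0).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts "no fraud".
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of actual fraud cases the model caught.
fraud_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = fraud_caught / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The accuracy looks excellent while the model is useless for its actual purpose, which is the gap the other metrics are meant to expose.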

The Four Prediction Outcomes

Every binary classification prediction falls into one of four buckets:

	Predicted YES	Predicted NO
Actually YES	True Positive (TP)	False Negative (FN)
Actually NO	False Positive (FP)	True Negative (TN)

False Positive → model said “yes” but it was “no” (false alarm)
False Negative → model said “no” but it was “yes” (missed detection)
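One way to tally the four buckets from a list of labels is sketched below. The function name `confusion_counts` and the 1 = YES / 0 = NO encoding are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = YES and 0 = NO."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # said YES, was YES
        elif predicted == 1 and actual == 0:
            fp += 1  # said YES, was NO (false alarm)
        elif predicted == 0 and actual == 1:
            fn += 1  # said NO, was YES (missed detection)
        else:
            tn += 1  # said NO, was NO
    return tp, fp, fn, tn


# Hypothetical labels for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 2 1 1 4
```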

The Metrics

Accuracy

Percentage of total predictions that were correct: (TP + TN) / (TP + TN + FP + FN).
A reasonable quick check when the classes are roughly balanced, but misleading on imbalanced datasets.

Precision

Of all cases the model flagged as positive, how many were actually positive? Precision = TP / (TP + FP).
“When the model says YES, how often is it right?”
Relevant when false positives are costly.
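A small sketch of the calculation, using made-up counts for a spam filter. Returning 0.0 when the model flags nothing is a convention chosen for this example, not a universal rule.

```python
def precision(tp, fp):
    """Fraction of flagged cases that were truly positive."""
    flagged = tp + fp
    # Assumption: report 0.0 when nothing was flagged, to avoid dividing by zero.
    return tp / flagged if flagged else 0.0


# Hypothetical spam filter: 50 emails flagged, 45 of them really spam.
print(precision(tp=45, fp=5))  # 0.9
```

Here the 5 false positives are legitimate emails sent to the spam folder, which is the kind of error precision penalizes.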

Recall

Of all actual positive cases, how many did the model catch? Recall = TP / (TP + FN).
“Of all the real YESes, how many did the model find?”
Relevant when false negatives are costly.
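A matching sketch for recall, again with made-up counts. The zero-denominator handling is an assumption for this example.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model caught."""
    actual_positives = tp + fn
    # Assumption: report 0.0 when there are no actual positives at all.
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical fraud detector: 100 real fraud cases, 40 caught, 60 missed.
print(recall(tp=40, fn=60))  # 0.4
```

The 60 false negatives are fraud cases that slipped through, which is the kind of error recall penalizes.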

F1 Score

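The harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall).
Best for imbalanced datasets where both error types matter.
Precision and Recall usually trade off against each other: tuning a model to raise one tends to lower the other. F1 is high only when both are high.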
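A short sketch of why F1 rewards balance. It computes F1 as the harmonic mean of Precision and Recall and compares a balanced model with a lopsided one; the input values are invented for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Balanced model: both metrics at 0.8.
print(round(f1_score(0.8, 0.8), 3))  # 0.8

# Lopsided model: perfect precision, very low recall.
print(round(f1_score(1.0, 0.1), 3))  # 0.182
```

A plain average of 1.0 and 0.1 would give 0.55, so the harmonic mean punishes the lopsided model much harder than a simple average would.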

AUC-ROC

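Measures how well the model separates the two classes across all decision thresholds. Value: 0.5 (no better than random guessing) to 1.0 (perfect separation); a value below 0.5 means the model ranks the classes worse than chance.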
Best for comparing models objectively regardless of threshold.
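One way to see why AUC-ROC does not depend on a threshold: it equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The sketch below computes it by comparing every positive/negative pair, which is fine for small examples but slow on large datasets. The function name and the scores are assumptions for illustration.

```python
def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical model scores (higher = more likely YES).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

print(round(auc_roc(y_true, scores), 3))  # 0.889
```

Eight of the nine positive/negative pairs are ranked correctly; the one mistake is the positive scored 0.4 falling below the negative scored 0.5.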

When to Use Which

Scenario	Metric
Quick general check (balanced data)	Accuracy
False positives are costly	Precision
False negatives are costly	Recall
Both error types matter, imbalanced data	F1 Score
Comparing models objectively	AUC-ROC

Exam Domain

Domain 1, Task Statement 1.3: “Understand model performance metrics
(for example, accuracy, AUC, F1 score).”
