Main Concept

The F1 score is the harmonic mean of precision and recall. It balances two types of errors in binary classification: false positives (false alarms) and false negatives (missed detections). It is a common choice over accuracy when the dataset is imbalanced or when both types of errors carry significant consequences.
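
As a minimal sketch of the calculation, the snippet below computes precision, recall, and F1 from confusion-matrix counts. The function name f1_from_counts and the example counts are hypothetical, and returning 0.0 when a ratio is undefined is an assumed convention rather than part of the definition.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return F1 from true positives, false positives, and false negatives."""
    # Precision: of everything flagged positive, how much was really positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real positives, how many did the model find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed convention: report 0.0 when precision and recall are both zero.
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: balanced errors give precision = recall = 0.8.
print(f1_from_counts(tp=8, fp=2, fn=2))   # approximately 0.80
# More missed positives lower recall to 0.4 and pull F1 down.
print(f1_from_counts(tp=8, fp=2, fn=12))  # approximately 0.53
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high F1 by doing well on precision alone or recall alone.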

Intuition

F1 question: Is the model capable of detecting positive cases WITHOUT triggering false alarms too often?

  • High F1 → model finds most real positives and rarely flags false ones
  • Low F1 → model misses real positives OR raises too many false alarms

When to Use

  • Imbalanced datasets (fraud, disease, spam)
  • When false negatives are costly (missing a real fraud case)
  • When false positives are costly (flagging legitimate transactions)
  • Any scenario where accuracy would be misleading

Exam Example

Fraud detection: missing a real fraud (false negative) is costly.
A model optimized only for accuracy can score well while ignoring rare fraud cases.
Evaluating on F1 rewards a model only when it balances detection rate against false alarms.
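
The contrast can be illustrated with a small sketch. All numbers here are hypothetical: 1,000 transactions, 10 of them fraud, and two made-up models. F1 is computed as 2TP / (2TP + FP + FN), which is algebraically the harmonic mean of precision and recall.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; assumed convention of 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical confusion-matrix counts for 1,000 transactions (10 fraud).
models = {
    # Predicts "legitimate" for everything: never catches fraud.
    "always_legit": dict(tp=0, fp=0, fn=10, tn=990),
    # Catches 8 of 10 fraud cases at the cost of 12 false alarms.
    "detector": dict(tp=8, fp=12, fn=2, tn=978),
}

for name, c in models.items():
    acc = accuracy(c["tp"], c["fp"], c["fn"], c["tn"])
    score = f1(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.3f}, F1={score:.3f}")
```

In this made-up scenario the model that never flags fraud has the higher accuracy (0.990 versus 0.986) but an F1 of 0, while the detector reaches an F1 of about 0.533.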

Exam Domain

  • Domain 1, Task Statement 1.3: model performance metrics.

References