Variance (Model Fit)

Main Concept

Variance measures how much a model’s predictions (and therefore its
performance) change when it is trained on different datasets drawn from the
same distribution. A high-variance model is overly sensitive to the specific
data it was trained on.

Key Aspects

High Variance = Overfitting

  • The model is very sensitive to changes in the training data.
  • It performs well on training data but poorly on unseen test data.
  • It has memorized the training data instead of learning the general pattern.
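
A minimal sketch of this "memorizing" behavior, assuming a 1-nearest-neighbor model and synthetic data (the pattern y = 2x, the noise level, and all function names are illustrative choices, not part of the exam material). The model copies the label of the closest training point, so it is perfect on the data it has seen and worse on fresh data:

```python
import random

def one_nn_predict(train, x):
    """Predict y for x by copying the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def mse(model_data, eval_data):
    """Mean squared error of a 1-NN model built on model_data, scored on eval_data."""
    errors = [(one_nn_predict(model_data, x) - y) ** 2 for x, y in eval_data]
    return sum(errors) / len(errors)

def make_data(n, rng):
    """Noisy samples of the assumed true pattern y = 2x."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1.0)) for x in xs]

rng = random.Random(0)
train = make_data(50, rng)
test = make_data(50, rng)

train_mse = mse(train, train)  # each training point is its own nearest neighbor
test_mse = mse(train, test)    # memorized noise does not transfer to new points
print(f"train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The gap between the two numbers is the symptom to look for: near-zero training error with a clearly higher test error.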

The target diagram

  • The center (orange) = the truth (the correct prediction).
  • The red dots = the model’s predictions.
  • High variance means predictions are scattered all around the center —
    the model is inconsistent, sometimes close, sometimes far. It reacts
    differently depending on which data it saw.

Contrast with high bias:

  • High bias = predictions clustered together but far from center (consistently wrong).
  • High variance = predictions scattered around the center (inconsistently wrong).
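
The target picture can be reproduced numerically. The sketch below is an illustration under stated assumptions: the "truth" is taken to be sin(2πx), the two models (a 1-nearest-neighbor model and a model that always predicts the mean label) are stand-ins chosen for simplicity, and every name is hypothetical. Each model is retrained on many fresh training sets and asked for a prediction at the same point; the average miss is the bias and the spread of the predictions is the variance.

```python
import math
import random
import statistics

def true_value(x):
    """Assumed ground truth (the center of the target)."""
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise=0.5):
    """Draw a fresh noisy training set from the same distribution."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, true_value(x) + rng.gauss(0, noise)) for x in xs]

def flexible_model(train, x):
    """1-nearest-neighbor: follows the training data very closely."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def rigid_model(train, x):
    """Ignores x and always predicts the mean training label."""
    return statistics.fmean([y for _, y in train])

def bias_and_variance(model, x0, trials=200, seed=0):
    """Retrain on `trials` datasets; summarize the predictions made at x0."""
    rng = random.Random(seed)
    preds = [model(sample_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - true_value(x0)
    return bias, statistics.pvariance(preds)

x0 = 0.25  # point where the assumed truth equals 1.0
flex_bias, flex_var = bias_and_variance(flexible_model, x0)
rigid_bias, rigid_var = bias_and_variance(rigid_model, x0)
print(f"flexible: bias = {flex_bias:+.2f}, variance = {flex_var:.3f}")
print(f"rigid:    bias = {rigid_bias:+.2f}, variance = {rigid_var:.3f}")
```

The flexible model's predictions land around the truth but are widely scattered (high variance); the rigid model's predictions sit close together but well away from the truth (high bias).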

How to reduce variance

  • Feature selection: use fewer, more relevant features.
  • Cross-validation: split the data into training and test sets multiple
    times to check that the model generalizes, and use the results to choose
    a model that does.
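
A rough sketch of the repeated-split idea, written with the standard library only. The k-nearest-neighbor model, the synthetic data, and all names here are assumptions made for illustration; in practice a library routine would normally be used. Each fold takes one turn as the held-out set while the model is built from the rest:

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, n_neighbors):
    """Average the labels of the n_neighbors closest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)

def cross_val_mse(data, n_neighbors, k=5, seed=0):
    """Mean held-out MSE over k train/validation splits."""
    folds = k_fold_indices(len(data), k, random.Random(seed))
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        errors = [(knn_predict(train, x, n_neighbors) - y) ** 2 for x, y in valid]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / len(scores)

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
data = [(x, 2 * x + rng.gauss(0, 1.0)) for x in xs]  # assumed pattern y = 2x plus noise

# Compare candidate models by their held-out error, not their training error.
for n_neighbors in (1, 5, 15):
    print(n_neighbors, round(cross_val_mse(data, n_neighbors), 3))
```

Because every score comes from data the model did not train on, a model that only memorizes tends to score worse here than its training error would suggest.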

Bias vs Variance — The Core Tradeoff

This is one of the most fundamental tradeoffs in ML:

  • More complex model → lower bias, higher variance (overfitting risk)
  • Simpler model → higher bias, lower variance (underfitting risk)
  • Goal → find the sweet spot (balanced fit)

You generally can’t minimize both at once: reducing one tends to increase
the other, so the aim is the level of model complexity that balances the two.
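
For reference, the tradeoff is usually stated through the standard decomposition of expected squared error. The notation is introduced here and is not from these notes: f is the true function, \hat{f} is the trained model, y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Total error is the sum of the three terms, so lowering bias only helps if variance does not rise by more, and the noise term cannot be reduced by any choice of model.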

Exam Domain

  • Domain 1, Task Statement 1.1: basic AI terms including “bias” and “fit.”
  • Domain 4, Task Statement 4.1: “understand effects of bias and variance
    (for example, effects on demographic groups, inaccuracy, overfitting,
    underfitting).”

Exam Scenarios to Recognize

Scenario                                          Diagnosis
-----------------------------------------------   ---------------------------
Performs well in training, poorly in production   High variance / overfitting
Performs poorly even on training data             High bias / underfitting
Sensitive to small changes in training data       High variance
Consistently predicts the wrong value             High bias
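
The first two scenarios above can be turned into a rule of thumb. This is only a sketch: the function name and both thresholds (acceptable_error, max_gap) are hypothetical and would need to be chosen for the problem at hand.

```python
def diagnose(train_error, test_error, acceptable_error=0.10, max_gap=0.05):
    """Map train/test error rates to a diagnosis.

    acceptable_error and max_gap are hypothetical thresholds, not
    standard values.
    """
    if train_error > acceptable_error:
        # Poor even on the data it was trained on.
        return "High bias / underfitting"
    if test_error - train_error > max_gap:
        # Good in training, much worse on unseen data.
        return "High variance / overfitting"
    return "Balanced fit"

print(diagnose(train_error=0.02, test_error=0.30))  # High variance / overfitting
print(diagnose(train_error=0.35, test_error=0.37))  # High bias / underfitting
print(diagnose(train_error=0.05, test_error=0.07))  # Balanced fit
```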