Main Concept

Semi-Supervised Learning sits between Supervised Learning and Unsupervised Learning. It uses a small amount of labeled data combined with a large amount of unlabeled data to train a model.

This matches what happens in practice: labeling data at scale is expensive and time-consuming, while unlabeled data is abundant.

How It Works

  1. Train an initial model on the small set of labeled data (partial labels)
  2. Use that partially trained model to label the unlabeled data; this is called pseudo-labeling
  3. Re-train the model on the full dataset (original labels + pseudo-labels)
  4. The final model now behaves like a supervised model trained on a fully labeled dataset

Partial Labels (Banana, Orange)
        +
Large Unlabeled Dataset
        ↓
    Model trains
        ↓
  Pseudo-labeling (model labels unlabeled data)
        ↓
  Re-train on full labeled dataset
        ↓
  "It's an Apple!"
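
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production method: it assumes a hand-written nearest-centroid classifier, and the helper names (fit_centroids, predict) and the toy fruit measurements are made up for the example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Train a nearest-centroid classifier: one mean point per class."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Toy data, invented for illustration only.
labeled_points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
labeled_names = ["banana", "banana", "orange", "orange"]
unlabeled_points = [(0.9, 1.3), (1.5, 1.1), (5.3, 4.7), (4.6, 4.9), (0.7, 0.6)]

# Step 1: train an initial model on the small labeled set.
initial_model = fit_centroids(labeled_points, labeled_names)

# Step 2: pseudo-label the unlabeled data with that model.
pseudo_labels = [predict(initial_model, p) for p in unlabeled_points]

# Step 3: re-train on the original labels plus the pseudo-labels.
final_model = fit_centroids(
    labeled_points + unlabeled_points,
    labeled_names + pseudo_labels,
)

# Step 4: use the final model like any supervised classifier.
print(pseudo_labels)
print(predict(final_model, (1.1, 0.9)))
```

Any classifier could stand in for the nearest-centroid model here; the point is only the order of operations: train, pseudo-label, re-train.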

Key Concept: Pseudo-Labeling

The model uses what it learned from labeled examples to assign labels to unlabeled data. Those generated labels are called pseudo-labels because they were not assigned by a human: they are the model's best guess, used as if they were real labels for the next training round.
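
Because a pseudo-label is only a best guess, one common refinement is to keep a pseudo-label only when the model is fairly confident. The sketch below assumes the model outputs a probability per class; the 0.8 threshold, the pseudo_label helper, and the sample outputs are hypothetical choices for illustration and are not part of the notes above.

```python
def pseudo_label(probabilities, threshold=0.8):
    """Return (label, confidence) for the top class, or None if below threshold."""
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    return (label, confidence) if confidence >= threshold else None


# Hypothetical model outputs for three unlabeled items.
model_outputs = [
    {"banana": 0.95, "orange": 0.05},
    {"banana": 0.55, "orange": 0.45},
    {"banana": 0.10, "orange": 0.90},
]

# Keep only the confident guesses as pseudo-labels for the next training round.
accepted = []
for index, probs in enumerate(model_outputs):
    guess = pseudo_label(probs)
    if guess is not None:
        accepted.append((index, guess[0]))

print(accepted)
```

Items that fall below the threshold simply stay unlabeled and can be revisited in a later round.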

Why It Matters

  • Labeling thousands or millions of data points manually is impractical and costly
  • Semi-supervised learning allows you to get most of the benefit of supervised learning while only labeling a small fraction of the data
  • Common in real-world applications where labeled data is scarce

Use Cases

  • Image classification: label a few hundred images, let the model pseudo-label thousands more
  • Medical diagnosis: limited labeled patient records combined with large unlabeled datasets
  • NLP: a small set of labeled documents used to label a large corpus

Exam Domain (AIF-C01)

Domain 1: Fundamentals of AI and ML

  • Task Statement 1.1: Basic AI/ML concepts (semi-supervised learning as a hybrid approach between supervised and unsupervised learning).

References