Main Concept
Semi-Supervised Learning sits between Supervised Learning and Unsupervised Learning. It uses a small amount of labeled data combined with a large amount of unlabeled data to train a model.
This setup is common in practice: labeling data at scale is expensive and time-consuming, while unlabeled data is abundant.
How It Works
- Train an initial model on the small set of labeled data (partial labels)
- Use that initial model to label the unlabeled data; this step is called pseudo-labeling
- Re-train the model on the full dataset (original labels + pseudo-labels)
- The final model is trained like a supervised model on the combined dataset, even though only a small fraction of its labels came from humans
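The steps above can be sketched as one self-training round. This is a minimal sketch under stated assumptions: toy 2-D feature vectors, a nearest-centroid classifier standing in for "the model", and the hypothetical helper names `fit_centroids`, `predict`, and `self_train`. A real project would use a proper classifier and often repeats the pseudo-label/re-train loop several times.

```python
"""Minimal self-training (pseudo-labeling) sketch.

Assumptions for illustration only: 2-D feature vectors, a toy nearest-centroid
classifier, and a single pseudo-labeling round. Helper names are hypothetical.
"""
from math import dist

Point = tuple[float, float]


def fit_centroids(points: list[Point], labels: list[str]) -> dict[str, Point]:
    """'Train' by computing the mean point (centroid) of each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for (x, y), label in zip(points, labels):
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x
        s[1] += y
        counts[label] = counts.get(label, 0) + 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}


def predict(centroids: dict[str, Point], point: Point) -> str:
    """Return the label of the nearest class centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


def self_train(labeled_x: list[Point], labeled_y: list[str],
               unlabeled_x: list[Point]) -> dict[str, Point]:
    # 1. Train an initial model on the small labeled set.
    model = fit_centroids(labeled_x, labeled_y)
    # 2. Pseudo-label the unlabeled data with that initial model.
    pseudo_y = [predict(model, p) for p in unlabeled_x]
    # 3. Re-train on original labels + pseudo-labels.
    return fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)


if __name__ == "__main__":
    # Made-up data: one labeled example per class, four unlabeled points.
    labeled_x = [(1.0, 1.0), (5.0, 5.0)]
    labeled_y = ["banana", "orange"]
    unlabeled_x = [(1.2, 0.8), (0.9, 1.3), (4.8, 5.1), (5.3, 4.9)]
    model = self_train(labeled_x, labeled_y, unlabeled_x)
    print(predict(model, (1.1, 1.0)))  # banana
```

The nearest-centroid model only keeps the example dependency-free. As one real-world point of comparison, scikit-learn provides `SelfTrainingClassifier` in `sklearn.semi_supervised`, which wraps an existing probabilistic classifier in a similar pseudo-labeling loop.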
Partial Labels (Banana, Orange)
+
Large Unlabeled Dataset
↓
Model trains
↓
Pseudo-labeling (model labels unlabeled data)
↓
Re-train on full labeled dataset
↓
"It's an Apple!"

Key Concept: Pseudo-Labeling
The model uses what it learned from the labeled examples to assign labels to the unlabeled data. These generated labels are called pseudo-labels because they were not assigned by a human: they are the model's best guess, used as if they were real labels in the next training round.
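To make "best guess" concrete, a pseudo-label is usually the class with the highest predicted probability. The sketch below assumes a hypothetical model output format (one dict of class probabilities per unlabeled sample) and adds a confidence threshold, a common refinement that is not part of the description above, so that low-confidence guesses stay unlabeled instead of being trained on. The function name `pseudo_label` and the default `threshold=0.9` are illustrative choices.

```python
def pseudo_label(probabilities: list[dict[str, float]],
                 threshold: float = 0.9) -> list[tuple[int, str]]:
    """Turn per-sample class probabilities into pseudo-labels.

    probabilities: hypothetical model output, one dict per unlabeled sample
    mapping class name -> predicted probability.
    Returns (sample index, label) pairs only for samples whose top
    probability is at least `threshold`; the rest are left unlabeled.
    """
    kept = []
    for i, probs in enumerate(probabilities):
        label = max(probs, key=probs.get)  # the model's best guess
        if probs[label] >= threshold:      # keep only confident guesses
            kept.append((i, label))
    return kept


if __name__ == "__main__":
    outputs = [
        {"banana": 0.97, "orange": 0.03},
        {"banana": 0.55, "orange": 0.45},  # too uncertain, skipped
        {"banana": 0.08, "orange": 0.92},
    ]
    print(pseudo_label(outputs))  # [(0, 'banana'), (2, 'orange')]
```

Lowering the threshold adds more pseudo-labeled data but also more wrong labels, which the next training round then treats as if they were real.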
Why It Matters
- Labeling thousands or millions of data points manually is impractical and costly
- Semi-supervised learning allows you to get most of the benefit of supervised learning while only labeling a small fraction of the data
- Common in real-world applications where labeled data is scarce
Use Cases
- Image classification: label a few hundred images, then let the model pseudo-label thousands more
- Medical diagnosis: a limited set of labeled patient records combined with large unlabeled datasets
- NLP: a small set of labeled documents used to label a large corpus
Related Concepts
- Supervised Learning
- Unsupervised Learning
- Anomaly Detection: confirmed anomalies become labeled data, feeding back into supervised or semi-supervised models
- Machine Learning (ML)
Exam Domain (AIF-C01)
Domain 1: Fundamentals of AI and ML
- Task Statement 1.1: Basic AI/ML concepts β semi-supervised learning as a hybrid approach between supervised and unsupervised learning.