Main Concept
Unsupervised Learning is a machine learning approach in which the model trains on unlabeled data: no correct answers are provided. The goal is to let the model discover patterns, structures, or relationships inherent in the data on its own.
The machine groups and organizes the data by itself, but humans still assign meaning to the output. For example, a model might cluster thousands of images of cats together without knowing what a cat is; a human then labels that cluster "cats."
Contrast with Supervised Learning, where labeled input-output pairs teach the model what to predict.
How It Works
- Raw, unlabeled data is fed to the model
- The model identifies structure (similarities, groupings, or anomalies) without guidance
- Humans interpret and label the discovered groups or patterns
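The three steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical daily order counts per store and a deliberately naive grouping rule (cut the sorted values at their largest gap) that stands in for a real clustering algorithm. The cluster names are invented to play the role of the human interpreter.

```python
# Step 1: raw, unlabeled data (hypothetical daily order counts per store)
orders = [12, 15, 11, 14, 95, 102, 98, 13, 101]

def split_at_largest_gap(values):
    """Naive 1-D clustering: cut the sorted values at the largest gap
    between neighbors. Group 0 is always the lower group.
    Assumes at least two values."""
    ordered = sorted(values)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, cut = max(gaps)
    threshold = ordered[cut]  # last value of the lower group
    return [0 if v <= threshold else 1 for v in values]

# Step 2: the algorithm finds structure with no labels or guidance
cluster_ids = split_at_largest_gap(orders)

# Step 3: a human inspects each group and attaches meaning (hypothetical names)
human_labels = {0: "small stores", 1: "large stores"}
named = [(v, human_labels[c]) for v, c in zip(orders, cluster_ids)]
```

The algorithm only outputs the ids 0 and 1; the words "small" and "large" come from the person reading the result.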
Core Techniques
| Technique | Purpose | Example |
|---|---|---|
| Clustering | Group similar data points together | Customer segmentation by purchase behavior |
| Association Rule Learning | Find relationships between variables | "Customers who buy X also buy Y" |
| Anomaly Detection | Identify data points that don't fit the pattern | Detect fraudulent transactions |
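Two of the techniques in the table can be sketched with the standard library alone. This is a minimal illustration on hypothetical shopping baskets and transaction amounts. Association rules are scored with the common support and confidence measures. Anomalies are flagged with a simple z-score cutoff (an assumed threshold of 2), which is a basic statistical method and not a production-grade detector.

```python
from statistics import mean, stdev

# --- Association rule learning (hypothetical baskets) ---
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

# Rule: "customers who buy bread also buy butter"
rule_conf = confidence({"bread"}, {"butter"}, baskets)

# --- Anomaly detection via z-scores (hypothetical transaction amounts) ---
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 250.0, 22.0]
mu, sigma = mean(amounts), stdev(amounts)
THRESHOLD = 2.0  # assumed cutoff: flag points more than 2 standard deviations from the mean
anomalies = [a for a in amounts if abs(a - mu) / sigma > THRESHOLD]
```

Here bread appears in 4 of 5 baskets, and 3 of those 4 also contain butter, so the rule's confidence is 0.75. Only the 250.0 transaction is flagged.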
Example
Plot customer data on two axes (e.g., income vs. spending score). Even without labels, the points often form visible groups. Unsupervised algorithms such as K-Means find those clusters automatically.
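A from-scratch K-Means sketch on hypothetical (income in $k, spending score) points shows how clusters are found without labels. It assumes Euclidean distance, k = 3, and a deterministic farthest-point initialization chosen here for reproducibility. Libraries such as scikit-learn default to the randomized k-means++ initialization instead.

```python
import math

# Hypothetical customers: (income in $k, spending score 1-100), no labels
customers = [
    (15, 80), (18, 85), (20, 78),   # lower income, high spending
    (70, 20), (75, 15), (72, 25),   # higher income, low spending
    (40, 50), (42, 55), (38, 48),   # middle of both ranges
]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def init_farthest(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

labels, centroids = kmeans(customers, k=3)
```

On this toy data the algorithm converges in two passes and recovers the three groups, with centroids near (17.7, 81), (72.3, 20), and (40, 51).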

After clustering, each group can be treated as a distinct customer segment, and those segments can then be used for:
- Targeted marketing: different campaigns for each segment
- Recommendation systems: suggest products popular within a segment
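The recommendation use case can be sketched as "the most popular items within the customer's segment that the customer does not already own." The segment assignments and purchase histories below are hypothetical, standing in for the output of a clustering step. Ties are broken alphabetically so the result is deterministic.

```python
from collections import Counter

# Hypothetical clustering output: customer -> segment id
segment_of = {"ana": 0, "ben": 0, "cy": 0, "dee": 1, "eli": 1}

# Hypothetical purchase histories
purchases = {
    "ana": {"yoga mat", "water bottle"},
    "ben": {"yoga mat", "protein bar"},
    "cy": {"water bottle"},
    "dee": {"laptop stand", "usb hub"},
    "eli": {"usb hub"},
}

def recommend(customer, top_n=2):
    """Suggest items most often bought by other members of the customer's segment."""
    segment = segment_of[customer]
    counts = Counter(
        item
        for other, seg in segment_of.items()
        if seg == segment and other != customer
        for item in purchases[other]
    )
    owned = purchases[customer]
    # Rank by popularity (descending), then alphabetically to break ties
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in owned][:top_n]
```

For example, `recommend("cy")` draws only on what ana and ben bought, so a customer in the other segment never influences cy's suggestions.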
Key Aspects
- Feature Engineering can significantly improve the quality of what the model discovers: better-represented data leads to more meaningful clusters or patterns.
- Works well when you have large amounts of data but labeling it manually is impractical or expensive.
- Output requires human interpretation: the model finds structure, but meaning comes from domain expertise.
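To illustrate why data representation matters, the sketch below compares nearest neighbors before and after standardizing each feature to mean 0 and standard deviation 1. The customer data is hypothetical. With income in raw dollars, income differences swamp the 1-100 spending score in Euclidean distance, so a high spender looks closest to a low spender with similar income. Standardization is one common choice among several (min-max scaling is another).

```python
import math
from statistics import mean, pstdev

# Hypothetical customers: (annual income in dollars, spending score 1-100)
raw = [
    (30_000, 90), (50_000, 92), (70_000, 88),   # high spenders
    (31_000, 10), (51_000, 12), (71_000, 8),    # low spenders
]

def standardize(rows):
    """Rescale each column to mean 0 and population standard deviation 1.
    Assumes no column is constant (its standard deviation would be 0)."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]

def nearest_neighbor(index, rows):
    """Index of the row closest to rows[index], excluding itself."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: math.dist(rows[index], rows[i]))

scaled = standardize(raw)
```

In raw units, customer 0 (a high spender) is nearest to customer 3 (a low spender with almost the same income). After scaling, it is nearest to customer 1, another high spender, because both features now contribute comparably to the distance.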
AWS Services
Amazon SageMaker includes built-in algorithms for unsupervised tasks:
- K-Means: clustering
- Random Cut Forest: anomaly detection
- LDA (Latent Dirichlet Allocation): topic modeling in text
Related Concepts
- Supervised Learning
- Machine Learning (ML)
- Deep Learning (DL)
- Feature Engineering
- Data Types and Formats in AI
- Amazon SageMaker Overview
Exam Domain (AIF-C01)
Domain 1: Fundamentals of AI and ML
- Task Statement 1.1: Explain basic AI concepts, including the types of ML (supervised, unsupervised, reinforcement learning).
- Task Statement 1.3: Describe the ML development lifecycle, including when to apply unsupervised approaches.