Main Concept

Unsupervised Learning is a machine learning approach where the model trains on unlabeled data: no correct answers are provided. The goal is to let the model discover patterns, structures, or relationships inherent in the data on its own.

The machine groups and organizes the data by itself, but humans still assign meaning to the output. For example, a model might cluster thousands of images of cats together without knowing what a cat is; a human then labels that cluster “cats.”

Contrast with Supervised Learning, where labeled input-output pairs teach the model what to predict.

How It Works

  1. Raw, unlabeled data is fed to the model
  2. The model identifies structure (similarities, groupings, or anomalies) without guidance
  3. Humans interpret and label the discovered groups or patterns

Core Techniques

Technique                  | Purpose                                          | Example
Clustering                 | Group similar data points together               | Customer segmentation by purchase behavior
Association Rule Learning  | Find relationships between variables             | “Customers who buy X also buy Y”
Anomaly Detection          | Identify data points that don’t fit the pattern  | Detect fraudulent transactions
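
Two of the techniques in the table can be illustrated with a few lines of standard-library Python. The sketch below is only an illustration under stated assumptions: the function names (rule_metrics, zscore_outliers), the basket data, the transaction amounts, and the z-score threshold of 2.0 are all hypothetical, and a simple z-score test stands in for the more capable anomaly detectors used in practice.

```python
from statistics import mean, pstdev

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent in t)
    has_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = has_both / n                              # share of all baskets with both items
    confidence = has_both / has_a if has_a else 0.0     # share of antecedent baskets with the consequent
    return support, confidence

def zscore_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, "bread", "butter")
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")

# Hypothetical transaction amounts with one unusually large value.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = zscore_outliers(amounts)
print("possible anomalies:", outliers)
```

Support measures how often the items appear together across all baskets, while confidence measures how often the consequent appears given the antecedent. Neither function needs labels: both work directly on the raw records.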

Example

Plot customer data on two axes (e.g., income vs. spending score). Even without labels, well-separated data often forms visible clusters. Unsupervised learning algorithms like K-Means find those clusters automatically once you choose how many clusters to look for.
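
The idea can be sketched with a minimal, standard-library version of K-Means (Lloyd's algorithm). This is an illustrative sketch, not the implementation used by any particular library or by SageMaker: the function names, the seeded random initialization, the iteration cap, and the six customer records are all assumptions made for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20, seed=0):
    """Minimal K-Means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (annual income in $1000s, spending score 1-100).
customers = [(20, 82), (24, 75), (30, 71), (80, 19), (86, 16), (90, 9)]
centroids, labels = kmeans(customers, k=2)

for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

The cluster numbers themselves carry no meaning; which group ends up as 0 or 1 depends on the initialization. Interpreting a cluster as, say, "low income, high spending" is still left to a person.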

After clustering, each group can be treated as a distinct customer segment. Those segments can then be used for:

  • Targeted marketing: different campaigns for each segment
  • Recommendation systems: suggest products popular within a segment

Key Aspects

  • Feature Engineering can significantly improve the quality of what the model discovers: better-represented data leads to more meaningful clusters or patterns.
  • Works well when you have large amounts of data but labeling it manually is impractical or expensive.
  • Output requires human interpretation: the model finds structure, but meaning comes from domain expertise.
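
One simple form of feature engineering that matters for distance-based methods is rescaling, so that a feature measured in large units does not dominate the others. The sketch below is a hedged illustration using min-max scaling; the helper names (min_max_scale, nearest) and the four customer records are hypothetical, and other scaling schemes would serve equally well.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest(rows, i):
    """Index of the row closest to rows[i] by squared Euclidean distance."""
    others = (j for j in range(len(rows)) if j != i)
    return min(others, key=lambda j: sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))

# Hypothetical customers: (annual income in dollars, spending score 1-100).
customers = [(20000, 80), (22000, 10), (30000, 82), (90000, 50)]
scaled = min_max_scale(customers)

# On raw values, income differences swamp the spending score.
print("nearest to customer 0 (raw):   ", nearest(customers, 0))
# After scaling, both features contribute on a comparable footing.
print("nearest to customer 0 (scaled):", nearest(scaled, 0))
```

On the raw values, customer 0 looks most similar to customer 1 because their incomes are close, even though their spending scores are far apart. After scaling, customer 2 (similar spending, moderately higher income) becomes the nearest neighbor, which is the kind of change that can alter the clusters an algorithm finds.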

AWS Services

Amazon SageMaker includes built-in algorithms for unsupervised tasks:

  • K-Means: clustering
  • Random Cut Forest: anomaly detection
  • LDA (Latent Dirichlet Allocation): topic modeling in text

Exam Domain (AIF-C01)

Domain 1: Fundamentals of AI and ML

  • Task Statement 1.1: Explain basic AI concepts, including the types of ML (supervised, unsupervised, reinforcement learning).
  • Task Statement 1.3: Describe the ML development lifecycle, including when to apply unsupervised approaches.

References