Main Concept

A Small Language Model (SLM) is a language model designed to be compact enough to run on devices with limited computing resources, while still being capable of handling a defined set of tasks effectively.

Context

LLMs like Claude or GPT require significant compute resources and run on powerful remote servers. SLMs trade raw capability for efficiency: they are smaller, faster, and can run locally on constrained hardware without an internet connection.

Key Idea

  • LLM → powerful, large, remote server, high latency, needs connectivity
  • SLM → compact, limited, local device, low latency, works offline

Key Characteristics

  • Low compute footprint: runs on edge devices (Raspberry Pi, smartphones, microcontrollers, embedded systems).
  • Very low latency: no network round-trip required.
  • Offline capability: does not require internet connectivity.
  • Narrower capability: optimized for specific tasks rather than general-purpose reasoning.
  • Lower cost per inference: no cloud API call needed.
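
To make "low compute footprint" concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the memory needed for a model's weights alone as parameter count times bytes per parameter, then checks that against a device's RAM. The function names, the parameter counts, the 8 GiB device, and the 50% headroom figure are all illustrative assumptions, not values from any specific model or device, and real memory use also includes activations and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


def fits_on_device(num_params: float, bits_per_param: int,
                   device_ram_gib: float, headroom: float = 0.5) -> bool:
    """True if the weights fit in the share of RAM set aside for the model.

    `headroom` is a hypothetical fraction of RAM available to the model;
    the rest is left for the OS and the application.
    """
    budget_gib = device_ram_gib * headroom
    return weight_memory_gib(num_params, bits_per_param) <= budget_gib


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model on a hypothetical 8 GiB device.
    for bits in (16, 4):
        size = weight_memory_gib(7e9, bits)
        ok = fits_on_device(7e9, bits, device_ram_gib=8.0)
        print(f"{bits}-bit weights: {size:.2f} GiB, fits: {ok}")
```

The same parameter count can land on either side of the budget depending on the assumed precision, which is one reason model size shows up as a selection criterion.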

Tradeoffs vs LLMs

Aspect              SLM                      LLM
Location            Edge device (local)      Remote server (cloud)
Latency             Very low                 Higher
Connectivity        Not required             Required
Capability          Limited, task-specific   Broad, general-purpose
Cost per inference  Low                      Higher
Hardware            Constrained              Powerful

Relationship to Edge Computing

SLMs are the enabling technology for inferencing at the edge. Without compact models that fit on constrained hardware, edge inferencing would not be practical. See Inferencing at the Edge.

Hybrid Pattern

SLMs and LLMs are not mutually exclusive. A common architecture uses both:

  • An SLM runs locally on the edge device for fast, low-complexity decisions.
  • When the query exceeds the SLM’s capability or requires deeper reasoning, the device makes an API call to a remote LLM in the cloud.

This hybrid pattern achieves both low latency for routine decisions and high accuracy for complex ones, at the cost of added architectural complexity and occasional network dependency.

Examples: voice assistants (wake word local, complex queries cloud), autonomous vehicles (driving decisions local, fleet analytics cloud).
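
The routing logic of the hybrid pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Answer` type, the `route` function, the confidence score, the 0.8 threshold, and the three stub models are all hypothetical stand-ins. A real system would call an actual on-device model and an actual cloud API, and would need its own way of deciding when the SLM is out of its depth.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0; assumed to be reported by the model
    source: str        # "slm" (local) or "llm" (cloud)


Model = Callable[[str], Answer]


def route(query: str, local_slm: Model, remote_llm: Model,
          min_confidence: float = 0.8) -> Answer:
    """Try the local SLM first; escalate to the remote LLM if it is unsure."""
    local = local_slm(query)
    if local.confidence >= min_confidence:
        return local  # fast path: no network round-trip
    try:
        return remote_llm(query)  # slow path: cloud API call
    except ConnectionError:
        # No connectivity: degrade to the local answer rather than fail.
        return local


# Stub models for demonstration only.
def fake_slm(query: str) -> Answer:
    # Pretend the small model is only confident on short commands.
    confidence = 0.95 if len(query.split()) <= 4 else 0.30
    return Answer(f"local: {query}", confidence, "slm")


def fake_llm(query: str) -> Answer:
    return Answer(f"cloud: {query}", 0.99, "llm")


def offline_llm(query: str) -> Answer:
    raise ConnectionError("no network")


if __name__ == "__main__":
    long_query = "summarize the last three maintenance reports for me"
    print(route("turn on lights", fake_slm, fake_llm).source)
    print(route(long_query, fake_slm, fake_llm).source)
    print(route(long_query, fake_slm, offline_llm).source)
```

The fallback in the `except` branch reflects the tradeoff noted above: the device depends on the network only occasionally, and it still returns something when the network is unavailable.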

AIF-C01 Exam Domains

The term SLM does not appear explicitly in the AIF-C01 exam guide. However, the underlying concepts connect to:

  • Latency as a model selection criterion (Domain 3, Task Statement 3.1)
  • Model size and complexity as selection criteria (Domain 3, Task Statement 3.1)
  • Cost tradeoffs of generative AI services (Domain 2, Task Statement 2.3)