Amazon Comprehend

Main Concept

Amazon Comprehend is a fully managed NLP (Natural Language Processing) service that uses ML to extract meaning and insights from text. You send text in, and Comprehend tells you what it means — no model training required. For domain-specific needs, Comprehend also supports custom models trained on your own data.

Context

Comprehend is the go-to AWS service when the problem involves understanding unstructured text at scale. Instead of reading thousands of customer reviews, support tickets, or social media posts manually, Comprehend processes them automatically and surfaces actionable insights.

Key Idea

  • Input → raw text (reviews, emails, documents, social media posts).

  • Output → structured insights about that text (sentiment, entities, language, key phrases, topics).

  • No ML expertise needed → fully managed, consumed via API.

Two Modes: Pre-built vs Custom

Key Idea

  • Pre-built capabilities → ready to use out of the box, no training data needed, covers general-purpose NLP tasks.

  • Custom capabilities → you provide labeled training data, Comprehend trains a model specific to your domain and categories.

When to use each mode

  • “Analyze customer reviews for positive/negative sentiment” → pre-built sentiment analysis.

  • “Classify support tickets into YOUR company’s specific categories (Billing, Technical, Returns, Shipping)” → Custom Classification.

  • “Detect standard entities like names and dates” → pre-built Entity Detection (NER).

  • “Detect YOUR company’s proprietary entities like internal product codes or contract clause types” → Custom Entity Recognition.

The Five Core Pre-built Capabilities

Sentiment Analysis

Determines whether the overall tone of a piece of text is positive, negative, neutral, or mixed.

Example

Input: “The product arrived quickly but the packaging was damaged.”
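Output: Mixed sentiment (positive about the shipping speed, negative about the packaging). Comprehend returns one overall label for the text plus a confidence score for each of the four labels.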

Use case: automatically analyze thousands of customer reviews to measure overall satisfaction.
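A minimal sketch of calling the pre-built sentiment API from Python, assuming the AWS SDK for Python (boto3) is installed and AWS credentials and a Region are configured. `detect_sentiment` is the boto3 Comprehend operation; the helper names `detect_review_sentiment` and `summarize_sentiment` are hypothetical, and the sample response is hand-written to show the documented response shape, not real Comprehend output.

```python
from typing import Any, Dict, Optional


def detect_review_sentiment(text: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Send one English review to Comprehend's pre-built sentiment model."""
    if client is None:
        import boto3  # imported lazily so the parsing helper below runs without the SDK

        client = boto3.client("comprehend")
    return client.detect_sentiment(Text=text, LanguageCode="en")


def summarize_sentiment(response: Dict[str, Any]) -> str:
    """Format the overall label and the confidence score for that label."""
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
    # SentimentScore uses capitalized keys: Positive, Negative, Neutral, Mixed
    confidence = response["SentimentScore"][label.capitalize()]
    return f"{label} ({confidence:.0%})"


# Hand-written response in the documented shape (illustrative values only)
sample = {
    "Sentiment": "MIXED",
    "SentimentScore": {"Positive": 0.20, "Negative": 0.30, "Neutral": 0.05, "Mixed": 0.45},
}
print(summarize_sentiment(sample))  # MIXED (45%)
```

Passing a `client` argument lets one client be reused across thousands of reviews instead of creating a new one per call.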

Entity Detection (Named Entity Recognition — NER)

Identifies and classifies named entities within text — people, places, organizations, dates, quantities, and more. The underlying NLP technique is called Named Entity Recognition (NER) — AWS surfaces this capability as “Entity Detection” in Amazon Comprehend.

Key Idea

  • NER → the academic/industry ML term for detecting named entities in text.

  • Entity Detection → what AWS calls NER inside Amazon Comprehend.

  • Both terms refer to the same capability — know both for the exam.

Example

Input: “John Smith placed an order from New York on January 5th for 250…”
Output: John Smith → Person / New York → Location / January 5th → Date / 250 → Quantity

Use case: automatically extract structured data from unstructured documents.
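A sketch of turning an entity detection response into structured data, assuming boto3 is installed and credentials are configured. `detect_entities` is the boto3 Comprehend operation; `group_entities` and its 0.8 confidence cut-off are illustrative choices, and the sample response is hand-written in the documented shape (offset fields omitted for brevity), not real output.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def detect_entities(text: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Run Comprehend's pre-built entity detection (NER) on English text."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    return client.detect_entities(Text=text, LanguageCode="en")


def group_entities(response: Dict[str, Any], min_score: float = 0.8) -> Dict[str, List[str]]:
    """Group entity strings by type (PERSON, LOCATION, DATE, ...), skipping low-confidence hits."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


# Hand-written response (illustrative scores; BeginOffset/EndOffset omitted)
sample = {
    "Entities": [
        {"Type": "PERSON", "Text": "John Smith", "Score": 0.99},
        {"Type": "LOCATION", "Text": "New York", "Score": 0.98},
        {"Type": "DATE", "Text": "January 5th", "Score": 0.97},
        {"Type": "QUANTITY", "Text": "250", "Score": 0.60},
    ]
}
print(group_entities(sample))
# {'PERSON': ['John Smith'], 'LOCATION': ['New York'], 'DATE': ['January 5th']}
```

The output is the structured record the use case asks for: unstructured text in, a dictionary of typed fields out.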

Language Detection

Identifies the language a piece of text is written in.

Example

Input: “Bonjour, je voudrais annuler ma commande.”
Output: French (fr)

Use case: automatically route support messages to the correct language team.
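A sketch of the routing use case, assuming boto3 is installed and credentials are configured. `detect_dominant_language` is the boto3 Comprehend operation and returns candidate languages with confidence scores; the `TEAM_BY_LANGUAGE` mapping, the team names, and `route_message` are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping of ISO language codes to support queues
TEAM_BY_LANGUAGE = {"en": "english-support", "fr": "french-support", "es": "spanish-support"}


def detect_language(text: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Ask Comprehend which language the text is written in."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    # No LanguageCode parameter here: finding the language is the point of the call
    return client.detect_dominant_language(Text=text)


def route_message(response: Dict[str, Any], default_team: str = "triage") -> str:
    """Pick the team for the highest-scoring language; unknown languages go to triage."""
    languages = response["Languages"]
    if not languages:
        return default_team
    best = max(languages, key=lambda lang: lang["Score"])
    return TEAM_BY_LANGUAGE.get(best["LanguageCode"], default_team)


# Hand-written response for "Bonjour, je voudrais annuler ma commande." (illustrative score)
print(route_message({"Languages": [{"LanguageCode": "fr", "Score": 0.99}]}))  # french-support
```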

Key Phrase Extraction

Extracts the most important phrases and concepts from a body of text.

Example

Input: a 10-page legal document.
Output: “breach of contract”, “payment terms”, “liability clause”

Use case: quickly summarize the key topics in long documents without reading them fully.
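A sketch of pulling the most confident phrases out of a key phrase response, assuming boto3 is installed and credentials are configured. `detect_key_phrases` is the boto3 Comprehend operation; `top_phrases` and its case-insensitive de-duplication rule are illustrative, and the sample response is hand-written. Very long documents may exceed the per-request size limit of the synchronous call, in which case they would need to be split or processed with an asynchronous job.

```python
from typing import Any, Dict, List, Optional


def detect_key_phrases(text: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Run Comprehend's pre-built key phrase extraction on English text."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    return client.detect_key_phrases(Text=text, LanguageCode="en")


def top_phrases(response: Dict[str, Any], n: int = 3) -> List[str]:
    """Return the n most confident phrases, ignoring case-only duplicates."""
    seen = set()
    ranked: List[str] = []
    for phrase in sorted(response["KeyPhrases"], key=lambda p: p["Score"], reverse=True):
        key = phrase["Text"].lower()
        if key not in seen:
            seen.add(key)
            ranked.append(phrase["Text"])
        if len(ranked) == n:
            break
    return ranked


# Hand-written response (illustrative scores; offsets omitted)
sample = {
    "KeyPhrases": [
        {"Text": "breach of contract", "Score": 0.99},
        {"Text": "the parties", "Score": 0.60},
        {"Text": "payment terms", "Score": 0.97},
        {"Text": "Breach of Contract", "Score": 0.95},
        {"Text": "liability clause", "Score": 0.93},
    ]
}
print(top_phrases(sample))  # ['breach of contract', 'payment terms', 'liability clause']
```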

Topic Modeling

Groups a large collection of documents by common themes — without you defining the themes in advance.

Example

Input: 10,000 customer support tickets.
Output: Topic 1 → billing issues / Topic 2 → delivery problems / Topic 3 → product defects

Use case: understand what customers are complaining about most without reading every ticket.
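Unlike the four capabilities above, topic modeling runs as an asynchronous job over a document collection stored in Amazon S3 rather than as a single synchronous call. A sketch, assuming boto3, configured credentials, and an IAM role that lets Comprehend read the input and write results; `start_topics_detection_job` and its parameter names come from the boto3 Comprehend client, while the bucket paths, role ARN, job name, and helper names are placeholders.

```python
from typing import Any, Dict, Optional


def build_topics_job_request(
    input_s3_uri: str, output_s3_uri: str, role_arn: str, num_topics: int = 10
) -> Dict[str, Any]:
    """Assemble the keyword arguments for StartTopicsDetectionJob."""
    if num_topics < 1:
        raise ValueError("num_topics must be at least 1")
    return {
        "JobName": "support-ticket-topics",  # placeholder name
        # ONE_DOC_PER_LINE: each line of each input file is treated as one ticket
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        # You choose how many themes to find; Comprehend decides what they are
        "NumberOfTopics": num_topics,
    }


def start_topics_job(request: Dict[str, Any], client: Optional[Any] = None) -> str:
    """Start the job and return its JobId; results appear in the output S3 location later."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    return client.start_topics_detection_job(**request)["JobId"]


request = build_topics_job_request(
    "s3://example-bucket/tickets/",  # placeholder input prefix
    "s3://example-bucket/topic-output/",  # placeholder output prefix
    "arn:aws:iam::123456789012:role/ComprehendS3Access",  # placeholder role
    num_topics=3,
)
print(request["NumberOfTopics"])  # 3
```

The job status can then be polled with `describe_topics_detection_job(JobId=...)` until it completes.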

The Two Custom Capabilities

Custom Classification

Train a custom text classifier using your own labeled data so Comprehend learns YOUR specific categories — not just generic ones.

Key Idea

  • Pre-built topic modeling → Comprehend decides what the topics are.

  • Custom Classification → YOU define the categories, you provide labeled examples, Comprehend learns them.

  • Requires labeled training data → you provide examples, Comprehend trains the custom classifier.

Example

A legal firm wants to classify contracts into: NDA, Employment Agreement, Lease, Purchase Agreement, and Partnership Agreement. These are specific legal categories that pre-built Comprehend does not know. You provide 500 labeled examples per category → Comprehend trains a custom classifier → new contracts are automatically classified into your categories.
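A sketch of the two steps in the legal-firm example, assuming boto3, configured credentials, an S3 bucket for the training file, and an IAM role Comprehend can assume. In multi-class mode the training data is a CSV file with the label in the first column and the document text in the second, with no header row. `create_document_classifier` is the boto3 Comprehend operation; the helper names, classifier name, and sample rows are hypothetical.

```python
import csv
import io
from typing import Any, Iterable, Optional, Tuple


def training_csv(examples: Iterable[Tuple[str, str]]) -> str:
    """Render (label, text) pairs as label,text CSV rows with no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # quotes fields that contain commas
    for label, text in examples:
        writer.writerow([label, text])
    return buffer.getvalue()


def create_contract_classifier(
    training_s3_uri: str, role_arn: str, client: Optional[Any] = None
) -> str:
    """Start training a custom classifier; returns its ARN (training runs asynchronously)."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    response = client.create_document_classifier(
        DocumentClassifierName="contract-types",  # placeholder name
        DataAccessRoleArn=role_arn,
        InputDataConfig={"S3Uri": training_s3_uri},  # CSV produced by training_csv
        LanguageCode="en",
        Mode="MULTI_CLASS",  # each contract gets exactly one category
    )
    return response["DocumentClassifierArn"]


rows = [
    ("NDA", "Recipient shall keep all disclosed information confidential."),
    ("Lease", "Tenant shall pay rent monthly, in advance."),
]
print(training_csv(rows), end="")
```

Once training finishes, new contracts can be classified in real time through a Comprehend endpoint (`classify_document`) or in bulk with an asynchronous job (`start_document_classification_job`).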

Custom Entity Recognition

Train a custom entity recognizer to detect domain-specific entities that the pre-built NER model does not know about. You provide labeled examples of your proprietary entities, and Comprehend learns to recognize them.

Key Idea

  • Pre-built NER / Entity Detection → detects standard entities (Person, Location, Organization, Date, Quantity).

  • Custom Entity Recognition → detects YOUR domain-specific entities that standard NER does not cover.

  • Requires labeled training data → you provide examples, Comprehend trains the custom recognizer.

Example 1

A pharmaceutical company wants to extract drug names, dosage amounts, and clinical trial IDs from research documents. These are not standard NER entity types; they are specific to the pharmaceutical domain.

Standard NER output: finds “Dr. Johnson” (Person) and “Boston” (Location).
Custom Entity Recognition output: also finds “Metformin 500mg” (Drug) and “CT-2024-0891” (Trial ID).

Example 2

A bank wants to detect proprietary financial instrument codes and internal account types from customer communications — terms that standard NER has never seen. Custom Entity Recognition learns them from 200 labeled examples provided by the bank’s team.
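A sketch of training a custom recognizer for the pharmaceutical example, assuming boto3, configured credentials, an S3 bucket, and an IAM role Comprehend can assume. It uses the entity-list option (a CSV with a `Text,Type` header listing known entity strings) rather than per-document annotations. `create_entity_recognizer` is the boto3 Comprehend operation; the recognizer name, entity type names, S3 paths, and helper names are hypothetical, and a real entity list would need many more entries than shown here.

```python
import csv
import io
from typing import Any, Iterable, List, Optional, Tuple


def entity_list_csv(entries: Iterable[Tuple[str, str]]) -> str:
    """Render (text, type) pairs as an entity list CSV with a Text,Type header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    for text, entity_type in entries:
        writer.writerow([text, entity_type])
    return buffer.getvalue()


def create_pharma_recognizer(
    documents_s3_uri: str,
    entity_list_s3_uri: str,
    entity_types: List[str],
    role_arn: str,
    client: Optional[Any] = None,
) -> str:
    """Start training a custom recognizer; returns its ARN (training runs asynchronously)."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    response = client.create_entity_recognizer(
        RecognizerName="pharma-entities",  # placeholder name
        DataAccessRoleArn=role_arn,
        InputDataConfig={
            "EntityTypes": [{"Type": t} for t in entity_types],
            "Documents": {"S3Uri": documents_s3_uri},  # raw training text
            "EntityList": {"S3Uri": entity_list_s3_uri},  # CSV produced by entity_list_csv
        },
        LanguageCode="en",
    )
    return response["EntityRecognizerArn"]


entries = [("Metformin", "DRUG"), ("CT-2024-0891", "TRIAL_ID")]
print(entity_list_csv(entries), end="")
```

After training, the recognizer is used either through an endpoint (passing its `EndpointArn` to `detect_entities`) or through an asynchronous entity detection job.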

Common Exam Scenarios

Key Idea: When the answer is Amazon Comprehend

  • “Analyze customer reviews to understand satisfaction” → sentiment analysis.

  • “Extract names and dates from thousands of documents automatically” → entity detection / NER.

  • “Automatically detect what language incoming messages are written in” → language detection.

  • “Categorize support tickets by topic without predefined categories” → topic modeling.

  • “Classify documents into company-specific categories using labeled training data” → Custom Classification.

  • “Detect proprietary domain-specific entities not covered by standard NLP” → Custom Entity Recognition.

  • Any scenario involving understanding unstructured text at scale → Amazon Comprehend.

How It Differs from Similar Services

Amazon Comprehend → understands TEXT meaning  
(sentiment, entities, topics, language)

Amazon Textract → extracts TEXT from images/documents  
(OCR — reading text out of a PDF or photo)

Amazon Translate → converts text from one language to another  
(translation, not understanding)

Amazon Lex → builds conversational interfaces  
(chatbots — understands user intent in dialogue)

Analogy: A team of text analysts

Imagine you have 100,000 customer emails to analyze. Hiring humans to read and categorize them would take months and cost a fortune. Amazon Comprehend is like having a highly scalable team of text analysts that works through all 100,000 emails far faster than any human team could, extracting sentiment, identifying key topics, and flagging mentions of specific entities through a handful of API calls. And if your business has specific categories those analysts need to learn, Custom Classification trains them on your company’s unique taxonomy.
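At this scale the work is spread over many API requests rather than one. A sketch, assuming boto3 and configured credentials: the synchronous `batch_detect_sentiment` operation accepts up to 25 documents per request, so the hypothetical helper below splits the emails into batches of 25 and maps each result back using its `Index`. For collections stored in S3, an asynchronous job such as `start_sentiment_detection_job` is the usual alternative.

```python
from typing import Any, List, Optional

BATCH_LIMIT = 25  # maximum documents per BatchDetectSentiment request


def batch_sentiment(emails: List[str], client: Optional[Any] = None) -> List[Optional[str]]:
    """Return one sentiment label per email; None marks an email the service reported as failed."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("comprehend")
    labels: List[Optional[str]] = [None] * len(emails)
    for start in range(0, len(emails), BATCH_LIMIT):
        batch = emails[start:start + BATCH_LIMIT]
        response = client.batch_detect_sentiment(TextList=batch, LanguageCode="en")
        # "Index" is the position within this batch, so offset it by the batch start
        for result in response["ResultList"]:
            labels[start + result["Index"]] = result["Sentiment"]
        # Documents listed in response["ErrorList"] stay None so they can be retried later
    return labels
```

A production version would also need to handle throttling and retries, which this sketch leaves out.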

Exam Scope

You will not be asked how to configure or implement Comprehend. You need to:

  • Recognize what Comprehend does (NLP — text analysis).
  • Match business scenarios to Comprehend’s specific capabilities.
  • Distinguish between pre-built capabilities and custom capabilities.
  • Know that NER and Entity Detection refer to the same thing.
  • Distinguish Comprehend from similar services (Textract, Translate, Lex).

Exam Domain

  • Domain 1, Task Statement 1.2: “Explain the capabilities of AWS managed AI/ML services (for example, Amazon Comprehend).”
  • Domain 1, Task Statement 1.2: “Identify examples of real-world AI applications (for example, NLP).”