Amazon Transcribe

Main Concept

Amazon Transcribe is a fully managed speech-to-text service that automatically converts audio into written text. It uses a deep learning technique called ASR (Automatic Speech Recognition) to process speech quickly and accurately — no ML expertise required.

Key Idea

  • Input → audio (customer calls, meetings, videos, voice recordings).

  • Output → text transcription of that audio.

  • Underlying technology → ASR (Automatic Speech Recognition).

Core Features

PII Redaction

Automatically removes Personally Identifiable Information (PII) from transcriptions — names, ages, social security numbers, and other sensitive data are detected and redacted before the text is stored or processed.

Example

Customer call audio: “My name is John Smith and my SSN is 123-45-6789.”
Transcribe output with redaction: “My name is [PII] and my SSN is [PII].”

Use case: compliance-sensitive industries (healthcare, finance, legal) that cannot store customer PII in transcription logs.

Automatic Language Identification

Detects and transcribes multilingual audio automatically — if a speaker switches between French, English, and Spanish in the same recording, Transcribe identifies each language and transcribes accordingly.

Example

Use case: a global customer support center receiving calls in multiple languages — no manual routing or pre-selection of language needed.

Toxicity Detection

Analyzes audio for toxic content using a combination of two signal types:

Key Idea

  • Speech cues → tone and pitch of the audio (angry voice, aggressive delivery).

  • Text cues → the actual words spoken (profanity, hate speech, threats).

  • The combination of both makes toxicity detection more accurate than text-only analysis.

Toxicity categories detected: sexual harassment, hate speech, threats, abuse, profanity, insult, and graphic content.

Example

Use case: a contact center wants to automatically flag abusive customer calls for supervisor review — without a human having to listen to every recording.

Improving Transcription Accuracy

Custom Vocabularies

Teaches Transcribe new WORDS — specific terms, brand names, acronyms, and jargon it has never encountered before. You can also provide pronunciation hints.

Example

Without custom vocabulary: Transcribe hears “AWS Microservices” and outputs “USA my crow services.”
With custom vocabulary: Transcribe learns the word “microservices” and outputs “AWS Microservices” correctly.

Use case: technical companies, medical practices, legal firms — any domain with specialized terminology.

Custom Language Models

Teaches Transcribe the CONTEXT of your domain — not new words, but how words relate to each other in your specific field.

Example

The word “crow” could mean a bird or a shortening of “microservice” depending on context. Without a custom language model, Transcribe guesses based on general language patterns. With a custom language model trained on your IT documentation, Transcribe understands that in your context “crow” likely means “microservice.”

Use case: when your domain uses common words in uncommon ways.

Key Idea: Custom Vocabularies vs Custom Language Models

  • Custom Vocabularies → teach new WORDS (brand names, acronyms, jargon).

  • Custom Language Models → teach CONTEXT (how words relate in your domain).

  • Use both together → for highest transcription accuracy.

Use Cases

Transcribe customer service calls → quality assurance, compliance  
Automated closed captioning → video accessibility  
Subtitle generation → media and entertainment  
Searchable media archive → generate metadata from audio/video  
Toxicity detection in calls → contact center safety  
Medical dictation → clinical documentation

Minuta 2026-06-19 Amazon Polly

Critical Distinction: Transcribe vs Polly

Amazon Transcribe → Speech TO Text (audio in, text out)  
Amazon Polly → Text TO Speech (text in, audio out)

Analogy: A court stenographer vs a narrator

Amazon Transcribe is the court stenographer — listens to everything said and writes it down. Amazon Polly is the narrator — reads written text aloud in a natural voice. Opposite directions, complementary services.

Exam Scope

You will not be asked how to configure Transcribe. You need to:

  • Know what Transcribe does (speech-to-text, ASR).
  • Recognize the four key features: PII redaction, automatic language identification, toxicity detection, accuracy improvement.
  • Distinguish Custom Vocabularies (new words) from Custom Language Models (context).
  • Distinguish Transcribe (speech-to-text) from Polly (text-to-speech).
  • Toxicity detection is explicitly flagged by Maarek as exam-relevant — know the two signal types (speech cues + text cues) and the categories.

Exam Domain

  • Domain 1, Task Statement 1.2: “Explain the capabilities of AWS managed AI/ML services (for example, Amazon Transcribe).”
  • Domain 1, Task Statement 1.2: “Identify examples of real-world AI applications (for example, speech recognition).”