Amazon Bedrock - Cost & Tradeoffs Cheatsheet

Pricing Models

Model	Cost	Best For	Risk
On-Demand	💰💰💰	Low/unpredictable volume	None
Provisioned Throughput	💰💰 per token	High/predictable volume	Pay even if unused
Batch Inference	💰 (50% cheaper)	Non-urgent large volumes	Not real-time

Customization Approaches

Approach	Cost	Modifies Model?	Best For
Prompt Engineering	💰	No	Simple behavior changes
In-context Learning	💰	No	Few examples in prompt
RAG	💰💰	No	Custom knowledge, privacy
Fine-tuning	💰💰💰	Yes	Specific behavior/domain
Continued Pre-training	💰💰💰💰	Yes	Deep domain adaptation

Customization vs Pricing Model

Approach	On-Demand	Provisioned Throughput
Prompt Engineering	✅	✅
RAG	✅	✅
Fine-tuned model	✅ testing only	✅ production
Continued pre-training	✅ testing only	✅ production

Key Tradeoffs

Factor	Tradeoff
More customization	Higher cost, better task-specific performance
Larger model	Higher cost, better quality
Smaller model	Lower cost, sufficient for simple tasks
RAG vs Fine-tuning	RAG = cheaper + updatable / Fine-tuning = better performance + static
Provisioned vs On-Demand	Provisioned = cheaper at scale / On-Demand = safer at low volume
Batch vs Real-time	Batch = cheaper / Real-time = faster

Cost Optimization Rules

Scenario	Recommendation
Need custom knowledge	RAG first (cheapest, no training)
Need specific behavior	Fine-tuning (if RAG is not enough)
Simple tasks	Smaller/cheaper model (Haiku vs Sonnet)
High predictable volume	Provisioned Throughput
Non-urgent processing	Batch Inference
Long-term log storage	S3 over CloudWatch

Exam Keywords to Recognize

Token-based pricing → On-Demand
Provisioned Throughput → reserved capacity, custom models at scale
Batch inference → non-real-time, cheapest
RAG → sweet spot, no training cost
Fine-tuning → training cost + Provisioned Throughput for production
In-context learning → no training, prompt-based, On-Demand

Exam Domain

Domain 2, Task 2.3: cost tradeoffs of AWS generative AI services
Domain 3, Task 3.1: cost tradeoffs of foundation model customization