Amazon Bedrock - Cost & Tradeoffs Cheatsheet

Pricing Models

| Model | Cost | Best For | Risk |
|---|---|---|---|
| On-Demand | 💰💰💰 | Low/unpredictable volume | None |
| Provisioned Throughput | 💰💰 per token at scale (billed hourly) | High/predictable volume | Pay even if unused |
| Batch Inference | 💰 (50% cheaper than On-Demand) | Non-urgent large volumes | Not real-time |

Customization Approaches

| Approach | Cost | Modifies Model? | Best For |
|---|---|---|---|
| Prompt Engineering | 💰 | No | Simple behavior changes |
| In-context Learning | 💰 | No | Few examples in prompt |
| RAG | 💰💰 | No | Custom knowledge, privacy |
| Fine-tuning | 💰💰💰 | Yes | Specific behavior/domain |
| Continued Pre-training | 💰💰💰💰 | Yes | Deep domain adaptation |
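The two cheapest rows differ only in where the extra prompt text comes from: hand-picked examples (in-context learning) vs retrieved documents (RAG). A toy sketch of both, where the keyword match stands in for a real vector store:

```python
# Contrast between the two "no training" approaches in the table.
# The keyword match below is a toy stand-in for real vector retrieval.

def in_context_prompt(task, examples, query):
    """In-context learning: a few labelled examples travel in the prompt."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{task}\n{shots}\nQ: {query}\nA:"

def rag_prompt(query, documents):
    """RAG: retrieve relevant documents, prepend them as grounding context."""
    words = set(query.lower().split())
    hits = [d for d in documents if words & set(d.lower().split())]
    return "Context:\n" + "\n".join(hits) + f"\n\nQuestion: {query}\nAnswer:"

prompt = in_context_prompt(
    "Classify the sentiment.",
    [("Great product!", "positive"), ("Broke in a day.", "negative")],
    "Works as advertised.",
)
```

Either prompt goes to an unmodified base model, which is why both rows stay in the "No" column: all the customization lives in the request, not in the weights.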

Customization vs Pricing Model

| Approach | On-Demand | Provisioned Throughput |
|---|---|---|
| Prompt Engineering | ✅ | ✅ |
| RAG | ✅ | ✅ |
| Fine-tuned model | ✅ testing only | ✅ production |
| Continued pre-training | ✅ testing only | ✅ production |

Key Tradeoffs

| Factor | Tradeoff |
|---|---|
| More customization | Higher cost, better task-specific performance |
| Larger model | Higher cost, better quality |
| Smaller model | Lower cost, sufficient for simple tasks |
| RAG vs Fine-tuning | RAG = cheaper + updatable / Fine-tuning = better performance + static |
| Provisioned vs On-Demand | Provisioned = cheaper at scale / On-Demand = safer at low volume |
| Batch vs Real-time | Batch = cheaper / Real-time = faster |

Cost Optimization Rules

| Scenario | Recommendation |
|---|---|
| Need custom knowledge | RAG first (cheapest, no training) |
| Need specific behavior | Fine-tuning (if RAG is not enough) |
| Simple tasks | Smaller/cheaper model (e.g., Claude Haiku instead of Sonnet) |
| High predictable volume | Provisioned Throughput |
| Non-urgent processing | Batch Inference |
| Long-term log storage | S3 over CloudWatch |
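For drilling, the rules table compresses into a lookup; the scenario keys below are paraphrases of the left column, not AWS terminology:

```python
# The "Cost Optimization Rules" table as a dict, for self-quizzing.
RULES = {
    "custom knowledge": "RAG first (cheapest, no training)",
    "specific behavior": "Fine-tuning (if RAG is not enough)",
    "simple tasks": "Smaller/cheaper model",
    "high predictable volume": "Provisioned Throughput",
    "non-urgent processing": "Batch Inference",
    "long-term log storage": "S3 over CloudWatch",
}

def recommend(scenario):
    """Return the cheat-sheet recommendation for a scenario keyword."""
    return RULES.get(scenario.lower(),
                     "no rule; default to On-Demand + prompt engineering")
```

The fallback reflects the sheet's overall ordering: start with the cheapest option (On-Demand plus prompt engineering) and escalate only when it falls short.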

Exam Keywords to Recognize

  • Token-based pricing → On-Demand
  • Provisioned Throughput → reserved capacity, custom models at scale
  • Batch inference → non-real-time, cheapest
  • RAG → sweet spot, no training cost
  • Fine-tuning → training cost + Provisioned Throughput for production
  • In-context learning → no training, prompt-based, On-Demand

Exam Domain

  • Domain 2, Task 2.3: cost tradeoffs of AWS generative AI services
  • Domain 3, Task 3.1: cost tradeoffs of foundation model customization