Main Concept
GPT stands for Generative Pre-trained Transformer. It is a type of large language model (LLM) that generates human-like text or code in response to an input prompt. GPT models are trained on massive amounts of text data and use the Transformer architecture.
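Since AIF-C01 is an AWS exam, here is a minimal sketch of prompting a Transformer-based model through the Amazon Bedrock Converse API. The region, model ID, and prompt are illustrative assumptions; any text model enabled in your account would work.

```python
import boto3

# Sketch: send a prompt to a generative model via Amazon Bedrock.
# Region and model ID below are assumptions, not requirements.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # hypothetical choice of text model
    messages=[{"role": "user", "content": [{"text": "Explain GPT in one sentence."}]}],
    inferenceConfig={"maxTokens": 100, "temperature": 0.5},
)

# The generated reply is nested under output -> message -> content
print(response["output"]["message"]["content"][0]["text"])
```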
Key Points
- Generative — produces new text rather than just classifying or analyzing existing text
- Pre-trained — trained on a massive general dataset first, then optionally fine-tuned for specific tasks
- Transformer — a neural network architecture built around the self-attention mechanism (see the sketch after this list)
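To make the attention point concrete, below is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer. The shapes and random inputs are toy assumptions for illustration only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of the value rows

# Toy example: 3 tokens, each with a 4-dimensional embedding
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V))
```

Each output row blends information from all value rows, which is how a Transformer lets every token attend to every other token in the prompt.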
Common Examples
- GPT-3, GPT-4 (OpenAI)
- Claude (Anthropic) — also Transformer-based
- Amazon Nova models (AWS)
Use Cases
- Text generation and completion
- Code generation and explanation
- Translation and summarization
- Question answering
AIF-C01 Context
You should recognize GPT as a Transformer-based LLM. The exam won't ask you to build one, but it will ask you to identify when to use one versus other model types (e.g., diffusion models for image generation, RNNs for time-series data).