LLM Fundamentals
Understanding what LLMs are, how they work, and the key concepts you need to know.
What is an LLM?
A Large Language Model is an AI system trained on massive amounts of text data to:
- Understand human language
- Generate human-like text
- Perform tasks based on instructions
- Answer questions and hold conversations
Think of it as autocomplete on steroids. It predicts the most likely next word, sentence, or paragraph based on patterns learned from billions of examples.
How LLMs Work (Simplified)
1. Training Phase
- Fed billions of text examples from books, websites, code, etc.
- Learns patterns: grammar, facts, reasoning, style
- Develops statistical understanding of language
- Takes months and costs millions of dollars
2. Usage Phase (What You Do)
- You provide a prompt (input)
- LLM processes it through neural network layers
- Predicts and generates the most likely response
- Happens in seconds
Key Insight: LLMs don't "know" things like humans do. They predict statistically likely responses based on training data. This is why they can seem brilliant but also make confident mistakes.
Major LLM Providers (2025-2026)
| Provider | Model | Strengths | Best For |
|---|---|---|---|
| OpenAI | GPT-5, GPT-5 mini | Most versatile, huge ecosystem | General use, coding, integrations |
| Anthropic | Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 | Long context, careful reasoning | Research, analysis, long documents, agents |
| Google | Gemini 2.5 Pro | Multimodal, fast, integrated with Google | Research, multimedia, Google users |
| Meta | Llama 3 / 4 | Open weights, customizable | Self-hosting, privacy, customization |
| Mistral | Mistral Large | European, fast, efficient | Europe-based, privacy-conscious |
Key Capabilities
Text Generation
- Writing emails, articles, code, stories
- Summarizing long documents
- Translating languages
- Rewriting content in different styles
Analysis & Understanding
- Extracting key points from text
- Answering questions about documents
- Comparing and contrasting ideas
- Finding patterns and insights
Problem Solving
- Breaking down complex problems
- Creating step-by-step plans
- Debugging code
- Suggesting solutions with trade-offs
Structured Output
- Generating JSON, CSV, markdown
- Creating tables and lists
- Formatting data consistently
- Following templates precisely
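A common pattern for structured output is to ask for JSON in a fixed shape and then validate the reply before using it. The sketch below simulates this: the prompt template and the `parse_reply` helper are illustrative, and the "model reply" in the usage example is a hand-written stand-in, not a real API response.

```python
import json

# Hypothetical prompt asking the model for a fixed JSON shape.
# Double braces escape literal { } inside str.format().
PROMPT_TEMPLATE = """Extract the person's name and age from the text below.
Reply with ONLY a JSON object of the form {{"name": "...", "age": 0}}.

Text: {text}"""

def parse_reply(reply: str) -> dict:
    """Validate a model reply that should be JSON; raise if it is
    malformed or missing required keys, so bad outputs fail loudly
    instead of silently corrupting downstream data."""
    data = json.loads(reply)
    for key in ("name", "age"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    return data
```

Usage: build the prompt with `PROMPT_TEMPLATE.format(text="Ada is 36.")`, send it to your model of choice, and run the reply through `parse_reply` so malformed output raises immediately.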
Important Terminology
Context Window
The amount of text an LLM can "see" at once, measured in tokens (1 token ≈ 0.75 English words).
- GPT-5: up to 400K tokens (300K words)
- Claude Opus 4.7: 1M tokens; Sonnet 4.6 / Haiku 4.5: 200K tokens
- Gemini 2.5: 1-2M tokens (750K-1.5M words)
Why it matters: Larger windows = can process longer documents, more conversation history, more context.
Token
The basic unit LLMs work with. A token is roughly:
- 1 token ≈ 0.75 English words
- "Hello world" = 2 tokens
- "Artificial Intelligence" = 3 tokens
Why it matters: API pricing is per token. Context limits are in tokens.
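For quick budgeting, the 0.75-words-per-token rule of thumb can be turned into a rough estimator. This is only a ballpark: real tokenizers (e.g. OpenAI's tiktoken library) give exact, model-specific counts and will differ from this heuristic.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token rule of thumb.
    Real tokenizers give exact counts that vary by model; use this only
    for back-of-the-envelope prompt budgeting."""
    words = len(text.split())
    return round(words / 0.75)
```

For example, a 1,500-word article comes out to roughly 2,000 tokens, which tells you immediately whether it fits a given context window.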
Temperature
Controls randomness of output (0.0 to 2.0):
- 0.0-0.3: Focused, deterministic, consistent (facts, code, analysis)
- 0.4-0.7: Balanced (default for most uses)
- 0.8-2.0: Creative, varied, unpredictable (brainstorming, creative writing)
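Under the hood, temperature divides the model's raw scores (logits) before converting them to probabilities: low values sharpen the distribution toward the top token, high values flatten it. A minimal sketch over a toy vocabulary (the logits here are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    """Sample one token after temperature scaling: logits are divided by T
    before the softmax. T = 0 is treated as greedy (always the top token)."""
    if temperature == 0:
        return max(logits, key=logits.get)          # greedy decoding
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    max_l = max(scaled.values())                    # subtract max for stability
    exp = {tok: math.exp(v - max_l) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = {tok: v / total for tok, v in exp.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]
```

With `temperature=0.1` the top-scoring token wins almost every time; with `temperature=1.5` lower-scoring tokens get sampled noticeably often, which is the "creative, varied" behavior described above.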
Top-p (Nucleus Sampling)
An alternative to temperature. The model samples only from the smallest set of most-likely tokens whose cumulative probability reaches p.
- 0.1: Very focused, conservative
- 0.9: Balanced
- 0.95: Allows more variety
Most users can ignore this and just use temperature.
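The nucleus-filtering step itself is simple to sketch. Given a token probability distribution (toy values below), keep the highest-probability tokens until their cumulative mass reaches p, then renormalize and sample only within that set:

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, then renormalize. Sampling then happens only
    within this 'nucleus', cutting off the unlikely long tail."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}
```

With `p=0.1` only the single top token usually survives (very conservative); with `p=0.95` most of the vocabulary's plausible continuations stay in play.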
Fine-tuning
Training an LLM further on specific data to specialize it for a task or domain.
Examples:
- Fine-tuning on medical literature for medical Q&A
- Training on company docs for customer service
- Specializing in legal document analysis
Reality: Most users don't need this. Careful prompting usually gets you most of the benefit at a fraction of the effort.
RAG (Retrieval-Augmented Generation)
Giving LLMs access to external knowledge by:
- Searching a database for relevant info
- Including that info in the prompt
- Having the LLM generate a response grounded in that context
Why it matters: Works around knowledge cutoffs and reduces hallucinations by grounding answers in retrieved sources.
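The three steps above can be sketched end to end. This toy version ranks documents by word overlap with the query purely for illustration; production RAG systems use embedding similarity over a vector database instead. All function names and the prompt wording are this sketch's own, not any particular framework's API.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    Real RAG pipelines rank by embedding similarity instead."""
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Stuff the retrieved passages into the prompt so the model answers
    from the provided context rather than from memory alone."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (f"Use only this context to answer.\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

Sending the resulting prompt to any LLM completes the loop: the model now answers from your documents instead of (only) its training data.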
Embeddings
Converting text into numbers (vectors) that capture meaning. Similar texts have similar vectors.
Use case: Search, recommendations, clustering, finding related content.
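"Similar texts have similar vectors" is usually measured with cosine similarity. The vectors below are made-up toy values; in practice they would come from an embedding model:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: 1.0 means the
    vectors point the same way (similar meaning), 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Semantic search is just this comparison at scale: embed the query, compute its similarity against every stored document vector, and return the closest matches.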
How LLMs Learn (Training Process)
Pre-training
- Train on massive general datasets (internet, books, code)
- Learns language patterns, facts, reasoning
- Takes months on supercomputers
- Base model emerges
Instruction Tuning
- Fine-tune on examples of following instructions
- Teaches model to be helpful, answer questions
- Makes it useful for end users
RLHF (Reinforcement Learning from Human Feedback)
- Humans rate model outputs
- Model learns what "good" looks like
- Improves safety, helpfulness, accuracy
Result: The LLM you use has gone through all three phases.
What LLMs Are Good At
✅ Excellent
- Text generation and rewriting
- Summarization
- Translation
- Code generation and debugging
- Brainstorming and ideation
- Explaining concepts
- Following complex instructions
- Pattern matching in text
✅ Good
- Math (with code execution)
- Research and synthesis
- Structured data extraction
- Template filling
- Style mimicry
⚠️ Limited
- Real-time information (knowledge cutoff)
- Math without code execution
- Counting and precise operations
- Reasoning with many steps
- Understanding images (text-only models)
❌ Poor
- True understanding vs. pattern matching
- Knowing what they don't know
- Consistent logic in long chains
- Physical world reasoning
Knowledge Cutoff
LLMs are trained on data up to a specific date (their "knowledge cutoff"):
- GPT-5: late 2024
- Claude Opus 4.7 / Sonnet 4.6: January 2026
- Gemini 2.5: mid 2024 (updated periodically)
What this means: They don't know about events after their cutoff unless you tell them.
Solutions:
- Use models with web search (ChatGPT Plus, Perplexity)
- Provide recent info in your prompts
- Use RAG systems for current data
Multimodal Capabilities
Modern LLMs can handle more than text:
Vision
- Upload images and ask questions
- Analyze charts, diagrams, screenshots
- Extract text from images (OCR)
- Describe visual content
Code Execution
- Run Python code internally
- Perform calculations accurately
- Analyze data files
- Generate plots and visualizations
Audio (Some Models)
- Voice conversations
- Transcription
- Audio analysis
Video (Coming Soon)
- Video understanding and analysis
- Frame-by-frame processing
Model Sizes and Versions
LLMs come in different sizes:
Small (1-10B parameters)
- Fast, cheap, runs locally
- Good for simple tasks
- Limited reasoning
Medium (10-70B parameters)
- Balanced performance and cost
- Most common size for APIs
- Good for most use cases
Large (70B+ parameters)
- Best performance
- Expensive, slower
- Complex reasoning
Parameters = The weights in the neural network. More ≈ more capable (but not always).
Privacy and Security
What Providers Do With Your Data
OpenAI (ChatGPT):
- Free tier: Data may be used for training (can opt out)
- Plus/Pro/Enterprise: Not used for training by default
- API: Not used for training
Anthropic (Claude):
- Never uses conversations for training
- Enterprise options for additional security
- API: Not used for training
Google (Gemini):
- May use data to improve services (can opt out)
- Workspace accounts have different policies
Best Practices:
- Don't share sensitive/confidential data
- Use API or enterprise plans for business
- Read privacy policies for your use case
- Consider self-hosted models for sensitive work
Costs
Consumer Plans
- Free: Limited access to older models
- $20-30/month: Full access to latest models, higher rate limits
- $200+/month: Professional/Pro plans with even more capacity
API Pricing (Pay Per Use)
- Input tokens: $0.15-$15 per 1M tokens
- Output tokens: $0.60-$75 per 1M tokens
- Varies by model size and speed
Example: Analyzing a 10-page document (5K tokens) with GPT-5:
- Input: ~$0.08
- Output (2K tokens): ~$0.12
- Total: ~$0.20 per analysis
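The arithmetic behind that estimate generalizes to any model. The rates below ($15/1M input, $60/1M output) are illustrative values from the ranges above, not any provider's actual price list; always check the current pricing page.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """API cost in dollars, given per-1M-token rates. Rates vary widely
    by model and change often; treat the numbers you pass in as a
    snapshot of your provider's pricing page."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The document-analysis example above, at illustrative rates:
cost = api_cost(5000, 2000, input_rate=15, output_rate=60)  # → 0.195, i.e. ~$0.20
```

Note that output tokens typically cost several times more than input tokens, so long responses dominate the bill even when the prompt is large.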
Latest Developments (2024-2025)
Longer Context Windows
- From 8K to 2M+ tokens
- Can process entire books, codebases
- Better conversation memory
Multimodal by Default
- Vision, audio, code execution standard
- Smoother integration across modalities
Faster and Cheaper
- Same quality, 10x faster, 10x cheaper than 2023
- Makes more use cases economically viable
Specialized Models
- Code-specific (GitHub Copilot, Cursor)
- Math-specific (Minerva)
- Domain-specific (medical, legal)
AI Agents
- LLMs that can use tools
- Multi-step autonomous tasks
- Still early but rapidly improving
Open Source Progress
- Meta's Llama 3/4 and Mistral's open models closing the gap with commercial models
- Can run locally on good hardware
- Privacy and customization benefits
Common Misconceptions
❌ "LLMs understand like humans"
- ✅ They pattern match incredibly well, but don't have true understanding
❌ "LLMs are always right"
- ✅ They confidently hallucinate. Always verify important information
❌ "You need to be technical to use them"
- ✅ Anyone can use them effectively with basic prompting skills
❌ "LLMs will replace programmers/writers/etc"
- ✅ They augment professionals, making them more productive
❌ "Bigger models are always better"
- ✅ Bigger models are better at complex reasoning, but smaller models can be faster and cheaper for simple tasks
❌ "LLMs have secret agendas"
- ✅ They predict text patterns. No consciousness, goals, or desires
Summary
Key Takeaways:
- LLMs predict text based on patterns in training data
- They're extremely capable but have clear limitations
- Major providers: OpenAI (GPT), Anthropic (Claude), Google (Gemini)
- Know the key terms: context window, tokens, temperature, hallucination
- Modern LLMs are multimodal: text, images, code
- Privacy matters: understand how providers handle your data
- The field moves fast, so what's true today may change in months
Next Steps:
- Move to Chapter 02 to learn effective prompting
- Sign up for at least one LLM service
- Experiment with simple prompts
- Read the provider's documentation
Further Reading
- OpenAI's GPT-4 Technical Report
- Anthropic's Model Card for Claude
- Google's Gemini Documentation
- The Illustrated Transformer - How transformers work
- State of AI Report - Annual comprehensive overview