LLM Fundamentals
What large language models are, how they work, and the vocabulary you need before the rest of this tutorial makes sense.
What Is an LLM
A large language model is an AI system trained on huge amounts of text. It can:
- Read and understand human language
- Generate human-like text
- Follow instructions
- Answer questions and hold conversations
A useful mental model: autocomplete on steroids. The model predicts the most likely next token (a word or fragment) based on patterns learned from billions of examples. Stack enough of those predictions together and you get full responses.
How LLMs Work
LLMs go through two phases.
Training Phase
The model is fed billions of text examples (books, websites, code, transcripts) and learns statistical patterns: grammar, facts, reasoning shapes, style. This takes months on huge clusters of GPUs and costs tens to hundreds of millions of dollars per frontier model.
Usage Phase (What You Do)
You provide a prompt. The model runs your tokens through its layers and predicts the most likely next tokens, one after another, until it stops. This happens in seconds.
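The one-token-at-a-time loop can be sketched with a toy model. This is only an illustration under loud assumptions: `NEXT_TOKEN_PROBS` is a hand-written, hypothetical table standing in for the neural network, and it looks only at the previous token, whereas a real LLM conditions on the whole prompt. The `generate` function uses greedy decoding, always taking the single most likely token.

```python
# Toy next-token predictor. NEXT_TOKEN_PROBS is a hypothetical, hand-written
# table standing in for a trained network; a real model computes these
# probabilities from its learned weights and the full context.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.4, "<end>": 0.6},
    "sat": {"<end>": 1.0},
    "mat": {"<end>": 1.0},
}


def generate(max_tokens: int = 10) -> list[str]:
    """Greedy decoding: repeatedly append the single most likely next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break  # the model "stops" when the end marker is most likely
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))  # prints "the cat sat"
```

Nothing in the loop checks whether the output is true. It only follows the probabilities.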
The thing to internalise: LLMs do not "know" facts the way you do. They predict statistically likely text given their training. That is why the same model can answer a hard question correctly and then state, with the same confidence, something completely false.
Major Providers (2026)
| Provider | Model | Strengths | Best For |
|---|---|---|---|
| OpenAI | GPT-5, GPT-5 mini | Most versatile, biggest ecosystem | General use, coding, integrations |
| Anthropic | Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 | Long context, careful reasoning | Research, analysis, long documents, agents |
| Google | Gemini 2.5 Pro | Multimodal, fast, integrated with Google | Research, multimedia, Google users |
| Meta | Llama 3 / 4 | Open weights, customizable | Self-hosting, privacy, customization |
| Mistral | Mistral Large | European, fast, efficient | Europe-based, privacy-conscious |
What LLMs Can Do
Text Generation
- Writing emails, articles, code, stories
- Summarizing long documents
- Translating languages
- Rewriting content in different styles
Analysis
- Extracting key points from text
- Answering questions about documents
- Comparing and contrasting ideas
- Finding patterns in unstructured data
Problem Solving
- Breaking down complex problems
- Creating step-by-step plans
- Debugging code
- Suggesting solutions with trade-offs
Structured Output
- Generating JSON, CSV, markdown
- Creating tables and lists
- Following templates precisely
Important Terminology
Context Window
The amount of text an LLM can "see" at once, measured in tokens (roughly 0.75 English words per token).
- GPT-5: up to 400K tokens (~300K words)
- Claude Opus 4.7: 1M tokens (~750K words). Sonnet 4.6 and Haiku 4.5: 200K tokens (~150K words)
- Gemini 2.5: 1-2M tokens (~750K-1.5M words)
Larger windows let you fit longer documents and more conversation history.
Token
The basic unit LLMs work with. Roughly:
- 1 token is about 0.75 English words
- "Hello world" is 2 tokens
- "Artificial Intelligence" is 3 tokens
API pricing is per token. Context limits are in tokens. You will hear the word a lot.
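For a quick budget check, the 0.75 words-per-token rule of thumb can be turned into a rough estimator. This is a sketch, not a tokenizer: `estimate_tokens` and `fits_in_context` are hypothetical helper names, the count is based on whitespace-separated words, and real tokenizers will give different numbers, especially for code and non-English text.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Estimate the token count from whitespace-separated words."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(text: str, context_window: int, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated prompt size leaves room for the reply."""
    return estimate_tokens(text) + reserved_for_output <= context_window


document = "word " * 3000  # a stand-in for a 3,000-word document
print(estimate_tokens(document))           # 4000
print(fits_in_context(document, 200_000))  # True
```

When the exact count matters (billing, hard limits), use the provider's own tokenizer or token-counting endpoint instead of an estimate.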
Temperature
Controls randomness of output, usually 0.0 to 2.0:
- 0.0-0.3: Focused, near-deterministic, consistent (facts, code, analysis)
- 0.4-0.7: Balanced (default for most uses)
- 0.8-2.0: Creative, varied, less predictable (brainstorming, fiction)
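To see why the ranges above behave this way, here is a minimal sketch of the usual formulation: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits below are made up for illustration, and providers may apply additional processing that this sketch does not model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined; a setting of 0 is typically handled
        # as "always pick the top token" rather than through this formula.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all the probability lands on the top-scoring token. At high temperature the distribution flattens, so less likely tokens get sampled more often.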
Top-p (Nucleus Sampling)
An alternative to temperature. The model samples only from the smallest set of most likely tokens whose combined probability reaches p.
- 0.1: Very focused, conservative
- 0.9: Balanced (the usual default)
- 0.95: Allows more variety
You can usually ignore this and tune temperature instead.
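If you do want to see what top-p is doing, here is a minimal sketch of the filtering step. The probabilities are hypothetical, and `top_p_filter` is an illustrative helper, not any provider's API.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    # Renormalize so the surviving tokens' probabilities sum to 1
    return {token: prob / total for token, prob in kept.items()}


probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}  # hypothetical
print(list(top_p_filter(probs, 0.1)))  # ['cat']
print(list(top_p_filter(probs, 0.9)))  # ['cat', 'dog', 'fish']
```

The model then samples from the surviving tokens only, so a low p cuts off the unlikely tail entirely.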
Fine-tuning
Training an LLM further on specific data to specialize it for a task or domain.
Examples:
- Fine-tuning on medical literature for medical Q&A
- Training on company docs for customer service
- Specializing in legal document analysis
For most users, good prompting gets you 90% of the way there. Fine-tuning is a real engineering project, not a quick win.
RAG (Retrieval-Augmented Generation)
Giving LLMs access to external knowledge by:
- Searching a database for relevant info
- Including that info in the prompt
- Letting the LLM generate a response using that context
RAG is the standard fix for knowledge cutoffs and hallucinations on private data.
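The three steps can be sketched in a few lines. This is a toy under stated assumptions: `DOCUMENTS` is a made-up knowledge base, retrieval uses simple word overlap as a stand-in for the embedding search a production system would use, and the final call to an LLM is left out because it depends on your provider.

```python
import re

# Hypothetical knowledge base; a real system would query a document store.
DOCUMENTS = [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday to Friday, 9am to 5pm.",
    "Shipping to Europe takes 5 to 7 business days.",
]


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Step 1: rank documents by word overlap with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Step 2: include the retrieved text in the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Step 3 would be sending this prompt to an LLM of your choice.
print(build_prompt("How long does shipping to Europe take?", DOCUMENTS))
```

The model never needs to have seen the shipping policy in training. It only has to read it from the prompt.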
Embeddings
Converting text into numbers (vectors) that capture meaning. Similar texts have similar vectors.
Used for search, recommendations, clustering, and powering RAG systems.
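"Similar vectors" is usually measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration. Real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-picked so related words point the same way
EMBEDDINGS = {
    "kitten": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.2, 0.9],
}

print(cosine_similarity(EMBEDDINGS["kitten"], EMBEDDINGS["cat"]))      # close to 1
print(cosine_similarity(EMBEDDINGS["kitten"], EMBEDDINGS["invoice"]))  # much lower
```

Search over embeddings amounts to computing this score between the query vector and every stored vector, then returning the highest-scoring items.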
How LLMs Are Trained
Pre-training
Train on massive general datasets (internet, books, code). The model learns language patterns, facts, and reasoning shapes. This is months of compute on supercomputers and produces a "base model".
Instruction Tuning
Fine-tune on examples of following instructions. This teaches the base model to be helpful, answer questions, and behave more like an assistant.
RLHF (Reinforcement Learning from Human Feedback)
Humans rate model outputs and the model learns what "good" looks like. This is where most of the safety, helpfulness, and tone improvements happen.
The LLM you talk to has been through all three of these training stages.
What LLMs Are Good At
Excellent
- Text generation and rewriting
- Summarization
- Translation
- Code generation and debugging
- Brainstorming and ideation
- Explaining concepts
- Following complex instructions
- Pattern matching in text
Good
- Math, when paired with code execution
- Research and synthesis
- Structured data extraction
- Template filling
- Style mimicry
Limited
- Real-time information (knowledge cutoff)
- Math without code execution
- Counting and precise operations
- Long chains of strict logical reasoning
- Image understanding for text-only models
Poor
- True understanding versus pattern matching
- Knowing what they do not know
- Consistent logic across very long chains
- Physical-world reasoning
Knowledge Cutoff
LLMs are trained on data up to a specific date, called their "knowledge cutoff":
- GPT-5: late 2024
- Claude Opus 4.7 / Sonnet 4.6: January 2026
- Gemini 2.5: mid 2024 (updated periodically)
The model knows nothing about events after its cutoff unless you tell it.
Workarounds:
- Use models with web search (ChatGPT Plus, Perplexity)
- Provide recent information in your prompt
- Use RAG systems for current data
Multimodal Capabilities
Modern LLMs handle more than text.
Vision
- Upload images and ask questions
- Analyze charts, diagrams, screenshots
- Extract text from images (OCR)
- Describe visual content
Code Execution
- Run Python internally
- Perform calculations accurately
- Analyze data files
- Generate plots and visualizations
Audio
- Voice conversations
- Transcription
- Audio analysis
Video
Still early as of 2026. Frame-by-frame analysis works; full video reasoning is improving fast.
Model Sizes
LLMs come in different sizes, measured in parameters (the trainable weights inside the network).
Small (1-10B parameters)
- Fast, cheap, can run on a laptop
- Good for simple tasks
- Limited reasoning
Medium (10-70B parameters)
- Balanced performance and cost
- Most common API size
- Good for most use cases
Large (70B+ parameters)
- Best performance
- Expensive, slower
- Better at complex reasoning
More parameters generally means more capable, but not always. Architecture, training data, and post-training all matter.
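A rough way to judge whether a model "can run on a laptop" is to estimate the memory its weights need: parameter count times bytes per parameter. The sketch below assumes 16-bit weights by default and also shows 4-bit, a common compressed format for local use. Both precisions are assumptions added here, not figures from this tutorial, and the estimate ignores the extra memory needed while the model is running.

```python
def weight_memory_gb(parameters_billions: float, bits_per_parameter: int = 16) -> float:
    """Approximate memory needed just to hold the weights, in GB (10^9 bytes)."""
    bytes_per_parameter = bits_per_parameter / 8
    # billions of parameters * bytes per parameter = gigabytes
    return parameters_billions * bytes_per_parameter


for size in (7, 70):
    print(f"{size}B parameters: {weight_memory_gb(size):.0f} GB at 16-bit, "
          f"{weight_memory_gb(size, 4):.1f} GB at 4-bit")
```

By this estimate a 7B model needs about 14 GB at 16-bit, while a 70B model needs about 140 GB, which is why the large tier is mostly served from data centers.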
Privacy and Security
What Providers Do With Your Data
OpenAI (ChatGPT)
- Free tier: Data may be used for training (you can opt out)
- Plus, Pro, Enterprise: Not used for training by default
- API: Not used for training
Anthropic (Claude)
- Consumer plans: whether conversations are used for training depends on the current policy and your privacy settings, so check both
- Enterprise options for additional security
- API: Not used for training
Google (Gemini)
- May use data to improve services (you can opt out)
- Workspace accounts have different policies
Practical Rules
- Do not paste passwords, API keys, PII, or confidential business data into a chat
- Use API or enterprise plans for work
- Read the privacy policy for your specific use case
- Consider self-hosted models for genuinely sensitive work
Costs
Consumer Plans
- Free: Limited access, often older models
- $20-30/month: Full access to current models, higher rate limits
- $200+/month: Pro tiers with even more capacity
API Pricing (Pay Per Use)
- Input tokens: $0.15 to $15 per 1M tokens
- Output tokens: $0.60 to $75 per 1M tokens
- Varies by model size and speed
A worked example. Analyzing a 10-page document (about 5K tokens) with GPT-5:
- Input: ~$0.08
- Output (2K tokens): ~$0.12
- Total: ~$0.20 per analysis
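The arithmetic behind an estimate like this is simple enough to script. In the sketch below, the rates of $15 and $60 per 1M tokens are back-derived from the figures in the worked example, not quoted from any price list, so substitute the current rates for whichever model you use.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_rate_per_million: float, output_rate_per_million: float) -> float:
    """Cost in dollars for one request, given per-million-token rates."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# Assumed rates, inferred from the worked example for illustration only
cost = api_cost(5_000, 2_000, input_rate_per_million=15.0, output_rate_per_million=60.0)
print(f"${cost:.3f}")  # $0.195
```

Output tokens are priced higher than input tokens, so a short prompt with a long reply can cost more than a long document with a short summary.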
What Has Changed Recently (2024-2026)
- Longer context windows: from 8K to as much as 2M tokens. Whole books and codebases now fit
- Multimodal by default: vision, audio, and code execution come standard on frontier models
- Faster and cheaper: same quality as 2023, roughly 10x faster and 10x cheaper
- Specialized models: code-focused tools (Copilot, Cursor), math, medical, legal
- Agents: LLMs that use tools and run multi-step tasks. Useful but still rough at the edges
- Open-weight models caught up: Llama and Mistral are within striking distance of commercial models for many tasks
Common Misconceptions
"LLMs understand like humans." They pattern-match very well. They do not understand in the way you do.
"LLMs are always right." They confidently produce false answers. Verify anything that matters.
"You need to be technical to use them." You do not. Basic prompting skill is enough.
"LLMs will replace programmers, writers, and so on." They augment people. Those who use them well get a lot more done.
"Bigger models are always better." Bigger models are better at hard reasoning. Smaller models are faster and cheaper for easy tasks.
"LLMs have hidden agendas." They predict text. There is no consciousness, goal, or desire underneath.
Next Steps
Continue to 02-prompting-basics.md to learn how to write prompts that actually work. Before you do, sign up for at least one LLM service (ChatGPT, Claude, or Gemini) and try a simple prompt. Reading about prompting without doing it is like reading about cycling.
Further Reading
- OpenAI's GPT-4 Technical Report
- Anthropic's Model Card for Claude
- Google's Gemini Documentation
- The Illustrated Transformer for how transformers work under the hood
- State of AI Report for an annual snapshot of the field