LLM Fundamentals

Understanding what LLMs are, how they work, and the key concepts you need to know.

What is an LLM?

A Large Language Model (LLM) is an AI system trained on massive amounts of text data to:

  • Understand human language
  • Generate human-like text
  • Perform tasks based on instructions
  • Answer questions and hold conversations

Think of it as autocomplete on steroids. It predicts the most likely next word, sentence, or paragraph based on patterns learned from billions of examples.

How LLMs Work (Simplified)

1. Training Phase

  • Fed billions of text examples from books, websites, code, etc.
  • Learns patterns: grammar, facts, reasoning, style
  • Develops statistical understanding of language
  • Takes months and costs millions of dollars

2. Usage Phase (What You Do)

  • You provide a prompt (input)
  • LLM processes it through neural network layers
  • Predicts and generates the most likely response
  • Happens in seconds

Key Insight: LLMs don't "know" things like humans do. They predict statistically likely responses based on training data. This is why they can seem brilliant but also make confident mistakes.

Major LLM Providers (2025-2026)

| Provider | Model | Strengths | Best For |
|---|---|---|---|
| OpenAI | GPT-5, GPT-5 mini | Most versatile, huge ecosystem | General use, coding, integrations |
| Anthropic | Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 | Long context, careful reasoning | Research, analysis, long documents, agents |
| Google | Gemini 2.5 Pro | Multimodal, fast, integrated with Google | Research, multimedia, Google users |
| Meta | Llama 3 / 4 | Open weights, customizable | Self-hosting, privacy, customization |
| Mistral | Mistral Large | European, fast, efficient | Europe-based, privacy-conscious |

Key Capabilities

Text Generation

  • Writing emails, articles, code, stories
  • Summarizing long documents
  • Translating languages
  • Rewriting content in different styles

Analysis & Understanding

  • Extracting key points from text
  • Answering questions about documents
  • Comparing and contrasting ideas
  • Finding patterns and insights

Problem Solving

  • Breaking down complex problems
  • Creating step-by-step plans
  • Debugging code
  • Suggesting solutions with trade-offs

Structured Output

  • Generating JSON, CSV, markdown
  • Creating tables and lists
  • Formatting data consistently
  • Following templates precisely
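
On the consuming side, structured output is only useful if you validate it. A small sketch, where `raw` stands in for a hypothetical model response (no real API is called):

```python
import json

# `raw` stands in for text returned by a model that was asked for JSON.
raw = '{"title": "Q3 report", "sentiment": "positive", "topics": ["revenue", "hiring"]}'

data = json.loads(raw)  # raises ValueError if the model returned invalid JSON
missing = {"title", "sentiment", "topics"} - data.keys()
assert not missing, f"model omitted fields: {missing}"
print(data["topics"])  # ['revenue', 'hiring']
```

Checking the parsed fields like this catches the common failure mode where the model returns prose, or JSON with a field missing, instead of the template you asked for.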

Important Terminology

Context Window

The amount of text an LLM can "see" at once. Measured in tokens (roughly 0.75 words).

  • GPT-5: up to 400K tokens (300K words)
  • Claude Opus 4.7: 1M tokens; Sonnet 4.6 / Haiku 4.5: 200K tokens
  • Gemini 2.5: 1-2M tokens (750K-1.5M words)

Why it matters: Larger windows let the model process longer documents, more conversation history, and more context.

Token

The basic unit LLMs work with. A token is roughly:

  • 1 token ≈ 0.75 English words
  • "Hello world" = 2 tokens
  • "Artificial Intelligence" = 3 tokens

Why it matters: API pricing is per token. Context limits are in tokens.
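
The ~0.75 words-per-token ratio above makes rough prompt budgeting easy. A rule-of-thumb estimator (real tokenizers, such as OpenAI's tiktoken library, give exact counts; this is only a ballpark):

```python
def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from an English word count (1 token ~ 0.75 words)."""
    return round(word_count / 0.75)

print(estimate_tokens(1500))     # a ~1,500-word article -> about 2,000 tokens
print(estimate_tokens(300_000))  # ~300K words -> about 400K tokens
```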

Temperature

Controls randomness of output (0.0 to 2.0):

  • 0.0-0.3: Focused, deterministic, consistent (facts, code, analysis)
  • 0.4-0.7: Balanced (default for most uses)
  • 0.8-2.0: Creative, varied, unpredictable (brainstorming, creative writing)
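
Mechanically, temperature divides the model's raw next-token scores (logits) before they become probabilities. A minimal pure-Python sketch with made-up logit values:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; low temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]   # temperature rescales the logits
    peak = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                      # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # near-deterministic: top token dominates
print(softmax_with_temperature(logits, 1.5))  # flatter: sampling becomes more varied
# Temperature 0 is typically implemented as greedy decoding (always pick the top token).
```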

Top-p (Nucleus Sampling)

Alternative to temperature. The model samples only from the smallest set of most-likely tokens whose probabilities add up to p.

  • 0.1: Very focused, conservative
  • 0.9: Balanced
  • 0.95: Allows more variety

Most users can ignore this and just use temperature.
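
For the curious, here is the filtering idea sketched with made-up next-token probabilities (a real model then samples from the surviving set; this only shows which tokens survive):

```python
def nucleus_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}  # renormalize what's left

next_token_probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
print(nucleus_filter(next_token_probs, 0.9))  # the unlikely "zebra" is cut
```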

Fine-tuning

Training an LLM further on specific data to specialize it for a task or domain.

Examples:

  • Fine-tuning on medical literature for medical Q&A
  • Training on company docs for customer service
  • Specializing in legal document analysis

Reality: Most users don't need this. Good prompting is 90% as effective.

RAG (Retrieval-Augmented Generation)

Giving LLMs access to external knowledge by:

  1. Searching a database for relevant info
  2. Including that info in the prompt
  3. LLM generates response using that context

Why it matters: Overcomes knowledge cutoff dates and hallucinations.
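
The three steps above can be sketched as a toy loop. The documents and the final LLM call are illustrative only; a real system would use embedding search and a provider's API rather than naive keyword overlap:

```python
import re

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping takes 3-5 business days within the EU.",
    "Support is available Monday through Friday, 9am-5pm.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Step 1: rank documents by word overlap with the query."""
    ranked = sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: include the retrieved info in the prompt."""
    return ("Context:\n" + "\n".join(context)
            + f"\n\nQuestion: {query}\nAnswer using only the context above.")

query = "What is the refund policy?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)
# Step 3: send `prompt` to an LLM API (omitted here).
```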

Embeddings

Converting text into numbers (vectors) that capture meaning. Similar texts have similar vectors.

Use case: Search, recommendations, clustering, finding related content.
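
"Similar texts have similar vectors" is usually measured with cosine similarity. A sketch with hand-written 3-dimensional vectors standing in for real embeddings (which have hundreds of dimensions and come from an embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two vectors: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
invoice = [0.1, 0.9, 0.7]

print(cosine_similarity(cat, kitten))   # high: related meanings
print(cosine_similarity(cat, invoice))  # low: unrelated meanings
```

Semantic search is exactly this: embed the query, then return the stored texts whose vectors score highest against it.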

How LLMs Learn (Training Process)

Pre-training

  • Train on massive general datasets (internet, books, code)
  • Learns language patterns, facts, reasoning
  • Takes months on supercomputers
  • Base model emerges

Instruction Tuning

  • Fine-tune on examples of following instructions
  • Teaches model to be helpful, answer questions
  • Makes it useful for end users

RLHF (Reinforcement Learning from Human Feedback)

  • Humans rate model outputs
  • Model learns what "good" looks like
  • Improves safety, helpfulness, accuracy

Result: The LLM you use has gone through all three phases.

What LLMs Are Good At

Excellent

  • Text generation and rewriting
  • Summarization
  • Translation
  • Code generation and debugging
  • Brainstorming and ideation
  • Explaining concepts
  • Following complex instructions
  • Pattern matching in text

Good

  • Math (with code execution)
  • Research and synthesis
  • Structured data extraction
  • Template filling
  • Style mimicry

⚠️ Limited

  • Real-time information (knowledge cutoff)
  • Math without code execution
  • Counting and precise operations
  • Reasoning with many steps
  • Understanding images (text-only models)

Poor

  • True understanding vs. pattern matching
  • Knowing what they don't know
  • Consistent logic in long chains
  • Physical world reasoning

Knowledge Cutoff

LLMs are trained on data up to a specific date (their "knowledge cutoff"):

  • GPT-5: late 2024
  • Claude Opus 4.7 / Sonnet 4.6: January 2026
  • Gemini 2.5: mid 2024 (updated periodically)

What this means: They don't know about events after their cutoff unless you tell them.

Solutions:

  • Use models with web search (ChatGPT Plus, Perplexity)
  • Provide recent info in your prompts
  • Use RAG systems for current data

Multimodal Capabilities

Modern LLMs can handle more than text:

Vision

  • Upload images and ask questions
  • Analyze charts, diagrams, screenshots
  • Extract text from images (OCR)
  • Describe visual content

Code Execution

  • Run Python code internally
  • Perform calculations accurately
  • Analyze data files
  • Generate plots and visualizations

Audio (Some Models)

  • Voice conversations
  • Transcription
  • Audio analysis

Video (Coming Soon)

  • Video understanding and analysis
  • Frame-by-frame processing

Model Sizes and Versions

LLMs come in different sizes:

Small (1-10B parameters)

  • Fast, cheap, runs locally
  • Good for simple tasks
  • Limited reasoning

Medium (10-70B parameters)

  • Balanced performance and cost
  • Most common size for APIs
  • Good for most use cases

Large (70B+ parameters)

  • Best performance
  • Expensive, slower
  • Complex reasoning

Parameters = the learned weights in the neural network. More parameters ≈ more capable (but not always).

Privacy and Security

What Providers Do With Your Data

OpenAI (ChatGPT):

  • Free tier: Data may be used for training (can opt out)
  • Plus/Pro/Enterprise: Not used for training by default
  • API: Not used for training

Anthropic (Claude):

  • Never uses conversations for training
  • Enterprise options for additional security
  • API: Not used for training

Google (Gemini):

  • May use data to improve services (can opt out)
  • Workspace accounts have different policies

Best Practices:

  • Don't share sensitive/confidential data
  • Use API or enterprise plans for business
  • Read privacy policies for your use case
  • Consider self-hosted models for sensitive work

Costs

Consumer Plans

  • Free: Limited access to older models
  • $20-30/month: Full access to latest models, higher rate limits
  • $200+/month: Professional/Pro plans with even more capacity

API Pricing (Pay Per Use)

  • Input tokens: $0.15-$15 per 1M tokens
  • Output tokens: $0.60-$75 per 1M tokens
  • Varies by model size and speed

Example: Analyzing a 10-page document (5K tokens) with GPT-5, assuming illustrative rates of $15 per 1M input tokens and $60 per 1M output tokens:

  • Input (5K tokens): ~$0.08
  • Output (2K tokens): ~$0.12
  • Total: ~$0.20 per analysis
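
The same arithmetic as a reusable helper, using illustrative rates of $15 per 1M input tokens and $60 per 1M output tokens (not any provider's actual price list):

```python
def api_cost(input_tokens: int, output_tokens: int,
             in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Cost in dollars for one request, given per-1M-token rates."""
    return (input_tokens / 1_000_000 * in_rate_per_m
            + output_tokens / 1_000_000 * out_rate_per_m)

cost = api_cost(5_000, 2_000, in_rate_per_m=15.0, out_rate_per_m=60.0)
print(f"~${cost:.2f} per analysis")
```

Plug in your provider's current rates; output tokens usually cost several times more than input tokens.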

Latest Developments (2024-2025)

Longer Context Windows

  • From 8K to 2M+ tokens
  • Can process entire books, codebases
  • Better conversation memory

Multimodal by Default

  • Vision, audio, code execution standard
  • Smoother integration across modalities

Faster and Cheaper

  • Comparable quality, roughly 10x faster and 10x cheaper than in 2023
  • Makes more use cases economically viable

Specialized Models

  • Code-specific (GitHub Copilot, Cursor)
  • Math-specific (Minerva)
  • Domain-specific (medical, legal)

AI Agents

  • LLMs that can use tools
  • Multi-step autonomous tasks
  • Still early but rapidly improving

Open Source Progress

  • Meta's Llama 3 and Mistral's open models are catching up to commercial models
  • Can run locally on good hardware
  • Privacy and customization benefits

Common Misconceptions

❌ "LLMs understand like humans"

  • ✅ They pattern match incredibly well, but don't have true understanding

❌ "LLMs are always right"

  • ✅ They confidently hallucinate. Always verify important information

❌ "You need to be technical to use them"

  • ✅ Anyone can use them effectively with basic prompting skills

❌ "LLMs will replace programmers/writers/etc"

  • ✅ They augment professionals, making them more productive

❌ "Bigger models are always better"

  • ✅ Bigger models are better at complex reasoning, but smaller models can be faster and cheaper for simple tasks

❌ "LLMs have secret agendas"

  • ✅ They predict text patterns. No consciousness, goals, or desires

Summary

Key Takeaways:

  1. LLMs predict text based on patterns in training data
  2. They're extremely capable but have clear limitations
  3. Major providers: OpenAI (GPT), Anthropic (Claude), Google (Gemini)
  4. Know the key terms: context window, tokens, temperature, hallucination
  5. Modern LLMs are multimodal: text, images, code
  6. Privacy matters: understand how providers handle your data
  7. The field moves fast, so what's true today may change in months

Next Steps:

  • Move to Chapter 02 to learn effective prompting
  • Sign up for at least one LLM service
  • Experiment with simple prompts
  • Read the provider's documentation

Further Reading