LLM Fundamentals | AI Tutorial

What large language models are, how they work, and the vocabulary you need before the rest of this tutorial makes sense.

What Is an LLM

A large language model is an AI system trained on huge amounts of text. It can:

Read and understand human language
Generate human-like text
Follow instructions
Answer questions and hold conversations

A useful mental model: autocomplete on steroids. The model predicts the most likely next token (a word or fragment) based on patterns learned from billions of examples. Stack enough of those predictions together and you get full responses.

How LLMs Work

LLMs go through two phases.

Training Phase

The model is fed billions of text examples (books, websites, code, transcripts) and learns statistical patterns: grammar, facts, reasoning shapes, style. This takes months on huge clusters of GPUs and costs tens to hundreds of millions of dollars per frontier model.

Usage Phase (What You Do)

You provide a prompt. The model runs your tokens through its layers and predicts the most likely next tokens, one after another, until it stops. This happens in seconds.

The thing to internalise: LLMs do not "know" facts the way you do. They predict statistically likely text given their training. That is why the same model can answer a hard question correctly and then state, with the same confidence, something completely false.
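The predict-one-token-at-a-time loop can be sketched with a toy model. This is only an illustration: it counts which word follows which in a tiny made-up corpus, whereas a real LLM uses a neural network over subword tokens. The names `train_bigram` and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which: a tiny stand-in for 'training'."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt, max_tokens=10, seed=0):
    """Repeatedly sample a next word in proportion to how often it was seen."""
    rng = random.Random(seed)
    output = prompt.split()
    for _ in range(max_tokens):
        options = counts.get(output[-1])
        if not options:  # nothing ever followed this word, so stop
            break
        words, weights = zip(*options.items())
        output.append(rng.choices(words, weights=weights)[0])
    return " ".join(output)

corpus = "the cat sat on the mat . the dog sat on the rug ."
model = train_bigram(corpus)
print(generate(model, "the cat"))
```

The output is fluent-looking because every step follows a pattern seen in the corpus, not because the model checked anything. That is the same reason a real LLM can sound confident while being wrong.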

Major Providers (2026)

Provider	Model	Strengths	Best For
OpenAI	GPT-5, GPT-5 mini	Most versatile, biggest ecosystem	General use, coding, integrations
Anthropic	Claude Opus 4.7, Sonnet 4.6, Haiku 4.5	Long context, careful reasoning	Research, analysis, long documents, agents
Google	Gemini 2.5 Pro	Multimodal, fast, integrated with Google	Research, multimedia, Google users
Meta	Llama 3 / 4	Open weights, customizable	Self-hosting, privacy, customization
Mistral	Mistral Large	European, fast, efficient	Europe-based, privacy-conscious

What LLMs Can Do

Text Generation

Writing emails, articles, code, stories
Summarizing long documents
Translating languages
Rewriting content in different styles

Analysis

Extracting key points from text
Answering questions about documents
Comparing and contrasting ideas
Finding patterns in unstructured data

Problem Solving

Breaking down complex problems
Creating step-by-step plans
Debugging code
Suggesting solutions with trade-offs

Structured Output

Generating JSON, CSV, markdown
Creating tables and lists
Following templates precisely
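Structured output is only useful if you check it before trusting it. A minimal sketch, assuming the model was asked to reply with a JSON object; the helper `parse_model_json`, the sample reply, and the key names are all hypothetical.

```python
import json

def parse_model_json(raw, required_keys):
    """Parse a model reply as JSON and check that the expected keys are present."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as error:
        raise ValueError(f"reply is not valid JSON: {error}") from error
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data

reply = '{"title": "Q3 report", "sentiment": "positive"}'  # hypothetical model output
print(parse_model_json(reply, ["title", "sentiment"]))
```

If validation fails, a common approach is to send the error back to the model and ask it to try again.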

Important Terminology

Context Window

The amount of text an LLM can "see" at once, measured in tokens (roughly 0.75 English words per token).

GPT-5: up to 400K tokens (~300K words)
Claude Opus 4.7: 1M tokens. Sonnet 4.6 and Haiku 4.5: 200K tokens
Gemini 2.5: 1-2M tokens (~750K-1.5M words)

Larger windows let you fit longer documents and more conversation history.
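To check whether a document is likely to fit, you can estimate its token count from its word count. A rough sketch using the 0.75 words-per-token rule of thumb above; real counts depend on the model's tokenizer, and the function names and the 1,000-token output reserve are assumptions.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text, not an exact figure

def estimate_tokens(text):
    """Estimate token count from whitespace-separated words."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_context(text, context_window, reserved_for_output=1000):
    """Check whether a prompt leaves room for the reply inside the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window

document = "word " * 150_000  # a 150,000-word document
print(estimate_tokens(document))            # 200000
print(fits_in_context(document, 200_000))   # False: no room left for the reply
print(fits_in_context(document, 400_000))   # True
```

The reserve matters because the prompt and the reply share the same window.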

Token

The basic unit LLMs work with. Roughly:

1 token is about 0.75 English words
"Hello world" is 2 tokens
"Artificial Intelligence" is 3 tokens

API pricing is per token. Context limits are in tokens. You will hear the word a lot.

Temperature

Controls randomness of output, usually 0.0 to 2.0:

0.0-0.3: Focused, near-deterministic, consistent (facts, code, analysis)
0.4-0.7: Balanced (default for most uses)
0.8-2.0: Creative, varied, less predictable (brainstorming, fiction)
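Under the hood, temperature typically divides the model's raw scores (logits) before they are turned into probabilities. A minimal sketch of that idea; the logits are made-up values for three candidate tokens, and the function name is hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as 'pick the top token'")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all the probability lands on the top token. At high temperature the distribution flattens and less likely tokens get picked more often.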

Top-p (Nucleus Sampling)

Another way to control randomness, often used alongside temperature. The model samples only from the smallest set of most likely tokens whose combined probability reaches p.

0.1: Very focused, conservative
0.9: Balanced (the usual default)
0.95: Allows more variety

You can usually ignore this and tune temperature instead.
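If you are curious what the cutoff does, here is a small sketch of nucleus filtering. The probabilities are made up, and `top_p_filter` is a hypothetical helper, not a library function.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalise so the surviving probabilities sum to 1 again.
    return {token: prob / cumulative for token, prob in kept.items()}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "axolotl": 0.05}  # made-up values
print(top_p_filter(probs, 0.1))  # only "cat" survives
print(top_p_filter(probs, 0.9))  # "cat", "dog", and "fish" survive
```

The model then samples from the surviving tokens only, so the unlikely tail never gets chosen.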

Fine-tuning

Training an LLM further on specific data to specialize it for a task or domain.

Examples:

Fine-tuning on medical literature for medical Q&A
Training on company docs for customer service
Specializing in legal document analysis

For most users, good prompting gets you 90% of the way there. Fine-tuning is a real engineering project, not a quick win.

RAG (Retrieval-Augmented Generation)

Giving LLMs access to external knowledge by:

Searching a database for relevant info
Including that info in the prompt
Letting the LLM generate a response using that context

RAG is the standard fix for knowledge cutoffs and hallucinations on private data.
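The three steps can be sketched in a few lines. This toy version scores documents by shared words, which is an assumption made to keep the example self-contained; real systems usually search with embeddings. The knowledge base and function names are hypothetical.

```python
def score(query, document):
    """Count shared lowercase words: a crude stand-in for real semantic search."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def retrieve(query, documents, k=2):
    """Step 1: find the k most relevant documents."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, documents, k=2):
    """Step 2: put the retrieved text into the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

knowledge_base = [  # hypothetical private documents
    "Refunds are accepted within 30 days of purchase.",
    "Support is open Monday to Friday.",
    "Shipping takes 3 to 5 business days.",
]
prompt = build_prompt("How many days do refunds take?", knowledge_base, k=1)
print(prompt)  # Step 3 would send this prompt to an LLM.
```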

Embeddings

Converting text into numbers (vectors) that capture meaning. Similar texts have similar vectors.

Used for search, recommendations, clustering, and powering RAG systems.
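"Similar vectors" is usually measured with cosine similarity. A minimal sketch with made-up 3-dimensional vectors; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings, chosen so that related words point in similar directions.
embeddings = {
    "kitten": [0.9, 0.8, 0.1],
    "cat": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}
query = embeddings["kitten"]
for word in ("cat", "invoice"):
    print(word, round(cosine_similarity(query, embeddings[word]), 3))
```

"cat" scores much closer to "kitten" than "invoice" does, which is the property search and RAG systems rely on.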

How LLMs Are Trained

Pre-training

Train on massive general datasets (internet, books, code). The model learns language patterns, facts, and reasoning shapes. This is months of compute on supercomputers and produces a "base model".

Instruction Tuning

Fine-tune on examples of following instructions. This teaches the base model to be helpful, answer questions, and behave more like an assistant.

RLHF (Reinforcement Learning from Human Feedback)

Humans rate model outputs and the model learns what "good" looks like. This is where most of the safety, helpfulness, and tone improvements happen.

The LLM you talk to has been through all three training stages.

What LLMs Are Good At

Excellent

Text generation and rewriting
Summarization
Translation
Code generation and debugging
Brainstorming and ideation
Explaining concepts
Following complex instructions
Pattern matching in text

Good

Math, when paired with code execution
Research and synthesis
Structured data extraction
Template filling
Style mimicry

Limited

Real-time information (knowledge cutoff)
Math without code execution
Counting and precise operations
Long chains of strict logical reasoning
Image understanding for text-only models

Poor

True understanding, as opposed to pattern matching
Knowing what they do not know
Consistent logic across very long chains
Physical-world reasoning

Knowledge Cutoff

LLMs are trained on data up to a specific date, called their "knowledge cutoff":

GPT-5: late 2024
Claude Opus 4.7 / Sonnet 4.6: January 2026
Gemini 2.5: mid 2024 (updated periodically)

The model knows nothing about events after its cutoff unless you tell it.

Workarounds:

Use models with web search (ChatGPT Plus, Perplexity)
Provide recent information in your prompt
Use RAG systems for current data

Multimodal Capabilities

Modern LLMs handle more than text.

Vision

Upload images and ask questions
Analyze charts, diagrams, screenshots
Extract text from images (OCR)
Describe visual content

Code Execution

Run Python internally
Perform calculations accurately
Analyze data files
Generate plots and visualizations

Audio

Voice conversations
Transcription
Audio analysis

Video

Still early as of 2026. Frame-by-frame analysis works; full video reasoning is improving fast.

Model Sizes

LLMs come in different sizes, measured in parameters (the trainable weights inside the network).

Small (1-10B parameters)

Fast, cheap, can run on a laptop
Good for simple tasks
Limited reasoning

Medium (10-70B parameters)

Balanced performance and cost
Most common API size
Good for most use cases

Large (70B+ parameters)

Best performance
Expensive, slower
Better at complex reasoning

More parameters generally means more capable, but not always. Architecture, training data, and post-training all matter.
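Parameter count also tells you roughly how much memory a model needs, which is why small models fit on a laptop. A back-of-the-envelope sketch, assuming memory is dominated by the weights and using standard bytes-per-parameter figures for each precision; the function name is hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(parameters_in_billions, precision="fp16"):
    """Approximate memory for the weights alone (ignores activations and caches)."""
    total_bytes = parameters_in_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9  # decimal gigabytes

for size in (7, 70):
    for precision in ("fp16", "int4"):
        print(f"{size}B at {precision}: {weight_memory_gb(size, precision):.1f} GB")
```

A 7B model at 16-bit precision needs about 14 GB for its weights, and about 3.5 GB when quantized to 4 bits. Actual usage is higher once you add working memory for the conversation.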

Privacy and Security

What Providers Do With Your Data

OpenAI (ChatGPT)

Free, Plus, Pro: Data may be used for training (you can opt out)
Business, Enterprise: Not used for training by default
API: Not used for training by default

Anthropic (Claude)

Consumer plans: Conversations may be used for training depending on your privacy settings (you can opt out)
Enterprise options for additional security
API: Not used for training

Google (Gemini)

May use data to improve services (you can opt out)
Workspace accounts have different policies

Practical Rules

Do not paste passwords, API keys, PII, or confidential business data into a chat
Use API or enterprise plans for work
Read the privacy policy for your specific use case
Consider self-hosted models for genuinely sensitive work
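One way to make the first rule easier to follow is to scrub text before it leaves your machine. A rough sketch only: the two patterns are illustrative assumptions (a simple email shape and keys that start with `sk-`), and they will miss many kinds of secrets and PII.

```python
import re

# Illustrative patterns only; real secret formats vary widely.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(text):
    """Replace anything matching a known pattern with a placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@example.com, key sk-abcdefghijklmnop1234"))
# Contact [EMAIL], key [API_KEY]
```

Treat this as a safety net, not a guarantee. Reading what you paste is still the real control.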

Costs

Consumer Plans

Free: Limited access, often older models
$20-30/month: Full access to current models, higher rate limits
$200+/month: Pro tiers with even more capacity

API Pricing (Pay Per Use)

Input tokens: $0.15 to $15 per 1M tokens
Output tokens: $0.60 to $75 per 1M tokens
Varies by model size and speed

A worked example. Analyzing a 10-page document (about 5K tokens) with a top-tier model priced at roughly $15 per 1M input tokens and $60 per 1M output tokens:

Input: ~$0.08
Output (2K tokens): ~$0.12
Total: ~$0.20 per analysis
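The arithmetic behind an estimate like this is simple enough to script. In this sketch the rates of $15 per 1M input tokens and $60 per 1M output tokens are assumptions chosen because they roughly reproduce the figures above; they are not quoted prices for any particular model, so substitute your provider's current rates.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the dollar cost of one request given prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Assumed rates: $15 per 1M input tokens, $60 per 1M output tokens.
cost = request_cost(5_000, 2_000, input_price_per_m=15.0, output_price_per_m=60.0)
print(f"${cost:.3f}")
```

Output tokens are usually priced several times higher than input tokens, so long replies cost more than long prompts.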

What Has Changed Recently (2024-2026)

Longer context windows: from 8K to as much as 1-2M tokens. Whole books and codebases now fit
Multimodal by default: vision, audio, and code execution come standard on frontier models
Faster and cheaper: same quality as 2023, roughly 10x faster and 10x cheaper
Specialized models: code-focused tools (Copilot, Cursor), math, medical, legal
Agents: LLMs that use tools and run multi-step tasks. Useful but still rough at the edges
Open-weight models caught up: Llama and Mistral are within striking distance of commercial models for many tasks

Common Misconceptions

"LLMs understand like humans." They pattern-match very well. They do not understand in the way you do.

"LLMs are always right." They confidently produce false answers. Verify anything that matters.

"You need to be technical to use them." You do not. Basic prompting skill is enough.

"LLMs will replace programmers, writers, and so on." They augment people. The good ones get a lot more done.

"Bigger models are always better." Bigger models are better at hard reasoning. Smaller models are faster and cheaper for easy tasks.

"LLMs have hidden agendas." They predict text. There is no consciousness, goal, or desire underneath.

Next Steps

Continue to 02-prompting-basics.md to learn how to write prompts that actually work. Before you do, sign up for at least one LLM service (ChatGPT, Claude, or Gemini) and try a simple prompt. Reading about prompting without doing it is like reading about cycling.