# Reference
Use this as a fast refresher after reading the tutorial.
## Glossary
| Term | Meaning |
|---|---|
| Hub | Central repository platform for models, datasets, and Spaces |
| Model card | README-like documentation for a model |
| Dataset card | Documentation for a dataset |
| Space | Hosted demo app |
| Pipeline | High-level inference wrapper in transformers |
| Tokenizer | Converts raw input into model-readable tokens |
| Checkpoint | Saved model weights/state |
| PEFT | Parameter-efficient fine-tuning |
| LoRA | A common PEFT method for adapting large models cheaply |
| Revision | A specific version of a repo: a branch, tag, or commit hash |
| Gated model | Requires approval or terms acceptance before access |
## What to Learn First
If you are new, use this order:
- Learn what the ecosystem contains
- Browse models and read model cards
- Run a `pipeline`
- Learn `AutoTokenizer` and the auto model classes
- Load datasets and compute metrics
- Fine-tune only when needed
- Deploy with a Space or endpoint
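The "run a pipeline" step above takes only a couple of lines; a minimal sketch (the default model downloads on first use):

```python
from transformers import pipeline

# Downloads a default sentiment-analysis checkpoint on first run
classifier = pipeline("sentiment-analysis")
result = classifier("Hugging Face makes this easy.")[0]
print(result["label"], round(result["score"], 3))
```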
## Fast Decision Guide
| If You Need To... | Start Here |
|---|---|
| Browse available models | Hub search and model cards |
| Run something in 5 minutes | `pipeline()` |
| Build a proper Python workflow | `transformers` + `datasets` |
| Adapt a model cheaply | `peft` / LoRA |
| Share an interactive demo | Space |
| Put a model behind an API | Endpoint or self-hosted service |
| Reuse files programmatically | `huggingface_hub` |
## Common Commands
```bash
# Install core packages
pip install -U transformers datasets tokenizers evaluate accelerate peft huggingface_hub

# Log in
hf auth login

# Download a file from a repo
hf download distilbert/distilbert-base-uncased config.json
```
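The same download can be done from Python with `huggingface_hub`; a short sketch:

```python
from huggingface_hub import hf_hub_download

# Downloads config.json into the local cache and returns its path
path = hf_hub_download(
    repo_id="distilbert/distilbert-base-uncased",
    filename="config.json",
)
print(path)
```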
## Common Python Patterns
```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
```
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
```
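Putting a tokenizer and model together for a single prediction looks like this; a sketch using the same checkpoint as above:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Tokenize, run a forward pass, and map the top logit back to a label
inputs = tokenizer("This library is a joy to use.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = model.config.id2label[logits.argmax(dim=-1).item()]
print(pred)
```

This is what `pipeline()` does for you under the hood, minus batching and pre/post-processing conveniences.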
```python
from datasets import load_dataset

dataset = load_dataset("imdb")
```
## Common Mistakes
- Choosing a model without reading the license
- Confusing a good demo with production readiness
- Ignoring tokenizer/model compatibility
- Skipping evaluation on real data
- Using fine-tuning when prompting or smaller models would suffice
- Forgetting to pin revisions
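Pinning a revision (the last mistake above) means passing an explicit branch, tag, or commit hash to `from_pretrained`; a sketch — `"main"` runs as-is, but for real reproducibility you would substitute a full commit SHA from the repo's history:

```python
from transformers import AutoTokenizer

# "main" resolves to the branch tip; replace it with a commit SHA
# to guarantee you always load exactly the same files
tokenizer = AutoTokenizer.from_pretrained(
    "distilbert/distilbert-base-uncased",
    revision="main",
)
print(type(tokenizer).__name__)
```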
## A Sensible Learning Roadmap
### Day 1
- Read chapters 01-03
- Browse the Hub and shortlist a few interesting repos
- Run one pipeline locally
### Day 2
- Read chapters 04-05
- Load one dataset and one model in Python
- Compare two models on a few real inputs
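The Day 2 comparison can be as simple as running two pipelines over the same inputs; a sketch assuming both model ids below exist on the Hub (verify before running, and note different models may use different label names):

```python
from transformers import pipeline

inputs = ["The update fixed my bug.", "Support never replied."]
model_ids = [
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
]

# Collect each model's labels for the same inputs, side by side
results = {}
for model_id in model_ids:
    clf = pipeline("sentiment-analysis", model=model_id)
    results[model_id] = [p["label"] for p in clf(inputs)]
    print(model_id, results[model_id])
```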
### Day 3+
- Read chapters 06-09
- Fine-tune only if you have a clear use case
- Publish a small demo or internal proof of concept