# Python Quickstart
This is the fastest way to start using Hugging Face in real code.
## Install the Main Packages

```bash
pip install -U transformers datasets tokenizers evaluate accelerate peft huggingface_hub
```

Optional extras depend on your workflow:

- `torch` for most PyTorch model usage
- `tensorflow` if using TensorFlow-backed models
- `sentencepiece` for some tokenizers
- `bitsandbytes` for quantization workflows
## The Simplest Useful Example: Pipelines

Pipelines are high-level task wrappers.

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("This library saves a huge amount of setup time."))
```
Why this is useful:
- Minimal code
- Good for fast validation
- Lets you test tasks before building custom code
## Text Generation Example

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Three practical uses for Hugging Face are", max_new_tokens=40)
print(result[0]["generated_text"])
```
## Using Auto Classes

When you need more control, use `AutoTokenizer` and the task-specific `AutoModelFor...` classes.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
```
This pattern is common because it loads the right config automatically for the chosen repo.
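To make that concrete, here is a simplified, framework-free sketch of the dispatch idea: read the repo's `config.json`, then pick a class from its `model_type` field. The class names and registry below are illustrative stand-ins, not the real transformers internals.

```python
import json

# Illustrative stand-ins for real model classes.
class BertStyleModel: ...
class DistilBertStyleModel: ...

# A tiny registry mapping the config's model_type to a class,
# loosely mimicking how Auto classes resolve architectures.
MODEL_REGISTRY = {
    "bert": BertStyleModel,
    "distilbert": DistilBertStyleModel,
}

def load_from_config(config_json: str):
    """Pick and instantiate a model class from a config blob."""
    config = json.loads(config_json)
    return MODEL_REGISTRY[config["model_type"]]()

model = load_from_config('{"model_type": "distilbert", "num_labels": 2}')
print(type(model).__name__)  # DistilBertStyleModel
```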
## Tokenize Input

```python
inputs = tokenizer(
    ["I love practical libraries.", "This output is disappointing."],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
```

Important options:

- `padding=True` for batching
- `truncation=True` to avoid overlong-input errors
- `return_tensors="pt"` for PyTorch tensors
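To see what `padding=True` does conceptually, here is a framework-free sketch that right-pads token-id sequences to a common length. The ids and the pad id `0` are illustrative, not output from a real tokenizer.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad variable-length id sequences to the longest one,
    which is what padding=True does for a batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

batch = pad_batch([[101, 2293, 102], [101, 2023, 6434, 2003, 102]])
print(batch)  # [[101, 2293, 102, 0, 0], [101, 2023, 6434, 2003, 102]]
```

Padding is what lets a batch of uneven sentences become one rectangular tensor.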
## Run Inference

```python
import torch

with torch.no_grad():
    outputs = model(**inputs)

predictions = outputs.logits.argmax(dim=-1)
print(predictions.tolist())
```
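`argmax` gives only the winning class; a softmax over the logits recovers how confident the model is. A minimal pure-Python sketch (the logit values below are illustrative, not real model output):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([-2.1, 3.4])  # e.g. [NEGATIVE, POSITIVE] logits
print([round(p, 3) for p in probs])  # [0.004, 0.996]
```

With tensors, the equivalent is `outputs.logits.softmax(dim=-1)`.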
## Using the Hub in Code

```python
from huggingface_hub import login

# Better: set HF_TOKEN in your environment so no token appears in source code.
# login(token="...")
```

For scripts and servers, prefer environment variables over interactive login.
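One way to follow that advice is to read the token from the environment and log in only when it is present. `HF_TOKEN` is the environment variable the Hub tooling itself recognizes; the helper function here is just a convenience for this sketch.

```python
import os

def hub_token():
    """Return the Hub token from the environment, or None if unset."""
    return os.environ.get("HF_TOKEN")

token = hub_token()
if token:
    # Only needed for gated/private repos; most Hub calls also
    # pick up HF_TOKEN from the environment directly.
    from huggingface_hub import login
    login(token=token)
```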
## Caching

Downloaded models are cached locally. This means:

- First load is slower
- Later loads are faster
- Disk usage grows over time

That is normal.
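To keep an eye on that disk usage, a small stdlib helper can total the size of a directory tree. `~/.cache/huggingface` is the default cache location unless you override it (e.g. via `HF_HOME`), so your path may differ.

```python
from pathlib import Path

def dir_size_bytes(root):
    """Sum the sizes of all files under root; 0 if the path does not exist."""
    root = Path(root).expanduser()
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

print(dir_size_bytes("~/.cache/huggingface"))
```

`huggingface_hub` also ships cache-inspection utilities; this sketch just shows the idea with the standard library.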
## Device Placement

You may run on:

- CPU for simple testing
- GPU for larger or faster workloads
- Specialized accelerators in managed environments

Rule of thumb: start on CPU with a small model, and move to GPU only when the workload justifies it.
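A common pattern for that rule of thumb is to detect the device at runtime. This sketch falls back to CPU when PyTorch is missing or sees no GPU; for simplicity it ignores other accelerators such as Apple's MPS.

```python
import importlib.util

def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"

print(pick_device())
```

You can then pass the result to `model.to(device)`, or to `pipeline(...)` via its device argument, once you move past CPU-only testing.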
## Common Beginner Pattern

Use this sequence:

1. Start with a pipeline
2. Prove the task is viable
3. Switch to auto classes for more control
4. Add batching, metrics, and evaluation
5. Fine-tune only if zero-shot or prompting is not enough
## Frequent Errors
| Error | Usually Means |
|---|---|
| Tokenizer mismatch | Wrong tokenizer/model pair |
| CUDA out of memory | Model too large or batch size too high |
| Gated repo access denied | You need login or license approval |
| Shape mismatch | Wrong model class for the task |
| Slow inference | Model too large, CPU-only run, or no batching |