Python Quickstart

This is the fastest way to start using Hugging Face in real code.

Install the Main Packages

pip install -U transformers datasets tokenizers evaluate accelerate peft huggingface_hub

Optional extras depend on your workflow:

  • torch for most PyTorch model usage
  • tensorflow if using TensorFlow-backed models
  • sentencepiece for some tokenizers
  • bitsandbytes for quantization workflows

The Simplest Useful Example: Pipelines

Pipelines are high-level task wrappers.

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    # Pinning a model keeps results reproducible; otherwise Transformers
    # picks a default checkpoint and emits a warning.
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("This library saves a huge amount of setup time."))

Why this is useful:

  • Minimal code
  • Good for fast validation
  • Lets you test tasks before building custom code
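Pipelines return a list of dicts, one per input. A minimal sketch of consuming that output; the `is_positive` helper and its threshold are illustrative, not part of the library:

```python
# Sentiment-analysis pipelines return predictions shaped like this
# (one dict per input text):
result = [{"label": "POSITIVE", "score": 0.9998}]  # example output shape

def is_positive(pred, threshold=0.5):
    """Illustrative helper: treat a prediction as positive above a score threshold."""
    return pred["label"] == "POSITIVE" and pred["score"] >= threshold

print(is_positive(result[0]))  # True
```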

Text Generation Example

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Three practical uses for Hugging Face are", max_new_tokens=40)
print(result[0]["generated_text"])
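By default, generated_text includes the prompt you passed in. A small sketch of slicing it off to keep only the continuation; the strings here are placeholders standing in for real pipeline output:

```python
prompt = "Three practical uses for Hugging Face are"
# Placeholder for result[0]["generated_text"], which echoes the prompt:
generated_text = prompt + " quick prototyping, model sharing, and evaluation."

# Slice off the prompt to keep only the newly generated continuation.
continuation = generated_text[len(prompt):].strip()
print(continuation)
```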

Using Auto Classes

When you need more control, use AutoTokenizer together with the task-specific AutoModelFor* classes (for example, AutoModelForSequenceClassification).

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

This pattern is common because it loads the right config automatically for the chosen repo.

Tokenize Input

inputs = tokenizer(
    ["I love practical libraries.", "This output is disappointing."],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

Important options:

  • padding=True for batching
  • truncation=True to avoid overlong input errors
  • return_tensors="pt" for PyTorch tensors
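Conceptually, padding=True extends every sequence in the batch to the longest one using the tokenizer's pad token id, and an attention mask records which positions are real. A hand-rolled sketch of that idea; pad_batch and pad_id=0 are illustrative, not library code:

```python
def pad_batch(token_id_seqs, pad_id=0):
    """Pad variable-length token id lists to the batch's max length.

    Returns (padded_ids, attention_mask), mirroring what a tokenizer
    produces with padding=True.
    """
    max_len = max(len(s) for s in token_id_seqs)
    padded = [s + [pad_id] * (max_len - len(s)) for s in token_id_seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in token_id_seqs]
    return padded, mask

ids, mask = pad_batch([[101, 2054, 102], [101, 102]])
print(ids)   # [[101, 2054, 102], [101, 102, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```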

Run Inference

import torch

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)

print(predictions.tolist())
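The predictions are class indices. Each model's config carries an id2label mapping (available as model.config.id2label); for this SST-2 checkpoint, 0 is NEGATIVE and 1 is POSITIVE. A sketch with the mapping written out by hand:

```python
# model.config.id2label for this checkpoint, written out for illustration.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

predictions = [1, 0]  # e.g. predictions.tolist() from the step above
labels = [id2label[i] for i in predictions]
print(labels)  # ['POSITIVE', 'NEGATIVE']
```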

Using the Hub in Code

from huggingface_hub import login

# Interactive use: login() prompts for a token.
# Scripts and servers: set HF_TOKEN in your environment instead;
# huggingface_hub reads it automatically, so no login() call is needed.
# Avoid hard-coding tokens in source, e.g. login(token="...").

For scripts and servers, prefer environment variables over interactive login.
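A sketch of the environment-variable approach; get_hf_token is an illustrative helper, and huggingface_hub already reads HF_TOKEN on its own, so this is only needed when you want the check to be explicit:

```python
import os

def get_hf_token():
    """Illustrative helper: read the same token huggingface_hub picks up."""
    return os.environ.get("HF_TOKEN")

token = get_hf_token()
if token is None:
    print("HF_TOKEN not set; only public repos will be accessible.")
# else: huggingface_hub.login(token=token) if you need an explicit session
```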

Caching

Downloaded models are cached locally.

This means:

  • First load is slower
  • Later loads are faster
  • Disk usage grows over time

That is normal.
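By default the cache lives under ~/.cache/huggingface/hub; the HF_HOME environment variable relocates the whole Hugging Face directory, and HF_HUB_CACHE overrides the hub cache path directly. A sketch of computing the effective location (hub_cache_dir is an illustrative helper):

```python
import os

def hub_cache_dir():
    """Compute the default Hub cache path, mirroring the libraries' lookup order."""
    if "HF_HUB_CACHE" in os.environ:
        return os.environ["HF_HUB_CACHE"]
    hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    return os.path.join(hf_home, "hub")

print(hub_cache_dir())
```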

Device Placement

You may run on:

  • CPU for simple testing
  • GPU for larger or faster workloads
  • Specialized accelerators in managed environments

Rule of thumb: Start on CPU with a small model. Move to GPU only when the workload justifies it.
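The usual PyTorch pattern is device = "cuda" if torch.cuda.is_available() else "cpu", then model.to(device). Factored into a plain helper so the decision is visible on its own (pick_device is illustrative):

```python
def pick_device(cuda_available: bool) -> str:
    """Return the torch device string for the available hardware."""
    return "cuda" if cuda_available else "cpu"

# In real code:
#   import torch
#   device = pick_device(torch.cuda.is_available())
#   model = model.to(device)
#   inputs = {k: v.to(device) for k, v in inputs.items()}
print(pick_device(False))  # cpu
```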

Common Beginner Pattern

Use this sequence:

  1. Start with a pipeline
  2. Prove the task is viable
  3. Switch to auto classes for more control
  4. Add batching, metrics, and evaluation
  5. Fine-tune only if zero-shot or prompting is not enough

Frequent Errors

Error                      Usually means
-------------------------  ----------------------------------------------
Tokenizer mismatch         Wrong tokenizer/model pair
CUDA out of memory         Model too large or batch size too high
Gated repo access denied   You need login or license approval
Shape mismatch             Wrong model class for the task
Slow inference             Model too large, CPU-only run, or no batching