Python Quickstart

This is the fastest way to start using Hugging Face in real code.

Install the Main Packages

pip install -U transformers datasets tokenizers evaluate accelerate peft huggingface_hub

Optional extras depend on your workflow:

  • torch for most PyTorch model usage
  • tensorflow if using TensorFlow-backed models
  • sentencepiece for some tokenizers
  • bitsandbytes for quantization workflows

The Simplest Useful Example: Pipelines

Pipelines are high-level task wrappers.

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    # Pinning a model keeps results reproducible; otherwise Transformers
    # picks a default checkpoint and emits a warning.
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("This library saves a huge amount of setup time."))

Why this is useful:

  • Minimal code
  • Good for fast validation
  • Lets you test tasks before building custom code
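Pipelines return a list of dicts, one per input. A minimal sketch of consuming that output; the `is_positive` helper and its threshold are illustrative, not part of the library:

```python
# Sentiment-analysis pipelines return predictions shaped like this
# (one dict per input text):
result = [{"label": "POSITIVE", "score": 0.9998}]  # example output shape

def is_positive(pred, threshold=0.5):
    """Illustrative helper: treat a prediction as positive above a score threshold."""
    return pred["label"] == "POSITIVE" and pred["score"] >= threshold

print(is_positive(result[0]))  # True
```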

Text Generation Example

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Three practical uses for Hugging Face are", max_new_tokens=40)
print(result[0]["generated_text"])
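By default, generated_text includes the prompt you passed in. A small sketch of slicing it off to keep only the continuation; the strings here are placeholders standing in for real pipeline output:

```python
prompt = "Three practical uses for Hugging Face are"
# Placeholder for result[0]["generated_text"], which echoes the prompt:
generated_text = prompt + " quick prototyping, model sharing, and evaluation."

# Slice off the prompt to keep only the newly generated continuation.
continuation = generated_text[len(prompt):].strip()
print(continuation)
```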

Using Auto Classes

When you need more control, use AutoTokenizer together with the task-specific AutoModelFor* classes (for example, AutoModelForSequenceClassification).

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

This pattern is common because it loads the right config automatically for the chosen repo.

Tokenize Input

inputs = tokenizer(
    ["I love practical libraries.", "This output is disappointing."],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

Important options:

  • padding=True for batching
  • truncation=True to avoid overlong input errors
  • return_tensors="pt" for PyTorch tensors
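Conceptually, padding=True extends every sequence in the batch to the longest one using the tokenizer's pad token id, and an attention mask records which positions are real. A hand-rolled sketch of that idea; pad_batch and pad_id=0 are illustrative, not library code:

```python
def pad_batch(token_id_seqs, pad_id=0):
    """Pad variable-length token id lists to the batch's max length.

    Returns (padded_ids, attention_mask), mirroring what a tokenizer
    produces with padding=True.
    """
    max_len = max(len(s) for s in token_id_seqs)
    padded = [s + [pad_id] * (max_len - len(s)) for s in token_id_seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in token_id_seqs]
    return padded, mask

ids, mask = pad_batch([[101, 2054, 102], [101, 102]])
print(ids)   # [[101, 2054, 102], [101, 102, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```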

Run Inference

import torch

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)

print(predictions.tolist())
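The predictions are class indices. Each model's config carries an id2label mapping (available as model.config.id2label); for this SST-2 checkpoint, 0 is NEGATIVE and 1 is POSITIVE. A sketch with the mapping written out by hand:

```python
# model.config.id2label for this checkpoint, written out for illustration.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

predictions = [1, 0]  # e.g. predictions.tolist() from the step above
labels = [id2label[i] for i in predictions]
print(labels)  # ['POSITIVE', 'NEGATIVE']
```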

Using the Hub in Code

from huggingface_hub import login

# Interactive use: login() prompts for a token.
# Scripts and servers: set HF_TOKEN in your environment instead;
# huggingface_hub reads it automatically, so no login() call is needed.
# Avoid hard-coding tokens in source, e.g. login(token="...").

For scripts and servers, prefer environment variables over interactive login.
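A sketch of the environment-variable approach; get_hf_token is an illustrative helper, and huggingface_hub already reads HF_TOKEN on its own, so this is only needed when you want the check to be explicit:

```python
import os

def get_hf_token():
    """Illustrative helper: read the same token huggingface_hub picks up."""
    return os.environ.get("HF_TOKEN")

token = get_hf_token()
if token is None:
    print("HF_TOKEN not set; only public repos will be accessible.")
# else: huggingface_hub.login(token=token) if you need an explicit session
```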

Caching

Downloaded models are cached locally.

This means:

  • First load is slower
  • Later loads are faster
  • Disk usage grows over time

That is normal.
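By default the cache lives under ~/.cache/huggingface/hub; the HF_HOME environment variable relocates the whole Hugging Face directory, and HF_HUB_CACHE overrides the hub cache path directly. A sketch of computing the effective location (hub_cache_dir is an illustrative helper):

```python
import os

def hub_cache_dir():
    """Compute the default Hub cache path, mirroring the libraries' lookup order."""
    if "HF_HUB_CACHE" in os.environ:
        return os.environ["HF_HUB_CACHE"]
    hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    return os.path.join(hf_home, "hub")

print(hub_cache_dir())
```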

Device Placement

You may run on:

  • CPU for simple testing
  • GPU for larger or faster workloads
  • Specialized accelerators in managed environments

Rule of thumb: Start on CPU with a small model. Move to GPU only when the workload justifies it.
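The usual PyTorch pattern is device = "cuda" if torch.cuda.is_available() else "cpu", then model.to(device). Factored into a plain helper so the decision is visible on its own (pick_device is illustrative):

```python
def pick_device(cuda_available: bool) -> str:
    """Return the torch device string for the available hardware."""
    return "cuda" if cuda_available else "cpu"

# In real code:
#   import torch
#   device = pick_device(torch.cuda.is_available())
#   model = model.to(device)
#   inputs = {k: v.to(device) for k, v in inputs.items()}
print(pick_device(False))  # cpu
```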

Common Beginner Pattern

Use this sequence:

  1. Start with a pipeline
  2. Prove the task is viable
  3. Switch to auto classes for more control
  4. Add batching, metrics, and evaluation
  5. Fine-tune only if zero-shot or prompting is not enough

Frequent Errors

Error                      Usually means
-------------------------  ----------------------------------------------
Tokenizer mismatch         Wrong tokenizer/model pair
CUDA out of memory         Model too large or batch size too high
Gated repo access denied   You need login or license approval
Shape mismatch             Wrong model class for the task
Slow inference             Model too large, CPU-only run, or no batching