Transformers Library
The Hugging Face Transformers library is one of the most widely used open-source machine learning libraries, with well over 100k GitHub stars and broad adoption across industry and research labs. It provides a unified Python API to load, run, fine-tune, and deploy pretrained models across text, vision, audio, and multimodal tasks.
Install
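pip install transformers              # the library only; it needs a backend such as PyTorch to run models
pip install transformers torch        # + PyTorch (GPU support depends on the torch build for your platform)
pip install "transformers[torch]"     # + PyTorch via the library's "torch" extra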
Pipeline API — 3-Line Inference
The pipeline() function is the highest-level interface. It bundles model loading, input preprocessing (such as tokenization), the forward pass, and output postprocessing into a single call, and it supports most common tasks out of the box.
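Conceptually, every pipeline runs three stages: preprocess the raw input, run the model's forward pass, and postprocess the outputs into readable results (the library's Pipeline classes are organized around preprocess, _forward, and postprocess methods). The sketch below mimics a sentiment-analysis pipeline in plain Python so each stage is visible. The vocabulary, token weights, and function names are hypothetical stand-ins; a real pipeline uses the model's own tokenizer and a trained network.

```python
import math

# Hypothetical toy vocabulary and per-token label weights standing in for a trained model
VOCAB = {"this": 0, "product": 1, "is": 2, "amazing": 3, "terrible": 4}
LABELS = ["NEGATIVE", "POSITIVE"]
TOKEN_WEIGHTS = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [-2.0, 3.0], [3.0, -2.0]]


def preprocess(text):
    """Stage 1: raw text -> token ids (a real pipeline calls the model's tokenizer)."""
    words = text.lower().replace("!", " ").split()
    return [VOCAB[w] for w in words if w in VOCAB]


def forward(token_ids):
    """Stage 2: token ids -> one logit per label (a real pipeline runs the model)."""
    logits = [0.0] * len(LABELS)
    for t in token_ids:
        logits = [acc + w for acc, w in zip(logits, TOKEN_WEIGHTS[t])]
    return logits


def postprocess(logits):
    """Stage 3: logits -> softmax probabilities -> top label and its score."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return [{"label": LABELS[best], "score": probs[best]}]


def toy_pipeline(text):
    return postprocess(forward(preprocess(text)))


print(toy_pipeline("This product is amazing!"))
```

Swapping the toy tokenizer and weight table for a real tokenizer and model, and the softmax for whatever postprocessing the task needs, is essentially what pipeline() automates. Softmax is the usual choice for single-label classifiers like this one.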
Figure: What pipeline() does under the hood in 3 user-facing lines
from transformers import pipeline
# Text generation
gen = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct")
print(gen("The key to good code is")[0]["generated_text"])
# Summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print(summarizer("Long article text here...", max_length=130)[0]["summary_text"])
# Sentiment analysis (uses default model if none specified)
classifier = pipeline("sentiment-analysis")
print(classifier("This product is amazing!"))
# [{'label': 'POSITIVE', 'score': 0.9998}]
# Zero-shot classification — no fine-tuning needed
zsc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(zsc("I want to book a flight", candidate_labels=["travel", "food", "sports"]))
# Speech to text
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
print(asr("audio.mp3")["text"])
# Image classification
img_clf = pipeline("image-classification", model="google/vit-base-patch16-224")
print(img_clf("cat.jpg"))
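The zero-shot classifier above needs no task-specific training because it reuses a natural language inference (NLI) model such as facebook/bart-large-mnli. Each candidate label is turned into a hypothesis (by default "This example is {}."), the model scores how strongly the input entails that hypothesis, and in the default single-label mode those entailment scores are softmax-normalized across the labels. The sketch below reproduces that bookkeeping in plain Python. The keyword-matching scorer and keyword table are hypothetical stand-ins for the NLI model, so only the template-and-softmax logic reflects the real pipeline.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # the pipeline's default template

# Hypothetical keyword table so the stub below can fake an NLI model's judgement
KEYWORDS = {
    "travel": {"flight", "hotel", "book", "trip"},
    "food": {"pizza", "dinner", "recipe"},
    "sports": {"match", "goal", "team"},
}


def stub_entailment_logit(premise, hypothesis):
    """Hypothetical stand-in for an NLI model's entailment logit."""
    prefix = HYPOTHESIS_TEMPLATE.split("{}")[0]
    label = hypothesis[len(prefix):].rstrip(".")
    words = set(premise.lower().split())
    return float(len(words & KEYWORDS.get(label, set())))


def zero_shot(premise, candidate_labels, entail_fn=stub_entailment_logit):
    # One (premise, hypothesis) pair per candidate label
    logits = [entail_fn(premise, HYPOTHESIS_TEMPLATE.format(lbl)) for lbl in candidate_labels]
    # Single-label mode: softmax over the entailment logits of all labels
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    ranked = sorted(zip(candidate_labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return {"sequence": premise,
            "labels": [lbl for lbl, _ in ranked],
            "scores": [score for _, score in ranked]}


print(zero_shot("I want to book a flight", ["travel", "food", "sports"]))
```

Passing multi_label=True to the real pipeline changes the normalization: each label is then scored independently instead of competing in one softmax.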
AutoClasses — Model-Agnostic Loading
For more control than pipeline(), use AutoClasses. They detect the model architecture automatically from the Hub config and load the right class:
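Concretely, every Hub checkpoint ships a config.json with a model_type field (for example "mistral"), and the library maps that string, via the matching config class, to a concrete model class such as MistralForCausalLM. A toy version of that lookup is sketched below; the placeholder classes and the registry name are hypothetical and are not the library's internals.

```python
import json


class ToyModel:
    """Hypothetical placeholder for a real model class."""

    def __init__(self, config):
        self.config = config


class ToyMistralForCausalLM(ToyModel):
    pass


class ToyLlamaForCausalLM(ToyModel):
    pass


# Hypothetical registry: config.json "model_type" -> class to instantiate
TOY_CAUSAL_LM_REGISTRY = {
    "mistral": ToyMistralForCausalLM,
    "llama": ToyLlamaForCausalLM,
}


def toy_auto_from_config(config_text):
    """Pick and build a model class from the model_type in a config.json string."""
    config = json.loads(config_text)
    model_type = config.get("model_type")
    if model_type not in TOY_CAUSAL_LM_REGISTRY:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return TOY_CAUSAL_LM_REGISTRY[model_type](config)


toy_model = toy_auto_from_config('{"model_type": "mistral", "hidden_size": 4096}')
print(type(toy_model).__name__)
```

The real AutoClasses follow the same dispatch pattern, which is why a single from_pretrained call works for any supported architecture: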
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: half the memory of float32 weights
    device_map="auto",          # place layers across available GPUs, then CPU (needs accelerate)
)
# Move inputs to the model's device instead of hard-coding "cuda"
inputs = tokenizer("Explain RAG briefly:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Trainer — Fine-Tuning
The Trainer class handles the full fine-tuning loop: batching, gradient accumulation, mixed precision, checkpointing, evaluation, and distributed training. Combine with the datasets library to fine-tune on any Hub dataset:
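Gradient accumulation and multi-GPU training both change how many examples feed each optimizer step. The back-of-the-envelope helper below makes that arithmetic explicit. It is hypothetical and not part of the library. It assumes the Trainer's default of keeping the last partial batch, and its count may differ from the Trainer's own by a step when the number of batches is not a multiple of the accumulation factor.

```python
import math


def estimate_steps(num_examples, per_device_batch_size, num_devices=1,
                   grad_accum_steps=1, num_epochs=1):
    """Rough optimizer-step count for a Trainer-style loop (hypothetical helper)."""
    # Examples consumed per optimizer update across all devices
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps
    # Forward/backward batches per device per epoch; the last one may be partial
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update every grad_accum_steps batches
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return effective_batch, updates_per_epoch * num_epochs


# IMDB's 25,000 training reviews, batch size 16 on one GPU, 3 epochs
print(estimate_steps(25_000, 16, num_epochs=3))  # (16, 4689)
# Same run with 4 accumulation steps: effective batch 64, roughly a quarter of the updates
print(estimate_steps(25_000, 16, grad_accum_steps=4, num_epochs=3))  # (64, 1170)
```

Passing gradient_accumulation_steps to TrainingArguments trades more forward/backward passes per update for a larger effective batch without holding a larger batch in memory at once. The basic run with the settings used above looks like this: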
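from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)
from datasets import load_dataset
model_id = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
# Trainer consumes token IDs, not raw text, so tokenize the "text" column first
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)  # cut reviews to the model's max length
dataset = dataset.map(tokenize, batched=True)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",  # called evaluation_strategy in older releases
    fp16=True,  # mixed precision; requires a CUDA GPU
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pad each batch to its longest example
)
trainer.train()
PEFT — Parameter-Efficient Fine-Tuning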
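Full fine-tuning of a 7B model with the Adam optimizer in mixed precision needs roughly 16 bytes per parameter for weights, gradients, and optimizer states, or about 112 GB of GPU memory before activations. PEFT (Parameter-Efficient Fine-Tuning) techniques such as LoRA, and QLoRA (LoRA on top of a 4-bit quantized base model), let you fine-tune on consumer hardware by freezing the base model and updating only a small set of adapter weights: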
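The savings come from LoRA's structure. A frozen weight matrix W of shape d_out x d_in is left untouched, and a trainable low-rank update B·A is added to it, with A of shape r x d_in and B of shape d_out x r, scaled by lora_alpha / r. Only r·(d_in + d_out) parameters train instead of d_out·d_in, and because B starts at zero, the adapted layer initially behaves exactly like the original. The plain-Python sketch below shows both points; the helper names are hypothetical, and PEFT itself implements this with PyTorch layers.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, lora_alpha, r):
    """y = W x + (lora_alpha / r) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, r):
    """Parameters in the full matrix vs. in the rank-r adapter (A plus B)."""
    return d_out * d_in, r * (d_in + d_out)


random.seed(0)
d_out, d_in, r = 4, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, small random init
B = [[0.0] * r for _ in range(d_out)]                                  # d_out x r, zero init
x = [1.0, -0.5, 2.0, 0.0, 0.3, -1.2]

# With B = 0 the adapted layer reproduces the frozen layer exactly
print(lora_forward(W, A, B, x, lora_alpha=4, r=r) == matvec(W, x))  # True

# A 4096 x 4096 attention projection with a rank-16 adapter
full, adapter = param_counts(4096, 4096, 16)
print(full, adapter, f"{adapter / full:.2%}")  # 16777216 131072 0.78%
```

In PEFT, the same knobs (rank r, lora_alpha, and which modules get adapters) are set through LoraConfig: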
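from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType
import torch
# Load the base model to adapt (the same checkpoint as in the AutoClasses example)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3", torch_dtype=torch.float16, device_map="auto")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # rank of the update: lower = fewer trainable params
    lora_alpha=32,  # the update is scaled by lora_alpha / r
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # attention projections that get adapters
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Only the adapters train: with r=16 on q_proj/v_proj of a 7B model,
# that is roughly 0.1% of all parameters (exact counts depend on the model).
Checklist: Do You Understand This?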
- Can you use pipeline() to run summarization, classification, and speech recognition?
- Do you know when to use pipeline() vs. AutoClasses + manual inference?
- Can you set up a basic fine-tuning run with Trainer?
- Do you understand what LoRA does and why it's used for fine-tuning large models?
- Do you know what device_map="auto" does?