Guide to Hugging Face AutoModelFor** Classes and Tokenizers

Understanding SentenceTransformer vs. AutoTokenizer + AutoModel

A tokenizer such as AutoTokenizer simply converts words into tokens (a numerical representation of text); however, this alone doesn't produce sentence embeddings.

SentenceTransformer() handles both tokenization and embedding computation automatically. It also applies pooling (typically mean pooling) to the hidden states, producing a final sentence embedding that can be used directly for many NLP tasks.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["I love machine learning", "I am expert in AI"]
embeddings = model.encode(sentences)

print(embeddings.shape) # (2, 384) dim

However, if we want to fine-tune, we need to understand how AutoTokenizer and AutoModel are used together.
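As a preview of what SentenceTransformer hides, here is a minimal sketch (my own illustration, not the library's exact implementation) that reproduces the idea with AutoTokenizer and AutoModel: tokenize, run the base model, then mean-pool the token embeddings using the attention mask.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["I love machine learning", "I am expert in AI"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # last_hidden_state: (batch, tokens, hidden_size)

# Mean pooling over real tokens only (padding positions are masked out)
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # torch.Size([2, 384])

The resulting vectors are close to what model.encode() returns, though the library may also apply extra steps such as normalization.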

Hugging Face’s Transformers library has become the standard toolkit for modern NLP, vision, speech, and multimodal AI. One of its most powerful features is the Auto API — especially the AutoModelFor** classes and tokenizers — which let you load state-of-the-art models with minimal boilerplate.

In this article, we’ll walk through:

  • What AutoModelFor** classes are
  • All major AutoModelFor** variants and what they’re used for
  • Tokenizers and how they fit into the workflow
  • Practical examples and real-world use cases

Why the “Auto” API Exists

Before the Auto API, you had to know exactly which model class to load:

from transformers import BertForSequenceClassification

This tightly couples your code to a specific architecture (BERT, RoBERTa, etc.).

The Auto API solves this by:

  • Automatically detecting the architecture from the model checkpoint
  • Loading the correct configuration, tokenizer, and model class
  • Making your code architecture-agnostic
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

We will look at the AutoModel APIs for each use case in NLP, vision, and audio.

AutoModelFor** in NLP

In NLP, different tasks require different output heads:

  • Classification → logits
  • Token labeling → per-token predictions
  • Generation → autoregressive decoding
  • QA → span prediction

The AutoModelFor** API ensures:

  • The correct head is attached
  • The model configuration is respected
  • You don’t hardcode architecture-specific logic

This allows you to switch between BERT, RoBERTa, DeBERTa, GPT, T5, and more without rewriting code.
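For example, the sketch below loads three different architectures with the exact same two lines of code; only the checkpoint string changes (these are common public checkpoints on the Hub):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swap architectures by changing only the checkpoint name
for checkpoint in ["bert-base-uncased", "roberta-base", "distilbert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    print(checkpoint, "->", type(model).__name__)
# bert-base-uncased -> BertForSequenceClassification
# roberta-base -> RobertaForSequenceClassification
# distilbert-base-uncased -> DistilBertForSequenceClassification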

The Role of Tokenizers

Before any model can process text, it must be converted into numbers. That’s the tokenizer’s job.

AutoTokenizer

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

What it does:

  • Splits text into tokens (subwords or characters)
  • Maps tokens to integer IDs
  • Handles padding, truncation, attention masks, and special tokens

Used in:

  • NLP (text, QA, summarization, translation)
  • Multimodal models (text + image/audio)
  • Training and inference pipelines

What AutoTokenizer Handles

  • Subword tokenization (WordPiece, BPE, Unigram)
  • Input IDs
  • Attention masks
  • Token type IDs (for sentence pairs)
  • Padding & truncation
  • Special tokens ([CLS], [SEP], <s>, etc.)
### Usage Pattern

inputs = tokenizer(
    "Transformers are amazing",
    padding=True,
    truncation=True,
    return_tensors="pt"
)
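The result is a dictionary of tensors. For a BERT-style checkpoint it typically contains input_ids, attention_mask, and token_type_ids, which you can inspect directly:

print(inputs.keys())                # e.g. dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
print(inputs["input_ids"].shape)    # (batch_size, sequence_length)
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))  # includes [CLS] and [SEP]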

Core AutoModelFor** Classes (NLP)

  1. AutoModel

Purpose: AutoModel loads the base transformer without any task-specific head, exposing raw hidden states that you can use for embeddings or custom heads.

from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-uncased")

Outputs

  • Hidden states for each token
  • Pooled output (if available)

When to Use It

  • Feature extraction
  • Sentence embeddings
  • Custom heads
  • Research experiments

Example Use Case

Building a semantic search engine where you compute embeddings and store them in a vector database.

outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(dim=1)
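As a rough sketch of that use case (the query text and documents are just illustrative), embed a few documents and a query with the same model, then rank documents by cosine similarity; a real system would store the document embeddings in a vector database:

import torch
import torch.nn.functional as F

docs = ["Transformers are amazing", "I had pasta for lunch"]
doc_inputs = tokenizer(docs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    doc_emb = model(**doc_inputs).last_hidden_state.mean(dim=1)   # (2, hidden_size)

query_inputs = tokenizer("attention-based neural networks", return_tensors="pt")
with torch.no_grad():
    query_emb = model(**query_inputs).last_hidden_state.mean(dim=1)

scores = F.cosine_similarity(query_emb, doc_emb)  # one similarity score per document
print(docs[scores.argmax().item()])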

2. AutoModelForSequenceClassification

Purpose: Classify an entire sequence into one or more categories. It is a pretrained model with a classification head on top: given a piece of text, it predicts a label.

from transformers import AutoModelForSequenceClassification

Internals

  • Uses [CLS] token (or equivalent)
  • Adds a linear classification head
  • Supports multi-class & multi-label setups

Typical Tasks

  • Sentiment analysis
  • Topic classification
  • Toxicity detection
  • Intent recognition
  • Spam Detection

Example Use Case

Customer reviews

“This product is amazing!” → Positive
“Worst experience ever.” → Negative

Step 1: Import the Libraries

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
from datasets import Dataset
import torch

Step 2: Load the Pretrained Model and Tokenizer

model_name = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2
)

Step 3: Prepare Your Dataset

data = {
    "text": [
        "I love this product",
        "This is terrible",
        "Amazing experience",
        "Worst purchase ever"
    ],
    "label": [1, 0, 1, 0]
}

dataset = Dataset.from_dict(data)

Step 4: Tokenize the Data

def tokenize_function(example):
    return tokenizer(
        example["text"],
        padding="max_length",
        truncation=True,
        max_length=128
    )

tokenized_dataset = dataset.map(tokenize_function)

Step 5: Set Training Arguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="no",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_steps=10
)

Note: we usually fine-tune for 2–5 epochs, not hundreds.

Step 6: Fine-Tuning

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

Step 7: Save the Model

model.save_pretrained("./sentiment-model")
tokenizer.save_pretrained("./sentiment-model")

Step 8: Run Inference

text = "I really enjoyed this movie"

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
prediction = torch.argmax(logits, dim=1).item()
if prediction == 1:
    print("Positive sentiment")
else:
    print("Negative sentiment")

3. AutoModelForTokenClassification

If AutoModelForSequenceClassification answers:

“What is this text about?”

Then AutoModelForTokenClassification answers:

“What does each word in this text represent?”

This model is used when labels belong to tokens, not the entire sentence.

AutoModelForTokenClassification is a pretrained language model (BERT, RoBERTa, etc.) with a token-level classification head.

Each token gets its own prediction.

Key Difference from Sequence Classification

| Sequence Classification         | Token Classification                    |
| ------------------------------- | --------------------------------------- |
| One label per sentence | One label per token |
| Uses `[CLS]` token | Uses all token embeddings |
| Output shape: `(batch, labels)` | Output shape: `(batch, tokens, labels)` |

Most Common Tasks

  • Named Entity Recognition (NER)
  • Part-of-Speech tagging (POS)
  • Slot filling (chatbots)
  • Medical & legal entity extraction

Real-World Examples

  • Extract names, dates, locations from contracts
  • Parse resumes into structured fields
  • Identify medical terms in clinical notes

Example: Input Sentence

John works at Google in California

Desired Output:

| Token      | Label  |
| ---------- | ------ |
| John | PERSON |
| works | O |
| at | O |
| Google | ORG |
| in | O |
| California | LOC |

The fine-tuning pipeline is the same as for sequence classification (Tokenizer → Dataset → Trainer → Train → Save → Inference); an inference sketch using a ready-made NER checkpoint is shown below.
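As promised, here is an inference sketch. It assumes a publicly available fine-tuned NER checkpoint (dslim/bert-base-NER is used as an example); any token-classification model you fine-tune yourself follows the same pattern.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "dslim/bert-base-NER"  # assumed public NER checkpoint; swap in your own
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

text = "John works at Google in California"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, tokens, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred.item()])  # label names come from the checkpoint's config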

4. AutoModelForQuestionAnswering

AutoModelForQuestionAnswering is used when you want the model to find an answer inside a given document.

This is very different from:

  • classification (choosing a label)
  • generation (writing new text)

Here, the model extracts an answer span from existing text.

It is a pretrained language model with a span-prediction head on top.

from transformers import AutoModelForQuestionAnswering

Instead of predicting labels or tokens, it predicts:

  • where the answer starts
  • where the answer ends

The Core Idea (Simple Explanation)

You always give the model two things:

  1. A question
  2. A context (document / paragraph)

The model answers by pointing to a part of the context.

Fine-Tuning: What Stays the Same

🔁 Fine-tuning steps are the same as before:

  • Load pretrained model
  • Prepare dataset
  • Tokenize
  • Use Trainer
  • Train and save

👉 Refer back to AutoModelForSequenceClassification steps.

Training Labels Are Not Classes

Instead of labels like positive / negative, you provide:

  • start_position
  • end_position

Example of a Training Sample:

{
    "question": "When were Transformers introduced?",
    "context": "Transformers were introduced in 2017 by Vaswani et al.",
    "start_position": 32,
    "end_position": 36
}

(Here start_position and end_position are character offsets of “2017” in the context, with the end exclusive; preprocessing maps them to token positions.)

During training, the model learns:

  • How to map questions to answer spans
  • How to ignore irrelevant text
  • How to pick the most confident answer window
### Load the Model
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

Example Training Data

train_data = {
    "question": [
        "When were Transformers introduced?"
    ],
    "context": [
        "Transformers were introduced in 2017 by Vaswani et al."
    ],
    "answers": [
        {"text": ["2017"], "answer_start": [32]}
    ]
}
from datasets import Dataset

dataset = Dataset.from_dict(train_data)

def preprocess(example):
    inputs = tokenizer(
        example["question"],
        example["context"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

    start = example["answers"]["answer_start"][0]
    end = start + len(example["answers"]["text"][0])

    # Map character offsets in the context (sequence index 1) to token positions
    inputs["start_positions"] = inputs.char_to_token(start, sequence_index=1)
    inputs["end_positions"] = inputs.char_to_token(end - 1, sequence_index=1)

    return inputs

tokenized_dataset = dataset.map(preprocess)

Fine Tuning

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./qa_model",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

# Save the fine-tuned model and tokenizer so they can be reloaded for inference below
trainer.save_model("./qa_model")
tokenizer.save_pretrained("./qa_model")

Inference

import torch

tokenizer = AutoTokenizer.from_pretrained("./qa_model")
model = AutoModelForQuestionAnswering.from_pretrained("./qa_model")

question = "Who introduced Transformers?"
context = "Transformers were introduced in 2017 by Vaswani et al."

inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits)

answer_ids = inputs["input_ids"][0][start:end+1]
answer = tokenizer.decode(answer_ids)

print(answer)

### Output
Vaswani et al.

5. AutoModelForCausalLM

Fine-Tuning & Inference Explained (Text Generation)

AutoModelForCausalLM is the class you use when you want a model to generate text.

If AutoModelForQuestionAnswering:

finds answers in text

Then AutoModelForCausalLM:

writes new text

This is the backbone behind chatbots, assistants, code generators, and story writers.

AutoModelForCausalLM is a pretrained autoregressive language model that predicts the next token given previous tokens

The model learns:

“Given everything so far, what word comes next?”

Use AutoModelForCausalLM if you want:

  • Free-form text generation
  • Conversational chatbots
  • Code completion
  • Creative writing
  • Instruction-following systems

Input prompt

User: Explain transformers in simple words.
Assistant: Transformers are models that understand text by paying attention to words...

How Fine-Tuning Works (High Level)

  • Load pretrained model
  • Prepare dataset
  • Tokenize
  • Trainer → Train → Save

👉 Refer to earlier fine-tuning steps.

Example Training Data

Prompt: What is NLP?
Answer: NLP is a field of AI that focuses on language.
### Load Model
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)

### Example of Training Data
from datasets import Dataset

data = {
    "text": [
        "User: What is AI?\nAssistant: AI is the simulation of human intelligence.",
        "User: What is NLP?\nAssistant: NLP helps machines understand language."
    ]
}

dataset = Dataset.from_dict(data)

### Tokenize
def tokenize(example):
    tokens = tokenizer(
        example["text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_dataset = dataset.map(tokenize)
### Fine-Tuning
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./causal_lm",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

### Generate Text
prompt = "User: Explain machine learningnAssistant:"

inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=50,
do_sample=True,
temperature=0.7
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

6. AutoModelForSeq2SeqLM

AutoModelForSeq2SeqLM is used when both your input and output are text, but the output is different from the input.

If AutoModelForCausalLM:

continues text

Then AutoModelForSeq2SeqLM:

transforms text

AutoModelForSeq2SeqLM is a pretrained encoder–decoder model designed for text-to-text generation.

Popular Models

  • T5 — text-to-text framework
  • BART — summarization & generation
  • mBART — multilingual translation
  • Pegasus — abstractive summarization

Core Idea

“Read the input text → understand it → generate new text.”

The model:

  1. Encodes the input text
  2. Decodes a new output sequence using attention

Use AutoModelForSeq2SeqLM if your task is:

  • Summarization
  • Translation
  • Paraphrasing
  • Question generation
  • Grammar correction

Use Case

### Input
A long news article...
### Output
A short summary of the article.

Example of Training Data

Input:  "Summarize: Transformers revolutionized NLP..."
Target: "Transformers changed NLP."

Fine-Tuning Pipeline

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

### example dataset
from datasets import Dataset

data = {
    "input_text": [
        "Transformers revolutionized NLP by enabling attention mechanisms..."
    ],
    "target_text": [
        "Transformers changed NLP using attention."
    ]
}

dataset = Dataset.from_dict(data)

### Tokenize the Data
def preprocess(example):
    model_inputs = tokenizer(
        example["input_text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            example["target_text"],
            truncation=True,
            padding="max_length",
            max_length=64
        )

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess)

### Fine-Tuning
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./seq2seq_model",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

# Save so the fine-tuned model and tokenizer can be reloaded for inference below
trainer.save_model("./seq2seq_model")
tokenizer.save_pretrained("./seq2seq_model")

### inference
import torch

tokenizer = AutoTokenizer.from_pretrained("./seq2seq_model")
model = AutoModelForSeq2SeqLM.from_pretrained("./seq2seq_model")
### Generate Text
text = "Transformers revolutionized NLP by enabling attention mechanisms."

inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=40
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

NLP AutoModels Comparison Table

| AutoModel Class                        | Input              | Output         | What It Does                    | Typical Use Cases                       |
| -------------------------------------- | ------------------ | -------------- | ------------------------------- | --------------------------------------- |
| **AutoModel** | Text | Embeddings | Returns hidden states only | Semantic search, similarity, clustering |
| **AutoModelForSequenceClassification** | Text | Label(s) | One label per sentence | Sentiment, spam, intent detection |
| **AutoModelForTokenClassification** | Text | Token labels | One label per word | NER, POS tagging, entity extraction |
| **AutoModelForQuestionAnswering** | Question + Context | Text span | Extracts answer from context | Document QA, search, legal analysis |
| **AutoModelForMaskedLM** | Text with `[MASK]` | Missing token | Predicts masked words | Pretraining, fill-in-the-blank |
| **AutoModelForCausalLM** | Prompt | Generated text | Continues text autoregressively | Chatbots, code gen, writing |
| **AutoModelForSeq2SeqLM** | Text | New text | Transforms text | Summarization, translation |

Real-World Use Case Mapping

| Real Problem                           | Correct AutoModel                  |
| -------------------------------------- | ---------------------------------- |
| “Is this review positive or negative?” | AutoModelForSequenceClassification |
| “Extract names & locations from text” | AutoModelForTokenClassification |
| “Answer questions from a PDF” | AutoModelForQuestionAnswering |
| “Build a chatbot” | AutoModelForCausalLM |
| “Summarize this article” | AutoModelForSeq2SeqLM |
| “Find similar documents” | AutoModel |

Other AutoModel Classes Worth Knowing

a. AutoModelForPreTraining: loads a model exactly as it was pretrained, with all pretraining heads included (a short sketch follows the list below).

Use AutoModelForPreTraining if:

  • You are continuing pretraining
  • You are doing research
  • You want access to all pretraining losses
  • You are training on domain-specific corpora (legal, medical)
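
A minimal sketch, assuming a BERT checkpoint: the loaded class exposes the original pretraining heads (masked-language-modeling plus next-sentence-prediction for BERT) rather than a task head.

from transformers import AutoTokenizer, AutoModelForPreTraining

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForPreTraining.from_pretrained("bert-base-uncased")
print(type(model).__name__)  # BertForPreTraining

inputs = tokenizer("Transformers are amazing", return_tensors="pt")
outputs = model(**inputs)
print(outputs.prediction_logits.shape)        # MLM head: (batch, tokens, vocab_size)
print(outputs.seq_relationship_logits.shape)  # NSP head: (batch, 2)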

Why Most Users Don’t Need It

For most applications:

  • Fine-tuning uses task-specific models
  • Pretraining heads are unnecessary
  • Training is expensive

b. AutoConfig:

AutoConfig loads only the model configuration, not weights.

What’s Inside a Config?

  • Number of layers
  • Hidden size
  • Attention heads
  • Dropout
  • Vocabulary size
  • Architecture type

It is used to inspect a model's architecture or to create models from scratch, which is useful for research and experimentation.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.hidden_size)  # 768 for bert-base-uncased

### Modify the Architecture
config.num_hidden_layers = 6
model = AutoModel.from_config(config)

You might have seen TFAutoModel and wondered what it is. It is simply the TensorFlow counterpart of the PyTorch AutoModel classes: the API is the same, but it works with TensorFlow tensors (“tf”) instead of PyTorch tensors (“pt”).

What Are TFAutoModel Classes?

What Does “TF” Mean?

TF = TensorFlow

Hugging Face supports both:

  • PyTorch (AutoModel)

from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-uncased")

  • TensorFlow / Keras (TFAutoModel)

from transformers import TFAutoModel
model = TFAutoModel.from_pretrained("bert-base-uncased")

So every PyTorch AutoModel has a TensorFlow equivalent:

| PyTorch                            | TensorFlow                           |
| ---------------------------------- | ------------------------------------ |
| AutoModel | TFAutoModel |
| AutoModelForSequenceClassification | TFAutoModelForSequenceClassification |
| AutoModelForTokenClassification | TFAutoModelForTokenClassification |
| AutoModelForQuestionAnswering | TFAutoModelForQuestionAnswering |
| AutoModelForCausalLM | TFAutoModelForCausalLM |
| AutoModelForSeq2SeqLM | TFAutoModelForSeq2SeqLM |

We will cover the vision and audio AutoModel classes, along with their fine-tuning workflows, in the next article.


