Guide to Hugging Face AutoModelFor** Classes and Tokenizers

Understanding SentenceTransformer vs. AutoTokenizer + AutoModel

A tokenizer such as AutoTokenizer simply converts words into tokens (a numerical representation of text); however, this alone doesn't produce sentence embeddings.

SentenceTransformer() handles both tokenization and embedding computation automatically. It also applies pooling (typically mean pooling) to the hidden states, producing a final sentence embedding that can be used directly for many NLP tasks.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["I love machine learning", "I am expert in AI"]
embeddings = model.encode(sentences)

print(embeddings.shape) # (2, 384) dim

However, if we want to fine-tune, we need to understand how AutoTokenizer and AutoModel are used together.
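As a preview of what SentenceTransformer hides, here is a minimal sketch (my own illustration, not the library's exact implementation) that reproduces the idea with AutoTokenizer and AutoModel: tokenize, run the base model, then mean-pool the token embeddings using the attention mask.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["I love machine learning", "I am expert in AI"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # last_hidden_state: (batch, tokens, hidden_size)

# Mean pooling over real tokens only (padding positions are masked out)
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # torch.Size([2, 384])

The resulting vectors are close to what model.encode() returns, though the library may also apply extra steps such as normalization.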

Hugging Face’s Transformers library has become the standard toolkit for modern NLP, vision, speech, and multimodal AI. One of its most powerful features is the Auto API — especially the AutoModelFor** classes and tokenizers — which let you load state-of-the-art models with minimal boilerplate.

In this article, we’ll walk through:

  • What AutoModelFor** classes are
  • All major AutoModelFor** variants and what they’re used for
  • Tokenizers and how they fit into the workflow
  • Practical examples and real-world use cases

Why the “Auto” API Exists

Before the Auto API, you had to know exactly which model class to load:

from transformers import BertForSequenceClassification

This tightly couples your code to a specific architecture (BERT, RoBERTa, etc.).

The Auto API solves this by:

  • Automatically detecting the architecture from the model checkpoint
  • Loading the correct configuration, tokenizer, and model class
  • Making your code architecture-agnostic
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

We will look at the AutoModel APIs for each use case in NLP, vision, and audio.

AutoModelFor** in NLP

In NLP, different tasks require different output heads:

  • Classification → logits
  • Token labeling → per-token predictions
  • Generation → autoregressive decoding
  • QA → span prediction

The AutoModelFor** API ensures:

  • The correct head is attached
  • The model configuration is respected
  • You don’t hardcode architecture-specific logic

This allows you to switch between BERT, RoBERTa, DeBERTa, GPT, T5, and more without rewriting code.
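For example, the sketch below loads three different architectures with the exact same two lines of code; only the checkpoint string changes (these are common public checkpoints on the Hub):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swap architectures by changing only the checkpoint name
for checkpoint in ["bert-base-uncased", "roberta-base", "distilbert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    print(checkpoint, "->", type(model).__name__)
# bert-base-uncased -> BertForSequenceClassification
# roberta-base -> RobertaForSequenceClassification
# distilbert-base-uncased -> DistilBertForSequenceClassification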

The Role of Tokenizers

Before any model can process text, it must be converted into numbers. That’s the tokenizer’s job.

AutoTokenizer

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

What it does:

  • Splits text into tokens (subwords or characters)
  • Maps tokens to integer IDs
  • Handles padding, truncation, attention masks, and special tokens

Used in:

  • NLP (text, QA, summarization, translation)
  • Multimodal models (text + image/audio)
  • Training and inference pipelines

What AutoTokenizer Handles

  • Subword tokenization (WordPiece, BPE, Unigram)
  • Input IDs
  • Attention masks
  • Token type IDs (for sentence pairs)
  • Padding & truncation
  • Special tokens ([CLS], [SEP], <s>, etc.)
### Usage Pattern

inputs = tokenizer(
    "Transformers are amazing",
    padding=True,
    truncation=True,
    return_tensors="pt"
)
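The result is a dictionary of tensors. For a BERT-style checkpoint it typically contains input_ids, attention_mask, and token_type_ids, which you can inspect directly:

print(inputs.keys())                # e.g. dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
print(inputs["input_ids"].shape)    # (batch_size, sequence_length)
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))  # includes [CLS] and [SEP]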

Core AutoModelFor** Classes (NLP)

  1. AutoModel

Purpose: AutoModel loads the base transformer without any task-specific head, exposing raw hidden states that you can use for embeddings or custom heads.

from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-uncased")

Outputs

  • Hidden states for each token
  • Pooled output (if available)

When to Use It

  • Feature extraction
  • Sentence embeddings
  • Custom heads
  • Research experiments

Example Use Case

Building a semantic search engine where you compute embeddings and store them in a vector database.

outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(dim=1)
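As a rough sketch of that use case (the query text and documents are just illustrative), embed a few documents and a query with the same model, then rank documents by cosine similarity; a real system would store the document embeddings in a vector database:

import torch
import torch.nn.functional as F

docs = ["Transformers are amazing", "I had pasta for lunch"]
doc_inputs = tokenizer(docs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    doc_emb = model(**doc_inputs).last_hidden_state.mean(dim=1)   # (2, hidden_size)

query_inputs = tokenizer("attention-based neural networks", return_tensors="pt")
with torch.no_grad():
    query_emb = model(**query_inputs).last_hidden_state.mean(dim=1)

scores = F.cosine_similarity(query_emb, doc_emb)  # one similarity score per document
print(docs[scores.argmax().item()])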

2. AutoModelForSequenceClassification

Purpose: Classify an entire sequence into one or more categories. It is a pretrained model with a classification head on top: given a piece of text, it predicts a label.

from transformers import AutoModelForSequenceClassification

Internals

  • Uses [CLS] token (or equivalent)
  • Adds a linear classification head
  • Supports multi-class & multi-label setups

Typical Tasks

  • Sentiment analysis
  • Topic classification
  • Toxicity detection
  • Intent recognition
  • Spam Detection

Example Use Case

Customer reviews

“This product is amazing!” → Positive
“Worst experience ever.” → Negative

Step 1: Import the Libraries

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
from datasets import Dataset
import torch

Step 2: Load the Pretrained Model and Tokenizer

model_name = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2
)

Step 3: Prepare Your Dataset

data = {
    "text": [
        "I love this product",
        "This is terrible",
        "Amazing experience",
        "Worst purchase ever"
    ],
    "label": [1, 0, 1, 0]
}

dataset = Dataset.from_dict(data)

Step 4: Tokenize the Data

def tokenize_function(example):
    return tokenizer(
        example["text"],
        padding="max_length",
        truncation=True,
        max_length=128
    )

tokenized_dataset = dataset.map(tokenize_function)

Step 5: Set Training Arguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="no",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_steps=10
)

Note: we usually fine-tune for 2–5 epochs, not hundreds.

Step 6: Fine-Tuning

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

Step 7: Save the Model

model.save_pretrained("./sentiment-model")
tokenizer.save_pretrained("./sentiment-model")

Step 8: Run Inference

text = "I really enjoyed this movie"

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
prediction = torch.argmax(logits, dim=1).item()
if prediction == 1:
    print("Positive sentiment")
else:
    print("Negative sentiment")

3. AutoModelForTokenClassification

If AutoModelForSequenceClassification answers:

“What is this text about?”

Then AutoModelForTokenClassification answers:

“What does each word in this text represent?”

This model is used when labels belong to tokens, not the entire sentence.

AutoModelForTokenClassification is a pretrained language model (BERT, RoBERTa, etc.) with a token-level classification head.

Each token gets its own prediction.

Key Difference from Sequence Classification

| Sequence Classification         | Token Classification                    |
| ------------------------------- | --------------------------------------- |
| One label per sentence | One label per token |
| Uses `[CLS]` token | Uses all token embeddings |
| Output shape: `(batch, labels)` | Output shape: `(batch, tokens, labels)` |

Most Common Tasks

  • Named Entity Recognition (NER)
  • Part-of-Speech tagging (POS)
  • Slot filling (chatbots)
  • Medical & legal entity extraction

Real-World Examples

  • Extract names, dates, locations from contracts
  • Parse resumes into structured fields
  • Identify medical terms in clinical notes

Example: Input Sentence

John works at Google in California

Desired Output:

| Token      | Label  |
| ---------- | ------ |
| John | PERSON |
| works | O |
| at | O |
| Google | ORG |
| in | O |
| California | LOC |

The fine-tuning pipeline is the same as for sequence classification (Tokenizer → Dataset → Trainer → Train → Save → Inference); an inference sketch using a ready-made NER checkpoint is shown below.
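As promised, here is an inference sketch. It assumes a publicly available fine-tuned NER checkpoint (dslim/bert-base-NER is used as an example); any token-classification model you fine-tune yourself follows the same pattern.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "dslim/bert-base-NER"  # assumed public NER checkpoint; swap in your own
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

text = "John works at Google in California"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, tokens, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred.item()])  # label names come from the checkpoint's config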

4. AutoModelForQuestionAnswering

AutoModelForQuestionAnswering is used when you want the model to find an answer inside a given document.

This is very different from:

  • classification (choosing a label)
  • generation (writing new text)

Here, the model extracts an answer span from existing text.

It is a pretrained language model with a span-prediction head on top.

from transformers import AutoModelForQuestionAnswering

Instead of predicting labels or tokens, it predicts:

  • where the answer starts
  • where the answer ends

The Core Idea (Simple Explanation)

You always give the model two things:

  1. A question
  2. A context (document / paragraph)

The model answers by pointing to a part of the context.

Fine-Tuning: What Stays the Same

🔁 Fine-tuning steps are the same as before:

  • Load pretrained model
  • Prepare dataset
  • Tokenize
  • Use Trainer
  • Train and save

👉 Refer back to AutoModelForSequenceClassification steps.

Training Labels Are Not Classes

Instead of labels like positive / negative, you provide:

  • start_position
  • end_position

Example of a Training Sample:

{
    "question": "When were Transformers introduced?",
    "context": "Transformers were introduced in 2017 by Vaswani et al.",
    "start_position": 32,
    "end_position": 36
}

(Here start_position and end_position are character offsets of “2017” in the context, with the end exclusive; preprocessing maps them to token positions.)

During training, the model learns:

  • How to map questions to answer spans
  • How to ignore irrelevant text
  • How to pick the most confident answer window
### Load the Model
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

Example Training Data

train_data = {
    "question": [
        "When were Transformers introduced?"
    ],
    "context": [
        "Transformers were introduced in 2017 by Vaswani et al."
    ],
    "answers": [
        {"text": ["2017"], "answer_start": [32]}
    ]
}
from datasets import Dataset

dataset = Dataset.from_dict(train_data)

def preprocess(example):
    inputs = tokenizer(
        example["question"],
        example["context"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

    start = example["answers"]["answer_start"][0]
    end = start + len(example["answers"]["text"][0])

    # Map character offsets in the context (sequence index 1) to token positions
    inputs["start_positions"] = inputs.char_to_token(start, sequence_index=1)
    inputs["end_positions"] = inputs.char_to_token(end - 1, sequence_index=1)

    return inputs

tokenized_dataset = dataset.map(preprocess)

Fine Tuning

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./qa_model",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

# Save the fine-tuned model and tokenizer so they can be reloaded for inference below
trainer.save_model("./qa_model")
tokenizer.save_pretrained("./qa_model")

Inference

import torch

tokenizer = AutoTokenizer.from_pretrained("./qa_model")
model = AutoModelForQuestionAnswering.from_pretrained("./qa_model")

question = "Who introduced Transformers?"
context = "Transformers were introduced in 2017 by Vaswani et al."

inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits)

answer_ids = inputs["input_ids"][0][start:end+1]
answer = tokenizer.decode(answer_ids)

print(answer)

### Output
Vaswani et al.

5. AutoModelForCausalLM

Fine-Tuning & Inference Explained (Text Generation)

AutoModelForCausalLM is the class you use when you want a model to generate text.

If AutoModelForQuestionAnswering:

finds answers in text

Then AutoModelForCausalLM:

writes new text

This is the backbone behind chatbots, assistants, code generators, and story writers.

AutoModelForCausalLM is a pretrained autoregressive language model that predicts the next token given previous tokens

The model learns:

“Given everything so far, what word comes next?”

Use AutoModelForCausalLM if you want:

  • Free-form text generation
  • Conversational chatbots
  • Code completion
  • Creative writing
  • Instruction-following systems

Input prompt

User: Explain transformers in simple words.
Assistant: Transformers are models that understand text by paying attention to words...

How Fine-Tuning Works (High Level)

  • Load pretrained model
  • Prepare dataset
  • Tokenize
  • Trainer → Train → Save

👉 Refer to earlier fine-tuning steps.

Example Training Data

Prompt: What is NLP?
Answer: NLP is a field of AI that focuses on language.
### Load Model
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)

### Example of Training Data
from datasets import Dataset

data = {
    "text": [
        "User: What is AI?\nAssistant: AI is the simulation of human intelligence.",
        "User: What is NLP?\nAssistant: NLP helps machines understand language."
    ]
}

dataset = Dataset.from_dict(data)

### Tokenize
def tokenize(example):
    tokens = tokenizer(
        example["text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_dataset = dataset.map(tokenize)
### Fine-Tuning
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./causal_lm",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

### Generate Text
prompt = "User: Explain machine learningnAssistant:"

inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=50,
do_sample=True,
temperature=0.7
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

6. AutoModelForSeq2SeqLM

AutoModelForSeq2SeqLM is used when both your input and output are text, but the output is different from the input.

If AutoModelForCausalLM:

continues text

Then AutoModelForSeq2SeqLM:

transforms text

AutoModelForSeq2SeqLM is a pretrained encoder–decoder model designed for text-to-text generation.

Popular Models

  • T5 — text-to-text framework
  • BART — summarization & generation
  • mBART — multilingual translation
  • Pegasus — abstractive summarization

Core Idea

“Read the input text → understand it → generate new text.”

The model:

  1. Encodes the input text
  2. Decodes a new output sequence using attention

Use AutoModelForSeq2SeqLM if your task is:

  • Summarization
  • Translation
  • Paraphrasing
  • Question generation
  • Grammar correction

Use Case

### Input
A long news article...
### Output
A short summary of the article.

Example of Training Data

Input:  "Summarize: Transformers revolutionized NLP..."
Target: "Transformers changed NLP."

Fine-Tuning Pipeline

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

### example dataset
from datasets import Dataset

data = {
    "input_text": [
        "Transformers revolutionized NLP by enabling attention mechanisms..."
    ],
    "target_text": [
        "Transformers changed NLP using attention."
    ]
}

dataset = Dataset.from_dict(data)

### Tokenize the Data
def preprocess(example):
    model_inputs = tokenizer(
        example["input_text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            example["target_text"],
            truncation=True,
            padding="max_length",
            max_length=64
        )

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess)

### Fine-Tuning
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./seq2seq_model",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset
)

trainer.train()

# Save so the fine-tuned model and tokenizer can be reloaded for inference below
trainer.save_model("./seq2seq_model")
tokenizer.save_pretrained("./seq2seq_model")

### inference
import torch

tokenizer = AutoTokenizer.from_pretrained("./seq2seq_model")
model = AutoModelForSeq2SeqLM.from_pretrained("./seq2seq_model")
### Generate Text
text = "Transformers revolutionized NLP by enabling attention mechanisms."

inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=40
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

NLP AutoModels Comparison Table

| AutoModel Class                        | Input              | Output         | What It Does                    | Typical Use Cases                       |
| -------------------------------------- | ------------------ | -------------- | ------------------------------- | --------------------------------------- |
| **AutoModel** | Text | Embeddings | Returns hidden states only | Semantic search, similarity, clustering |
| **AutoModelForSequenceClassification** | Text | Label(s) | One label per sentence | Sentiment, spam, intent detection |
| **AutoModelForTokenClassification** | Text | Token labels | One label per word | NER, POS tagging, entity extraction |
| **AutoModelForQuestionAnswering** | Question + Context | Text span | Extracts answer from context | Document QA, search, legal analysis |
| **AutoModelForMaskedLM** | Text with `[MASK]` | Missing token | Predicts masked words | Pretraining, fill-in-the-blank |
| **AutoModelForCausalLM** | Prompt | Generated text | Continues text autoregressively | Chatbots, code gen, writing |
| **AutoModelForSeq2SeqLM** | Text | New text | Transforms text | Summarization, translation |

Real-World Use Case Mapping

| Real Problem                           | Correct AutoModel                  |
| -------------------------------------- | ---------------------------------- |
| “Is this review positive or negative?” | AutoModelForSequenceClassification |
| “Extract names & locations from text” | AutoModelForTokenClassification |
| “Answer questions from a PDF” | AutoModelForQuestionAnswering |
| “Build a chatbot” | AutoModelForCausalLM |
| “Summarize this article” | AutoModelForSeq2SeqLM |
| “Find similar documents” | AutoModel |

Other AutoModel Classes Worth Knowing

a. AutoModelForPreTraining: loads a model exactly as it was pretrained, with all pretraining heads included (a short sketch follows the list below).

Use AutoModelForPreTraining if:

  • You are continuing pretraining
  • You are doing research
  • You want access to all pretraining losses
  • You are training on domain-specific corpora (legal, medical)
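
A minimal sketch, assuming a BERT checkpoint: the loaded class exposes the original pretraining heads (masked-language-modeling plus next-sentence-prediction for BERT) rather than a task head.

from transformers import AutoTokenizer, AutoModelForPreTraining

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForPreTraining.from_pretrained("bert-base-uncased")
print(type(model).__name__)  # BertForPreTraining

inputs = tokenizer("Transformers are amazing", return_tensors="pt")
outputs = model(**inputs)
print(outputs.prediction_logits.shape)        # MLM head: (batch, tokens, vocab_size)
print(outputs.seq_relationship_logits.shape)  # NSP head: (batch, 2)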

Why Most Users Don’t Need It

For most applications:

  • Fine-tuning uses task-specific models
  • Pretraining heads are unnecessary
  • Training is expensive

b. AutoConfig:

AutoConfig loads only the model configuration, not weights.

What’s Inside a Config?

  • Number of layers
  • Hidden size
  • Attention heads
  • Dropout
  • Vocabulary size
  • Architecture type

It is used to inspect a model's architecture or to create models from scratch, which is useful for research and experimentation.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.hidden_size)  # 768 for bert-base-uncased

### Modify the Architecture
config.num_hidden_layers = 6
model = AutoModel.from_config(config)

You might have seen TFAutoModel and wondered what it is. It is simply the TensorFlow counterpart of the PyTorch AutoModel classes: the API is the same, but it works with TensorFlow tensors (“tf”) instead of PyTorch tensors (“pt”).

What Are TFAutoModel Classes?

What Does “TF” Mean?

TF = TensorFlow

Hugging Face supports both:

  • PyTorch (AutoModel)

from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-uncased")

  • TensorFlow / Keras (TFAutoModel)

from transformers import TFAutoModel
model = TFAutoModel.from_pretrained("bert-base-uncased")

So every PyTorch AutoModel has a TensorFlow equivalent:

| PyTorch                            | TensorFlow                           |
| ---------------------------------- | ------------------------------------ |
| AutoModel | TFAutoModel |
| AutoModelForSequenceClassification | TFAutoModelForSequenceClassification |
| AutoModelForTokenClassification | TFAutoModelForTokenClassification |
| AutoModelForQuestionAnswering | TFAutoModelForQuestionAnswering |
| AutoModelForCausalLM | TFAutoModelForCausalLM |
| AutoModelForSeq2SeqLM | TFAutoModelForSeq2SeqLM |

We will cover the vision and audio AutoModel classes, along with their fine-tuning workflows, in the next article.


