[P] Open-sourcing a human parsing model trained on curated data to address ATR/LIP/iMaterialist quality issues

digitado ⋅ January 12, 2026

We’re releasing FASHN Human Parser, a SegFormer-B4 model fine-tuned for human parsing in fashion contexts.

Background: Dataset quality issues

Before training our own model, we spent time analyzing the commonly used datasets for human parsing: ATR, LIP, and iMaterialist. We found consistent quality issues that affect models trained on them:

ATR:

Annotation “holes” where background pixels appear inside labeled regions
Label spillage where annotations extend beyond object boundaries

LIP:

The same annotation holes and label spillage as ATR (both datasets come from the same research group)
Inconsistent labeling between left/right body parts and clothing
Aggressive crops from multi-person images, which cause artifacts
Ethical concerns (a significant portion of the images include minors)

iMaterialist:

Higher quality images and annotations overall
Multi-person images where only one person is labeled (~6% of dataset)
No body part labels (clothing only)

We documented these findings in detail in "Fashion Segmentation Datasets and Their Common Problems".
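To make the "holes" issue concrete, here is a minimal sketch of how one might flag candidate holes in a label mask. It is not the analysis used for the findings above; the function name `find_holes` and the toy mask are hypothetical. It treats a background pixel as a hole when it cannot reach the image border through other background pixels.

```python
from collections import deque


def find_holes(mask, background=0):
    """Return the set of (row, col) background pixels enclosed by labels.

    mask: list of equal-length rows of integer class ids.
    A background pixel counts as a hole if a 4-connected flood fill
    started from the image border never reaches it.
    """
    height, width = len(mask), len(mask[0])
    reachable = set()
    queue = deque()

    # Seed the flood fill with every background pixel on the border.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and mask[y][x] == background:
                reachable.add((y, x))
                queue.append((y, x))

    # Spread through connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (
                0 <= ny < height
                and 0 <= nx < width
                and (ny, nx) not in reachable
                and mask[ny][nx] == background
            ):
                reachable.add((ny, nx))
                queue.append((ny, nx))

    # Anything still unreached is enclosed by labeled pixels.
    return {
        (y, x)
        for y in range(height)
        for x in range(width)
        if mask[y][x] == background and (y, x) not in reachable
    }


# Toy 5x5 mask: class 3 forms a ring around one background pixel.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 3, 3, 3, 0],
    [0, 3, 0, 3, 0],
    [0, 3, 3, 3, 0],
    [0, 0, 0, 0, 0],
]
holes = find_holes(toy_mask)
```

A check like this only produces candidates for review: genuinely enclosed background (for example, the gap between an arm resting on a hip and the torso) is flagged as well, so a human still has to confirm each case.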

What we did

We curated our own dataset to address these issues and fine-tuned a SegFormer-B4 on it. The model outputs 18 semantic classes relevant for fashion applications:

Body parts: face, hair, arms, hands, legs, feet, torso
Clothing: top, dress, skirt, pants, belt, scarf
Accessories: bag, hat, glasses, jewelry
Background
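As an illustration of how the 18-class schema might be used downstream, the sketch below groups the class names listed above and builds a binary mask for one group. The integer ids are an assumption made for this example only (background first, then the groups in the order listed); the real id-to-label mapping should be read from the released model's configuration. The helper names are hypothetical.

```python
# Class names as listed in the post, grouped by category.
CLASS_GROUPS = {
    "background": ["background"],
    "body": ["face", "hair", "arms", "hands", "legs", "feet", "torso"],
    "clothing": ["top", "dress", "skirt", "pants", "belt", "scarf"],
    "accessories": ["bag", "hat", "glasses", "jewelry"],
}

# ASSUMPTION: ids are assigned in listing order for illustration only.
# The released model may use a different order.
ID2LABEL = {}
LABEL2GROUP = {}
for group, names in CLASS_GROUPS.items():
    for name in names:
        ID2LABEL[len(ID2LABEL)] = name
        LABEL2GROUP[name] = group


def ids_for_group(group):
    """Return the set of class ids that belong to a group."""
    return {i for i, name in ID2LABEL.items() if LABEL2GROUP[name] == group}


def binary_mask(mask, group):
    """Convert a mask of class ids into a 0/1 mask for one group."""
    ids = ids_for_group(group)
    return [[1 if value in ids else 0 for value in row] for row in mask]


# Example: keep only clothing pixels from a tiny 2x2 prediction.
prediction = [[0, 8], [1, 14]]
clothing_only = binary_mask(prediction, "clothing")
```

Grouping by name rather than by hard-coded id keeps this kind of post-processing valid if the id order differs from the one assumed here.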

Technical details

Spec	Value
Architecture	SegFormer-B4 (MIT-B4 encoder + MLP decoder)
Input size	384 x 576 px
Output	Segmentation mask at input resolution
Model size	~244 MB
Inference	~300 ms on GPU, 2-3 s on CPU
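As a rough sanity check on the model size in the table, the arithmetic below assumes about 64.1 million parameters (the figure reported for SegFormer-B4 in the original SegFormer paper, with a larger class count) stored as 32-bit floats. Neither the parameter count nor the precision of the released checkpoint is stated in this post, so treat both as assumptions.

```python
# ASSUMPTION: parameter count taken from the SegFormer paper, not this release.
PARAMS = 64_100_000
# ASSUMPTION: weights stored as float32 (4 bytes each).
BYTES_PER_PARAM = 4

size_bytes = PARAMS * BYTES_PER_PARAM
size_mb = size_bytes / 1e6       # decimal megabytes
size_mib = size_bytes / 2**20    # binary mebibytes

print(f"{size_mb:.1f} MB, {size_mib:.1f} MiB")
```

Under these assumptions the weights come to roughly 256 MB in decimal units, or about 244.5 in binary units, which is in line with the ~244 MB figure if that figure is measured in MiB.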

The PyPI package uses cv2.INTER_AREA for preprocessing (matching training), while the HuggingFace pipeline uses PIL LANCZOS for broader compatibility.

Limitations

Optimized for fashion/e-commerce images (single person, relatively clean backgrounds)
Performance may degrade on crowded scenes or unusual poses
18-class schema is fashion-focused; may not suit all human parsing use cases

Happy to discuss the dataset curation process, architecture choices, or answer any questions.

submitted by /u/JYP_Scouter