A Comprehensive Implementation Guide to ModelScope for Model Search, Inference, Fine-Tuning, Evaluation, and Export
In this tutorial, we explore ModelScope through a practical, end-to-end workflow that runs smoothly on Colab. We begin by setting up the environment, verifying dependencies, and confirming GPU availability so we can work with the framework reliably from the start. From there, we interact with the ModelScope Hub to search for models, download snapshots, load datasets, and understand how its ecosystem connects with familiar tools such as Hugging Face Transformers. As we move forward, we apply pretrained pipelines across NLP and computer vision tasks, then fine-tune a sentiment classifier on IMDB, evaluate its performance, and export it for deployment. Through this process, we build not only a working implementation but also a clear understanding of how ModelScope can support research, experimentation, and production-oriented AI workflows.
!pip install -q addict simplejson yapf gast oss2 sortedcontainers requests
!pip install -q modelscope "transformers>=4.37.0" datasets torch torchvision \
    accelerate scikit-learn sentencepiece Pillow matplotlib evaluate "optimum[exporters]"
import torch, os, sys, json, warnings, numpy as np
warnings.filterwarnings("ignore")
import addict; print("addict OK")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
import modelscope
print(f"ModelScope: {modelscope.__version__}")
DEVICE = 0 if torch.cuda.is_available() else -1
from modelscope import snapshot_download
from modelscope.hub.api import HubApi
api = HubApi()
print("\nSearching ModelScope Hub for 'bert' models...\n")
try:
    models = api.list_models(filter_dict={"Search": "bert"}, sort="StarCount")
    for i, m in enumerate(models):
        if i >= 5:
            break
        print(f"  • {m.get('Name', m.get('id', 'N/A'))}")
except Exception as e:
    print(f"  (Hub search may be unavailable outside China: {e})")
model_dir = snapshot_download(
    "AI-ModelScope/bert-base-uncased",
    cache_dir="./ms_cache",
)
print(f"\nModel downloaded to: {model_dir}")
print("  Files:", os.listdir(model_dir)[:8])
from modelscope.msdatasets import MsDataset
print("\nLoading 'imdb' dataset...\n")
try:
    ds = MsDataset.load("imdb", split="train")
    print(f"  Dataset size: {len(ds)} samples")
    sample = next(iter(ds))
    print(f"  Keys: {list(sample.keys())}")
    print(f"  Text preview: {sample['text'][:120]}...")
    print(f"  Label: {sample['label']} (0=neg, 1=pos)")
except Exception as e:
    print(f"  Falling back to HuggingFace datasets: {e}")
    from datasets import load_dataset
    ds = load_dataset("imdb", split="train")
    print(f"  Dataset size: {len(ds)} samples")
labels = [row["label"] for row in ds]
print("\nLabel distribution:")
for label in sorted(set(labels)):
    count = labels.count(label)
    print(f"  Label {label}: {count} ({count/len(labels)*100:.1f}%)")
We set up the complete Colab environment and install all the libraries required for the tutorial. We verify important dependencies such as addict, check the PyTorch and CUDA setup, and confirm that ModelScope is installed correctly before moving forward. We then begin working with the ModelScope ecosystem by searching the hub for BERT models, downloading a model snapshot locally, loading the IMDB dataset, and examining its label distribution to understand the data we will use later.
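As an aside, the label-distribution loop above calls list.count once per label, which rescans the whole list each time; collections.Counter does the same tally in a single pass. A minimal sketch, using a small hypothetical stand-in list instead of the real IMDB labels:

```python
from collections import Counter

# Hypothetical stand-in for the labels list built from the dataset above.
labels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]

counts = Counter(labels)          # one pass over the list
total = sum(counts.values())
distribution = {label: counts[label] / total for label in sorted(counts)}
print(distribution)  # {0: 0.4, 1: 0.6}
```

For two classes the difference is negligible, but Counter scales to many labels without quadratic cost.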
from transformers import pipeline as hf_pipeline
print("\nNLP PIPELINES\n")
print("── 4a. Sentiment Analysis ──")
sentiment = hf_pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=DEVICE,
)
test_texts = [
    "ModelScope makes AI model access incredibly easy and intuitive!",
    "The documentation was confusing and the API kept returning errors.",
    "The weather today is partly cloudy with a slight breeze.",
]
for text in test_texts:
    result = sentiment(text)[0]
    marker = "[+]" if result["label"] == "POSITIVE" else "[-]"
    print(f'  {marker} {result["label"]} ({result["score"]:.4f}): "{text[:60]}..."')
print("\n── 4b. Named Entity Recognition ──")
ner = hf_pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
    device=DEVICE,
)
ner_text = "Alibaba's ModelScope platform was developed in Hangzhou, China and competes with Hugging Face."
entities = ner(ner_text)
for ent in entities:
    print(f'  {ent["word"]} -> {ent["entity_group"]} (score: {ent["score"]:.3f})')
print("\n── 4c. Zero-Shot Classification ──")
zsc = hf_pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=DEVICE,
)
zsc_result = zsc(
    "ModelScope provides pretrained models for NLP, CV, and audio tasks.",
    candidate_labels=["technology", "sports", "politics", "science"],
)
for label, score in zip(zsc_result["labels"], zsc_result["scores"]):
    bar = "█" * int(score * 30)
    print(f"  {label:<12} {score:.3f} {bar}")
print("\n── 4d. Text Generation (GPT-2) ──")
generator = hf_pipeline(
    "text-generation",
    model="gpt2",
    device=DEVICE,
)
gen_output = generator(
    "The future of open-source AI is",
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    num_return_sequences=1,
)
print(f"  {gen_output[0]['generated_text']}")
print("\n── 4e. Fill-Mask (BERT) ──")
fill_mask = hf_pipeline(
    "fill-mask",
    model=model_dir,
    device=DEVICE,
)
mask_results = fill_mask("ModelScope is an open-source [MASK] for AI models.")
for r in mask_results[:5]:
    print(f"  [MASK] -> '{r['token_str']}' (score: {r['score']:.4f})")
We focus on natural language processing pipelines and explore how easily we can run multiple tasks with pretrained models. We perform sentiment analysis, named entity recognition, zero-shot classification, text generation, and fill-mask prediction, providing a broad view of ModelScope-compatible inference workflows. As we test these tasks on sample inputs, we see how quickly we can move from raw text to meaningful model outputs in a unified pipeline.
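Under the hood, the zero-shot pipeline runs one NLI forward pass per candidate label and then softmaxes the per-label entailment logits to produce the scores we printed above. The final step can be sketched in plain numpy; the logits here are made-up stand-ins for real model outputs:

```python
import numpy as np

# Hypothetical entailment logits, one per candidate label
# (in practice, one NLI pass per (text, "This text is about {label}.") pair).
entailment_logits = np.array([3.2, -1.1, -0.8, 0.4])
labels = ["technology", "sports", "politics", "science"]

# Numerically stable softmax over labels, mirroring the pipeline's scores.
scores = np.exp(entailment_logits - entailment_logits.max())
scores /= scores.sum()
for label, s in sorted(zip(labels, scores), key=lambda t: -t[1]):
    print(f"{label:<12} {s:.3f}")
```

This is why the printed scores always sum to one across the candidate labels (in the pipeline's default single-label mode).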
print("\nCOMPUTER VISION PIPELINES\n")
print("── 5a. Image Classification (ViT) ──")
img_classifier = hf_pipeline(
    "image-classification",
    model="google/vit-base-patch16-224",
    device=DEVICE,
)
img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
img_results = img_classifier(img_url)
for r in img_results[:5]:
    print(f"  {r['label']:<30} ({r['score']:.4f})")
print("\n── 5b. Object Detection (DETR) ──")
detector = hf_pipeline(
    "object-detection",
    model="facebook/detr-resnet-50",
    device=DEVICE,
)
detections = detector(img_url)
for d in detections[:5]:
    box = d["box"]
    print(f"  {d['label']:<15} score={d['score']:.3f} box=({box['xmin']:.0f},{box['ymin']:.0f},{box['xmax']:.0f},{box['ymax']:.0f})")
print("\n── 5c. Visualising Detections ──")
from PIL import Image, ImageDraw
import requests, matplotlib.pyplot as plt
from io import BytesIO
img = Image.open(BytesIO(requests.get(img_url).content))
draw = ImageDraw.Draw(img)
colors = ["#58a6ff", "#3fb950", "#d2a8ff", "#f78166", "#ff7b72"]
for i, d in enumerate(detections[:5]):
    box = d["box"]
    color = colors[i % len(colors)]
    draw.rectangle([box["xmin"], box["ymin"], box["xmax"], box["ymax"]], outline=color, width=3)
    draw.text((box["xmin"] + 4, box["ymin"] + 2), f"{d['label']} {d['score']:.2f}", fill=color)
plt.figure(figsize=(10, 7))
plt.imshow(img)
plt.axis("off")
plt.title("DETR Object Detection")
plt.tight_layout()
plt.savefig("detection_result.png", dpi=150, bbox_inches="tight")
plt.show()
print("Saved detection_result.png")
print("\nHUGGINGFACE INTEROP\n")
from transformers import AutoTokenizer, AutoModelForSequenceClassification
print("── Approach A: snapshot_download (works for models on ModelScope Hub) ──")
print(f" We already downloaded bert-base-uncased in Section 2: {model_dir}")
print("\n── Approach B: Direct HF loading (works globally for any HF model) ──")
hf_model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(hf_model_name)
model = AutoModelForSequenceClassification.from_pretrained(hf_model_name)
model.eval()
print(f"Loaded '{hf_model_name}' directly from HuggingFace")
print("\n── Manual inference without pipeline ──")
texts = [
    "This open-source framework is a game changer for researchers!",
    "I encountered multiple bugs during installation.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
id2label = model.config.id2label
for text, prob in zip(texts, probs):
    pred_id = prob.argmax().item()
    print(f"  {id2label[pred_id]} ({prob[pred_id]:.4f}): '{text[:55]}...'")
print("\n── Loading Section 2's ModelScope-downloaded BERT with Transformers ──")
ms_tokenizer = AutoTokenizer.from_pretrained(model_dir)
ms_model = AutoModelForSequenceClassification.from_pretrained(
    model_dir, num_labels=2, ignore_mismatched_sizes=True
)
print("bert-base-uncased from ModelScope loaded into Transformers AutoModel")
print(f"  Vocab size: {ms_tokenizer.vocab_size}, Hidden: {ms_model.config.hidden_size}")
del ms_model
We shift from text to computer vision and run image classification and object detection on a sample image. We also visualize the detection results by drawing bounding boxes and labels, which helps us inspect the modelβs predictions more intuitively and practically. After that, we explore Hugging Face interoperability by loading models and tokenizers directly, performing manual inference, and demonstrating that a model downloaded from ModelScope can also be used seamlessly with Transformers.
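Since the DETR pipeline returns boxes as {xmin, ymin, xmax, ymax} dictionaries, comparing detections (e.g. to merge near-duplicates) comes down to intersection-over-union. A small helper, not part of the tutorial's code, shown here with made-up boxes:

```python
# Intersection-over-union for two boxes in the pipeline's dict format.
def iou(a, b):
    ix = max(0, min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"]))
    iy = max(0, min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"]))
    inter = ix * iy                                   # overlap area
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    return inter / (area_a + area_b - inter)          # overlap / union

box1 = {"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10}
box2 = {"xmin": 5, "ymin": 5, "xmax": 15, "ymax": 15}
print(iou(box1, box2))  # 25 / 175 ≈ 0.1429
```

A threshold around 0.5 is a common cutoff for treating two boxes as the same object.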
print("\nFINE-TUNING (DistilBERT on IMDB subset)\n")
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding,
)
import evaluate
print(" Loading IMDB subset...")
full_train = load_dataset("imdb", split="train").shuffle(seed=42)
full_test = load_dataset("imdb", split="test").shuffle(seed=42)
train_ds = full_train.select(range(1000))
eval_ds = full_test.select(range(500))
print(f" Train: {len(train_ds)}, Eval: {len(eval_ds)}")
ckpt = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
def tokenize_fn(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)
train_ds = train_ds.map(tokenize_fn, batched=True)
eval_ds = eval_ds.map(tokenize_fn, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    ckpt,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    acc = accuracy_metric.compute(predictions=preds, references=labels)
    f1 = f1_metric.compute(predictions=preds, references=labels, average="weighted")
    return {**acc, **f1}
training_args = TrainingArguments(
    output_dir="./ms_finetuned_model",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    logging_steps=50,
    report_to="none",
    fp16=torch.cuda.is_available(),
    dataloader_num_workers=2,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
print("\nStarting training...\n")
train_result = trainer.train()
print("\nTraining complete!")
print(f"  Train loss: {train_result.training_loss:.4f}")
print(f"  Train time: {train_result.metrics['train_runtime']:.1f}s")
We move into fine-tuning by preparing a smaller IMDB subset so that training remains practical inside Google Colab. We tokenize the text, load a pretrained DistilBERT classification model, define evaluation metrics, and configure the training process with suitable arguments for a lightweight but realistic demonstration. We then launch training and observe how a pretrained checkpoint is adapted into a task-specific sentiment classifier through the Trainer workflow.
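To make the compute_metrics contract concrete: the Trainer passes it the raw eval logits together with the integer labels, and accuracy is simply the fraction of argmax predictions that match. A tiny standalone sketch, with hypothetical logits and labels in place of real Trainer output:

```python
import numpy as np

# Hypothetical stand-ins for what the Trainer hands to compute_metrics:
# raw logits for four eval examples and their true labels.
logits = np.array([[2.0, 0.1], [0.3, 1.9], [1.5, 1.4], [0.2, 2.2]])
labels = np.array([0, 1, 1, 1])

preds = np.argmax(logits, axis=-1)            # [0, 1, 0, 1]
accuracy = float((preds == labels).mean())    # 3 of 4 correct
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.75
```

The evaluate library wraps exactly this kind of computation (plus F1 and friends) behind a uniform compute() interface.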
print("\nMODEL EVALUATION\n")
eval_results = trainer.evaluate()
print(" Evaluation Results:")
for key, value in eval_results.items():
    if isinstance(value, float):
        print(f"  {key:<25}: {value:.4f}")
from sklearn.metrics import classification_report, confusion_matrix
preds_output = trainer.predict(eval_ds)
preds = np.argmax(preds_output.predictions, axis=-1)
labels = preds_output.label_ids
print("\nClassification Report:")
print(classification_report(labels, preds, target_names=["NEGATIVE", "POSITIVE"]))
cm = confusion_matrix(labels, preds)
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(cm, cmap="Blues")
ax.set_xticks([0, 1]); ax.set_yticks([0, 1])
ax.set_xticklabels(["NEGATIVE", "POSITIVE"])
ax.set_yticklabels(["NEGATIVE", "POSITIVE"])
ax.set_xlabel("Predicted"); ax.set_ylabel("Actual")
ax.set_title("Confusion Matrix β Fine-Tuned DistilBERT")
for i in range(2):
    for j in range(2):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center",
                color="white" if cm[i, j] > cm.max()/2 else "black", fontsize=18)
plt.colorbar(im)
plt.tight_layout()
plt.savefig("confusion_matrix.png", dpi=150)
plt.show()
print("Saved confusion_matrix.png")
print("\n── Testing Fine-Tuned Model on New Inputs ──")
ft_pipeline = hf_pipeline(
    "sentiment-analysis",
    model=trainer.model,
    tokenizer=tokenizer,
    device=DEVICE,
)
new_reviews = [
    "An absolutely breathtaking masterpiece with brilliant performances!",
    "Waste of two hours. Terrible script and wooden acting.",
    "Decent popcorn movie but nothing special. Had some fun moments.",
]
for review in new_reviews:
    res = ft_pipeline(review)[0]
    marker = "[+]" if res["label"] == "POSITIVE" else "[-]"
    print(f'  {marker} {res["label"]} ({res["score"]:.4f}): "{review}"')
print("\nEXPORTING THE FINE-TUNED MODEL\n")
save_path = "./ms_finetuned_model/final"
trainer.save_model(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model saved to: {save_path}")
print(f"  Files: {os.listdir(save_path)}")
print("\n── ONNX Export ──")
try:
    from optimum.exporters.onnx import main_export
    onnx_path = "./ms_finetuned_model/onnx"
    main_export(save_path, output=onnx_path, task="text-classification")
    print(f"ONNX model exported to: {onnx_path}")
    print(f"  Files: {os.listdir(onnx_path)}")
except Exception as e:
    print(f"ONNX export skipped: {e}")
print("""
── Upload to ModelScope Hub (manual step) ──
1. Get a token from https://modelscope.cn/my/myaccesstoken
2. Run:
   from modelscope.hub.api import HubApi
   api = HubApi()
   api.login('YOUR_TOKEN')
   api.push_model(
       model_id='your-username/my-finetuned-distilbert',
       model_dir='./ms_finetuned_model/final',
   )
""")
print("""
════════════════════════════════════════════════════════════════════
                        TUTORIAL COMPLETE!
────────────────────────────────────────────────────────────────────
  ✔ ModelScope Hub      → search, browse & download models
  ✔ MsDataset           → load datasets from the ModelScope ecosystem
  ✔ NLP pipelines       → sentiment, NER, zero-shot, generation, mask
  ✔ CV pipelines        → image classification, object detection, viz
  ✔ HuggingFace interop → snapshot_download + Transformers
  ✔ Fine-tuning         → DistilBERT on IMDB with Trainer API
  ✔ Evaluation          → accuracy, F1, confusion matrix
  ✔ Export              → local save, ONNX, Hub upload
════════════════════════════════════════════════════════════════════
""")
We evaluate the fine-tuned model in detail and inspect its performance using standard metrics, a classification report, and a confusion matrix. We also test the trained model on fresh review examples to see how it behaves on unseen inputs in a realistic inference setting. Finally, we save the model locally, export it to ONNX when possible, and review how to upload the final checkpoint to the ModelScope Hub for sharing and deployment.
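The confusion matrix that sklearn computes for us can also be built by hand, which makes its convention (rows = actual class, columns = predicted class) explicit. A toy sketch with hypothetical predictions and labels in place of the trainer.predict output:

```python
import numpy as np

# Hypothetical true labels and predictions for six eval examples.
labels = np.array([0, 0, 1, 1, 1, 0])
preds  = np.array([0, 1, 1, 1, 0, 0])

# cm[actual, predicted]: each example increments one cell.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(labels, preds):
    cm[t, p] += 1
print(cm)
# [[2 1]
#  [1 2]]
```

The diagonal holds the correct predictions, so accuracy is the trace divided by the total count, matching the report printed above for the real model.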
In conclusion, we built a complete, hands-on pipeline that demonstrates how ModelScope fits into a real machine learning workflow rather than serving solely as a model repository. We searched and downloaded models, loaded datasets, ran inference across NLP and vision tasks, connected ModelScope assets with Transformers, fine-tuned a text classifier, evaluated it with meaningful metrics, and exported it for later use. By going through each stage of the code, we saw how the framework supports both experimentation and practical deployment, while also providing flexibility through interoperability with the broader Hugging Face ecosystem. In the end, we came away with a reusable Colab-ready workflow and a much stronger understanding of how to use ModelScope as a serious toolkit for building, testing, and sharing AI systems.
The post A Comprehensive Implementation Guide to ModelScope for Model Search, Inference, Fine-Tuning, Evaluation, and Export appeared first on MarkTechPost.