[R] China just released first SOTA multimodal model trained entirely on domestic chips

Zhipu AI and Huawei just dropped GLM-Image, and the technical details are interesting.

It's the first multimodal model trained completely on Chinese chips (Huawei Ascend 910), from data preprocessing through full-scale training. It uses a hybrid architecture that combines an autoregressive component with a diffusion decoder.

What stands out is the Chinese text rendering. It consistently ranks first among open-source models at rendering complex text in images, especially Chinese characters, which most models struggle with.

It natively supports resolutions from 1024 to 2048 pixels at any aspect ratio, without additional training. API pricing is 0.1 yuan per image (roughly $0.014).
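For a rough sense of what that pricing means at scale, here's a quick back-of-the-envelope sketch. The exchange rate is an assumption derived from the two figures quoted above (0.1 yuan and $0.014), not a live rate, and `batch_cost` is just an illustrative helper, not part of any official API.

```python
# Per-image price as quoted in the post.
PRICE_CNY_PER_IMAGE = 0.1

# Assumed exchange rate, back-derived from the post's own numbers
# (0.1 yuan ~ $0.014 implies about 7.14 yuan per dollar). Not a live rate.
CNY_PER_USD = 0.1 / 0.014


def batch_cost(num_images: int) -> tuple[float, float]:
    """Return (cost in yuan, cost in US dollars) for a batch of images."""
    cost_cny = num_images * PRICE_CNY_PER_IMAGE
    return cost_cny, cost_cny / CNY_PER_USD


cny, usd = batch_cost(10_000)
print(f"10,000 images: {cny:.0f} yuan, about ${usd:.0f}")
```

So 10,000 generations would come to roughly 1,000 yuan, or about $140 at the implied rate.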

A single model handles both text-to-image and image-to-image generation. GitHub and Hugging Face repos are already up.

This is significant because it shows you can train frontier models without relying on Nvidia hardware. They’re also claiming 60% better compute efficiency than Nvidia's H200, measured in tokens per joule.
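One thing worth spelling out about a tokens-per-joule claim: 60% more tokens per joule is not the same as 60% less energy. Here's a minimal sketch of the arithmetic, assuming "60% better" means 1.6x the baseline's tokens per joule (that reading is my assumption, since no methodology is given here). `energy_ratio` is a hypothetical helper for illustration only.

```python
# Assumption: "60% better" means 1.6x the baseline's tokens per joule.
IMPROVEMENT = 0.60


def energy_ratio(improvement: float) -> float:
    """Energy needed for the same number of tokens, relative to the baseline.

    Energy = tokens / tokens_per_joule, so a (1 + improvement)x gain in
    tokens per joule divides the energy by (1 + improvement).
    """
    return 1.0 / (1.0 + improvement)


ratio = energy_ratio(IMPROVEMENT)
print(f"Energy for the same workload: {ratio:.1%} of baseline "
      f"({1 - ratio:.1%} saved)")
```

Under that reading, the same workload would use 62.5% of the baseline energy, a 37.5% saving rather than 60%.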

Whether those benchmarks hold up in practice remains to be seen, but the fact that they pulled this off on domestic hardware is noteworthy.

submitted by /u/Different_Case_6484