[P] Micro Diffusion — Discrete text diffusion in ~150 lines of pure Python

Inspired by Karpathy’s MicroGPT, I wanted to build the equivalent for text diffusion — a minimal implementation that shows the core algorithm without the complexity.

Autoregressive models generate left to right. Diffusion starts from all-masked noise and refines every position in parallel, unmasking tokens over several steps:

_ _ _ _ _ → _ o r _ a → n o r i a
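The unmasking process above can be sketched in a few lines of NumPy. This is an illustrative toy, not code from the repo: the `toy_denoiser`, the mask token id, and the linear unmasking schedule are all my assumptions, and a real model would predict tokens from learned logits instead of guessing.

```python
import numpy as np

MASK = 0  # hypothetical mask token id; letters are 1..26 (assumption)

def toy_denoiser(tokens, rng):
    # Stand-in for a trained model: proposes a random letter id for
    # every position. A real denoiser would return learned predictions.
    return rng.integers(1, 27, size=tokens.shape)

def sample(length=5, steps=3, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)          # start from all-mask noise
    for step in range(steps):
        preds = toy_denoiser(tokens, rng)
        # Linear schedule: after step k, roughly length*(k+1)/steps
        # positions should be unmasked.
        target = int(np.ceil(length * (step + 1) / steps))
        masked = np.flatnonzero(tokens == MASK)
        n_new = min(target - (length - len(masked)), len(masked))
        chosen = rng.choice(masked, size=n_new, replace=False)
        tokens[chosen] = preds[chosen]      # reveal a few positions
    return tokens
```

After `steps` iterations every position has been unmasked, which is exactly the `_ _ _ _ _ → … → n o r i a` progression shown above.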

Three implementations included:

– train_minimal.py (143 lines, pure NumPy) — bare minimum

– train_pure.py (292 lines, pure NumPy) — with comments and visualization

– train.py (413 lines, PyTorch) — bidirectional Transformer denoiser

All three share the same diffusion loop; only the denoiser differs, because it's a pluggable component.
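To make the pluggable-denoiser point concrete, here is one possible shape for a single step of such a shared loop (my sketch, not the repo's code): the denoiser is just a function from tokens to logits, and the step only writes predictions into still-masked positions. The `uniform_denoiser`, the 27-token vocabulary, and the mask id are assumptions for illustration.

```python
import numpy as np

def uniform_denoiser(tokens):
    # Trivial denoiser: uniform logits over a hypothetical 27-token
    # vocabulary (mask + 26 letters). A NumPy MLP or a bidirectional
    # Transformer would slot in here; the loop doesn't care which.
    return np.zeros(tokens.shape + (27,))

def denoise_step(tokens, denoiser, mask_id=0, seed=0):
    # One step of a shared diffusion loop: run the denoiser, sample a
    # token per position, keep already-revealed tokens unchanged.
    rng = np.random.default_rng(seed)
    logits = denoiser(tokens)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    preds = np.array([rng.choice(27, p=p) for p in probs])
    return np.where(tokens == mask_id, preds, tokens)
```

Swapping `uniform_denoiser` for a trained model changes sample quality, not the loop, which is presumably why the three scripts can share it.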

Trains on 32K SSA names, runs on CPU in a few minutes. No GPU needed.

GitHub: https://github.com/Siwoo4985/Micro-Diffusion

(English isn't my first language, so I wrote this post with the help of AI.)

submitted by /u/Impossible-Pay-4885
