Morphological Addressing of Identity Basins in Text-to-Image Diffusion Models

arXiv:2602.18533v1 Announce Type: new
Abstract: We demonstrate that morphological pressure creates navigable gradients at multiple levels of the text-to-image generative pipeline. In Study 1, identity basins in Stable Diffusion 1.5 can be navigated using morphological descriptors (constituent features like “platinum blonde,” “beauty mark,” and “1950s glamour”) without the target’s name or photographs. A self-distillation loop (generating synthetic images from descriptor prompts, then training a LoRA on those outputs) achieves consistent convergence toward a specific identity as measured by ArcFace similarity. The trained LoRA creates a local coordinate system shaping not only the target identity but also its inverse: maximal away-conditioning produces “eldritch” structural breakdown in base SD1.5, while the LoRA-equipped model produces “uncanny valley” outputs that are coherent but precisely wrong. In Study 2, we extend this to prompt-level morphology. Drawing on phonestheme theory, we generate 200 novel nonsense words from English sound-symbolic clusters (e.g., cr-, sn-, -oid, -ax) and find that phonestheme-bearing candidates produce significantly more visually coherent outputs than random controls (mean Purity@1 = 0.371 vs. 0.209, p < 0.00001, Cohen’s d = 0.55). Three candidates (snudgeoid, crashax, and broomix) achieve perfect visual consistency (Purity@1 = 1.0) with zero training data contamination, each generating a distinct, coherent visual identity from phonesthetic structure alone. Together, these studies establish that morphological structure, whether in feature descriptors or prompt-level phonological form, creates systematic navigational gradients through diffusion model latent spaces. We document phase transitions in identity basins, CFG-invariant identity stability, and novel visual concepts emerging from sub-lexical sound patterns.
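
For readers unfamiliar with the effect size reported above, the following is a minimal sketch of how Cohen's d is commonly computed from two groups of scores, using the pooled sample standard deviation. The abstract does not state which variant of the statistic the authors used, so the pooled form is an assumption, and the function name `cohens_d` and the toy inputs are hypothetical illustrations rather than the paper's code or data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) with a pooled sample SD.

    Assumption: the pooled-variance form is used here; the abstract does
    not say which variant of the effect size was computed.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Pool the two unbiased sample variances, weighted by degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    # Difference of means expressed in units of the pooled standard deviation.
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Toy numbers chosen for easy arithmetic, not data from the paper.
toy_effect = cohens_d([2, 4, 6], [1, 3, 5])
```

With these toy inputs the group means are 4 and 3 and both sample variances are 4, so the pooled standard deviation is 2 and `toy_effect` is 0.5. In the setting described above, the two groups would be per-word Purity@1 scores for phonestheme-bearing candidates and for random controls.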
