Geometric Patterns of Meaning: A PHATE Manifold Analysis of Multi-lingual Embeddings
arXiv:2601.09731v1 Announce Type: new
Abstract: We introduce a multi-level analysis framework for examining semantic geometry in multilingual embeddings, implemented through Semanscope (a visualization tool that applies PHATE manifold learning across four linguistic levels). Analysis of diverse datasets spanning sub-character components, alphabetic systems, semantic domains, and numerical concepts reveals systematic geometric patterns and critical limitations in current embedding models. At the sub-character level, purely structural elements (Chinese radicals) exhibit geometric collapse, highlighting model failures to distinguish semantic from structural components. At the character level, different writing systems show distinct geometric signatures. At the word level, content words form clustering-branching patterns across 20 semantic domains in English, Chinese, and German. Arabic numbers organize through spiral trajectories rather than clustering, violating standard distributional semantics assumptions. These findings establish PHATE manifold learning as an essential analytic tool not only for studying geometric structure of meaning in embedding space, but also for validating the effectiveness of embedding models in capturing semantic relationships.