A Purely Distributional Embedding Algorithm
This paper introduces the Distributional Embedding Algorithm (DEA), a purely deterministic framework for generating word embeddings through \emph{Iterative Structural Saliency Extraction}, which is based on a natural Galois correspondence. Unlike stochastic “black-box” machine learning models, DEA grounds semantic representation in the topological structure of a corpus, mapping the redistribution of semantic mass across identifiable structural nuclei. We apply this model to a controlled dataset of 300 propositions from David Bohm’s \textit{Wholeness and the Implicate Order}, identifying four primary semantic basins that account for 74\% of the text’s logical flow. By tracking the iterative expansion of these clusters, we demonstrate a “topological collapse” in which shared lexical pivots connect distant propositions. Validation via cosine distance measurements confirms high structural orthogonality between core conceptual terms and extrinsic category noise (e.g., \textit{intelligence} vs. \textit{desk}, $d = 0.99$). We conclude that DEA offers a computationally efficient, transparent, and structurally aware alternative that can be integrated with existing neural architectures to enhance interpretability in semantic modeling. Moreover, DEA rests on the \textbf{Logarithmic Hypothesis}, which relates the dimension of the embedding vectors to the number of propositions in the corpus. While modern AI architectures require thousands of embedding components to process $10^{13}$ propositions, the DEA approach suggests a structural collapse of complexity, in which the global semantic manifold can be distilled into $L \approx \log_{10}(10^{13}) = 13$ features. Even at a hyper-refined resolution of $L \approx 30$, the model offers a deterministic, “white-box” alternative to current neural networks, providing a thousand-fold increase in computational efficiency without sacrificing logical precision.
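As a minimal illustration (not part of the paper's implementation), the following Python sketch shows the two quantities the abstract refers to: the Logarithmic Hypothesis estimate of the embedding dimension, $L \approx \lceil \log_{10} N \rceil$ for a corpus of $N$ propositions, and the cosine distance used for the orthogonality check. The function names and the embedding vectors are invented for illustration only.

```python
import numpy as np

def embedding_dimension(num_propositions: int) -> int:
    """Logarithmic Hypothesis (as stated in the abstract): the embedding
    dimension L scales as log10 of the number of propositions."""
    return int(np.ceil(np.log10(num_propositions)))

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine distance d = 1 - cos(u, v); values near 1 indicate
    near-orthogonal (structurally unrelated) terms."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# L is roughly 13 for a corpus of 10^13 propositions, and 3 for the
# 300-proposition case study.
print(embedding_dimension(10**13))  # -> 13
print(embedding_dimension(300))     # -> 3

# Hypothetical sparse vectors over invented structural nuclei, used only to
# show how a distance near 0.99 signals structural orthogonality.
v_intelligence = np.array([0.9, 0.1, 0.0, 0.0])
v_desk         = np.array([0.0, 0.05, 0.1, 0.95])
print(round(cosine_distance(v_intelligence, v_desk), 2))  # -> 0.99 (illustrative)
```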