A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

arXiv:2603.12304v1 Announce Type: new
Abstract: This paper introduces an optimization framework that integrates the Minimum Description Length (MDL) principle directly into the training dynamics of deep neural networks. Moving beyond its conventional role as a model selection criterion, we reformulate MDL as an active, adaptive driving force within the optimization process itself. The core of our method is a geometrically grounded cognitive manifold whose evolution is governed by a \textit{coupled Ricci flow}, enriched with a novel \textit{MDL Drive} term derived from first principles. This drive, modulated by the task-loss gradient, balances data fidelity against model simplification, actively compressing the internal representation during training. We establish a comprehensive theoretical foundation, proving key properties including the monotonic decrease of description length (Theorem~\ref{thm:convergence}), a finite number of topological phase transitions via a geometric surgery protocol (Theorems~\ref{thm:surgery} and \ref{thm:ultimate_fate}), and the emergence of universal critical behavior (Theorem~\ref{thm:universality}). Furthermore, we provide a practical, computationally efficient algorithm with $O(N \log N)$ per-iteration complexity (Theorem~\ref{thm:complexity}), alongside guarantees of numerical stability (Theorem~\ref{thm:stability}) and exponential convergence under convexity assumptions (Theorem~\ref{thm:convergence_rate}). Empirical validation on synthetic regression and classification tasks confirms the theoretical predictions, demonstrating robust generalization and autonomous model simplification. This work offers a principled path toward more autonomous, generalizable, and interpretable AI systems by unifying geometric deep learning with information-theoretic principles.
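The abstract does not state the paper's update rule, so the sketch below is only a minimal, hypothetical PyTorch reading of the core idea: a differentiable description-length proxy is added to the task loss, with its strength modulated by the task-loss gradient norm (here assumed to strengthen as the data are fit). The penalty form and the names `mdl_penalty` and `lam` are illustrative assumptions, not the authors' geometric algorithm.

```python
# Hypothetical sketch (not the paper's algorithm): MDL as an active drive
# during training. The penalty form and the gradient-norm modulation are
# assumptions based only on the abstract's description.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mdl_penalty(model: nn.Module, sigma: float = 0.1) -> torch.Tensor:
    # Crude two-part-code proxy: negative log-probability of the weights
    # under a heavy-tailed prior, so shrinking weights shortens the "code".
    return sum(torch.log1p((p / sigma) ** 2).sum() for p in model.parameters())


model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(256, 2)          # synthetic regression task
y = x[:, :1] ** 2 - x[:, 1:]

for step in range(2000):
    opt.zero_grad()
    task_loss = F.mse_loss(model(x), y)

    # Modulate the drive by the task-loss gradient: when the data are
    # nearly fit (small gradient), compression pressure increases.
    grads = torch.autograd.grad(task_loss, model.parameters(),
                                retain_graph=True)
    gnorm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    lam = (1e-3 / (1.0 + gnorm)).detach()

    (task_loss + lam * mdl_penalty(model)).backward()
    opt.step()
```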
