Implicit Bias of the JKO Scheme

arXiv:2511.14827v3 Announce Type: replace
Abstract: Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces, for any step size $\eta>0$, a sequence of probability distributions $\rho_k^\eta$ that approximate Wasserstein gradient flow on $J$ to first order in $\eta$. But the JKO scheme also has many other remarkable properties not shared by other first-order integrators. For example, it preserves energy dissipation and is unconditionally stable for $\lambda$-geodesically convex functionals $J$. To better understand the JKO scheme, we characterize its implicit bias at second order in $\eta$. We show that the $\rho_k^\eta$ are approximated to order $\eta^2$ by Wasserstein gradient flow on a modified energy
\[ J^{\eta}(\rho) = J(\rho) - \frac{\eta}{4}\int_M \Bigl\Vert \nabla_g \frac{\delta J}{\delta \rho}(\rho) \Bigr\Vert_{2}^{2} \,\rho(dx), \]
obtained by subtracting from $J$ the squared metric curvature of $J$ times $\eta/4$. At second order in $\eta$, the JKO scheme therefore adds a deceleration in directions where the metric curvature of $J$ changes rapidly. This corresponds to canonical implicit biases for common functionals: for entropy the implicit bias is the Fisher information, for KL divergence it is the Fisher-Hyvärinen divergence, and for Riemannian gradient descent it is the kinetic energy in the metric $g$. To understand the differences between minimizing $J$ and $J^\eta$, we study JKO-Flow, i.e. Wasserstein gradient flow on $J^\eta$, in several simple numerical examples. These include exactly solvable Langevin dynamics on the Bures-Wasserstein space and Langevin sampling from a quartic potential in 1D.
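The $\eta/4$ coefficient can be sanity-checked in the simplest setting, under an assumption that is ours rather than the abstract's: a single Dirac mass $\delta_x$ in $\mathbb{R}$ with potential energy $J(\rho)=\int f\,d\rho$. Then $W_2(\delta_x,\delta_y)=|x-y|$, so one JKO step reduces to the proximal point (implicit Euler) step on $f$, and $J^\eta(\delta_x)=f(x)-\tfrac{\eta}{4}|f'(x)|^2$. The sketch below compares one such step against exact flow on $f$ and on this modified energy. The quartic $f(x)=x^4/4$ is a hypothetical choice for illustration (deterministic, not the Langevin experiment), and all function names are our own.

```python
import math


# Hypothetical test potential, chosen for illustration: f(x) = x**4 / 4.
def grad_f(x):
    return x ** 3


def hess_f(x):
    return 3.0 * x ** 2


def prox_step(x, eta, iters=50):
    """One JKO step for a Dirac mass in R (assumed reduction):
    argmin_y f(y) + (y - x)**2 / (2 * eta).
    Solved with Newton's method on the optimality condition y - x + eta * f'(y) = 0."""
    y = x
    for _ in range(iters):
        g = y - x + eta * grad_f(y)
        dg = 1.0 + eta * hess_f(y)
        y -= g / dg
    return y


def velocity(x, eta_mod):
    """Negative gradient of the modified energy f - (eta_mod / 4) * f'**2,
    i.e. -f' + (eta_mod / 2) * f'' * f'. With eta_mod = 0 this is plain flow on f."""
    return -grad_f(x) + 0.5 * eta_mod * hess_f(x) * grad_f(x)


def flow(x, eta_mod, t, n=200):
    """Integrate x' = velocity(x, eta_mod) up to time t with classical RK4."""
    h = t / n
    for _ in range(n):
        k1 = velocity(x, eta_mod)
        k2 = velocity(x + 0.5 * h * k1, eta_mod)
        k3 = velocity(x + 0.5 * h * k2, eta_mod)
        k4 = velocity(x + h * k3, eta_mod)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def one_step_errors(x0, eta):
    """Local error of one JKO step vs. flow (for time eta) on f and on the modified energy."""
    x1 = prox_step(x0, eta)
    err_plain = abs(x1 - flow(x0, 0.0, eta))
    err_mod = abs(x1 - flow(x0, eta, eta))
    return err_plain, err_mod


if __name__ == "__main__":
    x0 = 1.0
    for eta in (0.004, 0.002, 0.001):
        e_plain, e_mod = one_step_errors(x0, eta)
        print(f"eta={eta:<6} |JKO - flow(f)| = {e_plain:.3e}   "
              f"|JKO - flow(f_eta)| = {e_mod:.3e}")
```

Halving $\eta$ should shrink the first error by roughly $4$ (local error $O(\eta^2)$, i.e. first order) and the second by roughly $8$ (local error $O(\eta^3)$, i.e. second order), consistent with the modified energy capturing the second-order behavior in this toy case.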
