Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs

arXiv:2602.20567v1 Announce Type: cross
Abstract: Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $\delta$ and the spectral gap $(1-\lambda)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak–Łojasiewicz condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\frac{\gamma}{\delta(1-\lambda)}+\gamma\right)$ under suitable step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PL objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $\kappa\!\left(1+\frac{1}{\delta(1-\lambda)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
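For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal Python sketch of one Stochastic Gradient Push (SGP) iteration over a directed graph: each node takes a local stochastic gradient step on a Push-Sum numerator, mixes both the numerator and a Push-Sum weight with a column-stochastic matrix, and uses their ratio as its de-biased estimate. All names (`X`, `w`, `A`, `grad`, `eta`) are illustrative assumptions, not the paper's notation or code.

```python
import numpy as np

def sgp_step(X, w, A, grad, eta):
    """One illustrative SGP iteration (sketch, not the paper's implementation).

    X    : (n, d) array, row i is node i's Push-Sum numerator
    w    : (n,)   array, Push-Sum weights (initialized to ones)
    A    : (n, n) column-stochastic mixing matrix (A.sum(axis=0) == 1),
           with A[i, j] > 0 only if node j sends to node i (directed edge)
    grad : callable, grad(z_i, i) returns a stochastic gradient at node i
    eta  : step size
    """
    n = X.shape[0]
    # De-biased local estimates: the Push-Sum ratio corrects the bias
    # introduced by column-stochastic (asymmetric) averaging.
    Z = X / w[:, None]
    # Local stochastic gradient step applied to the numerator.
    X = X - eta * np.stack([grad(Z[i], i) for i in range(n)])
    # Push-Sum mixing: numerator and weight are mixed with the same
    # column-stochastic matrix, so X / w drives the nodes toward consensus
    # despite the directed communication.
    X = A @ X
    w = A @ w
    return X, w
```

In this sketch, the imbalance parameter $\delta$ discussed in the abstract would correspond to how far the stationary distribution of `A` is from uniform, and $1-\lambda$ to how quickly repeated multiplication by `A` mixes; both are assumptions about how the bound maps onto this toy setup rather than statements from the paper.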
