A Linear Approach to Data Poisoning

arXiv:2505.15175v3 Announce Type: replace
Abstract: Backdoor and data-poisoning attacks can flip predictions with tiny training corruptions, yet a sharp theory linking poisoning strength, overparameterization, and regularization is lacking. We analyze ridge least squares with an unpenalized intercept in the high-dimensional regime \(p, n \to \infty\), \(p/n \to c\). Targeted poisoning is modelled by shifting a \(\theta\)-fraction of one class by a direction \(\mathbf{v}\) and relabelling. Using resolvent techniques and deterministic equivalents from random matrix theory, we derive closed-form limits for the poisoned score that are explicit in the model parameters. The formulas yield scaling laws, recover the interpolation threshold as \(c \to 1\) in the ridgeless limit, and show that the weights align with the poisoning direction. Synthetic experiments match the theory across parameter sweeps, and MNIST backdoor tests show qualitatively consistent trends. The results provide a tractable framework for quantifying poisoning in linear models.
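As a rough illustration of the setup the abstract describes (not the authors' code), the following minimal NumPy sketch builds a two-class Gaussian problem, shifts a \(\theta\)-fraction of one class by a direction \(\mathbf{v}\) and relabels it, fits ridge least squares with an unpenalized intercept, and checks the predicted alignment of the weights with \(\mathbf{v}\). The dimensions, penalty, class mean, and poisoning direction are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions chosen so that p/n approximates the aspect ratio c (assumed values).
n, p = 2000, 1000            # c = p/n = 0.5
lam = 0.1                    # ridge penalty (hypothetical)
theta = 0.05                 # poisoned fraction of the +1 class
mu = np.zeros(p); mu[0] = 2.0   # class-mean direction (assumption)
v = np.zeros(p); v[1] = 3.0     # poisoning direction v (assumption)

# Two-class Gaussian data with labels y in {-1, +1}.
y = rng.choice([-1.0, 1.0], size=n)
X = rng.standard_normal((n, p)) + np.outer(y, mu)

# Targeted poisoning: shift a theta-fraction of the +1 class by v, then relabel.
plus = np.flatnonzero(y == 1.0)
poisoned = rng.choice(plus, size=int(theta * plus.size), replace=False)
X[poisoned] += v
y[poisoned] = -1.0

# Ridge least squares with an unpenalized intercept: center, solve, restore.
xbar, ybar = X.mean(axis=0), y.mean()
Xc, yc = X - xbar, y - ybar
w = np.linalg.solve(Xc.T @ Xc + lam * n * np.eye(p), Xc.T @ yc)
b = ybar - xbar @ w

# The theory predicts the fitted weights pick up a component along v.
align = (w @ v) / (np.linalg.norm(w) * np.linalg.norm(v))
print(f"cosine alignment of w with poisoning direction v: {align:.3f}")

# Poisoned score: response of the fitted model to a clean point vs. the
# same point shifted by the trigger direction v.
x_clean = mu + rng.standard_normal(p)
print(f"score on clean +1 point:         {x_clean @ w + b:+.3f}")
print(f"score on same point shifted by v: {(x_clean + v) @ w + b:+.3f}")
```

Sweeping lam, theta, or p/n in this sketch is the empirical analogue of the parameter sweeps the abstract says match the closed-form limits; the actual limits come from the paper's resolvent analysis, which this simulation does not reproduce.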
