Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation
arXiv:2602.15897v1 Announce Type: new Abstract: Training and fine-tuning large-scale language models benefit greatly from collaborative learning, but the approach has been proven vulnerable to gradient inversion attacks (GIAs), which allow adversaries to reconstruct private training data from shared gradients. Existing defenses mainly employ gradient perturbation techniques, e.g., noise injection or gradient pruning, to disrupt GIAs’ direct mapping from gradient space to token space. However, these methods often fall short due to the retention of semantic similarity across gradient, […]
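
For readers unfamiliar with the two perturbation defenses the abstract names, the following is a minimal sketch of what noise injection and gradient pruning might look like on a flattened gradient vector. It is not the paper's method or any library's API: the function names `add_gaussian_noise` and `prune_smallest`, the parameters `sigma` and `prune_ratio`, and the toy gradient values are all hypothetical, chosen only for illustration.

```python
import random


def add_gaussian_noise(grad, sigma, seed=None):
    """Noise injection: add zero-mean Gaussian noise to every gradient entry.

    `sigma` is an assumed noise scale; a seeded generator keeps the sketch
    reproducible.
    """
    rng = random.Random(seed)
    return [g + rng.gauss(0.0, sigma) for g in grad]


def prune_smallest(grad, prune_ratio):
    """Gradient pruning: zero out the fraction of entries with the smallest
    absolute value, keeping only the larger-magnitude ones."""
    k = int(len(grad) * prune_ratio)
    if k == 0:
        return list(grad)
    # Indices ordered by magnitude, smallest first.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else g for i, g in enumerate(grad)]


# Toy gradient standing in for what a client would share (made-up values).
grad = [0.50, -0.02, 0.10, -0.75, 0.01, 0.30]

noisy = add_gaussian_noise(grad, sigma=0.05, seed=0)
pruned = prune_smallest(grad, prune_ratio=0.5)

print(noisy)
print(pruned)
```

Both functions act only on gradient values and leave the larger-magnitude structure mostly recognizable, which is one way to picture the limitation of these defenses that the abstract points to.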