The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples
Machine unlearning offers a practical alternative to full model re-training by approximately removing the influence of specific user data. While existing methods certify unlearning via statistical indistinguishability from re-trained models, these guarantees do not naturally extend to model outputs when inputs are adversarially perturbed. In particular, slight perturbations of forget samples may still be correctly recognized by the unlearned model – even when a re-trained model fails to do so – revealing a novel privacy risk: information […]