Distributional Machine Unlearning via Selective Data Removal
arXiv:2507.15112v4 Announce Type: replace-cross Abstract: Machine learning systems increasingly face requirements to remove entire domains of information, such as toxic language or biases, rather than individual user data. This task presents a dilemma: full removal of the unwanted domain data is computationally expensive, while random partial removal is statistically inefficient. We find that a domain’s statistical influence is often concentrated in a small subset of its data samples, suggesting a path between ineffective partial removal and unnecessary complete removal. We […]