Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation

arXiv:2603.02224v1 Announce Type: new
Abstract: Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient approach for adapting large pre-trained models, yet its behavior under continual learning remains poorly understood. We present a geometric theory characterizing catastrophic forgetting in LoRA through the lens of gradient subspace interactions. Our central finding is that forgetting is governed by a simple geometric law: $\mathcal{F} = \alpha(1 - \cos^2\theta_{\min}) + \beta$, where $\theta_{\min}$ is the minimum principal angle between task gradient subspaces. This formulation reveals an approximate rank-invariance property: at high subspace angles, forgetting becomes largely independent of the adapter rank (coefficient of variation $\approx 0.8\%$ in controlled synthetic settings; CV $\approx 10$--$19\%$ on real benchmarks, suggesting the invariance is regime-dependent rather than absolute). We validate our theory on synthetic tasks ($r = 0.994$ correlation), Split-CIFAR100 with ViT-LoRA, and sequential GLUE with RoBERTa-LoRA. Our analysis reconciles seemingly contradictory findings in the literature: rank affects forgetting only when task subspaces are similar (low angle), while orthogonality-enforcing methods such as O-LoRA provide minimal benefit when natural orthogonality is already high. These insights provide principled guidance for continual learning with parameter-efficient fine-tuning.
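The two quantities in the law are straightforward to compute. Below is a minimal NumPy sketch, not the paper's code: it computes the minimum principal angle $\theta_{\min}$ between two gradient subspaces (via the singular values of the product of their orthonormal bases, whose largest singular value is $\cos\theta_{\min}$) and evaluates the predicted forgetting $\mathcal{F}$. The coefficients `alpha` and `beta` are fitted constants in the paper; the values and matrix shapes here are illustrative assumptions.

```python
import numpy as np

def min_principal_angle(A, B):
    """Minimum principal angle between the column spaces of A and B.

    The singular values of Qa^T Qb (Qa, Qb orthonormal bases) are the
    cosines of the principal angles; the largest singular value
    corresponds to the smallest angle.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s.max(), -1.0, 1.0))

def predicted_forgetting(theta_min, alpha, beta):
    """The paper's geometric law: F = alpha * (1 - cos^2(theta_min)) + beta."""
    return alpha * (1.0 - np.cos(theta_min) ** 2) + beta

# Toy example: random 8-dimensional gradient subspaces in a 64-dim space.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 8))  # basis for task 1's gradient subspace
B = rng.standard_normal((64, 8))  # basis for task 2's gradient subspace
theta = min_principal_angle(A, B)
# alpha=1.0, beta=0.0 are placeholder coefficients, not fitted values.
print(theta, predicted_forgetting(theta, alpha=1.0, beta=0.0))
```

Note the limiting cases that mirror the abstract's claim: identical subspaces give $\theta_{\min} = 0$ and $\mathcal{F} = \beta$ (maximal interference beyond the floor is $\alpha$ away), while orthogonal subspaces give $\theta_{\min} = \pi/2$ and $\mathcal{F} = \alpha + \beta$.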
