Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations
arXiv:2604.14246v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) models have achieved remarkable scalability, yet they remain vulnerable to hallucinations, particularly when processing long-tail knowledge. We identify that this fragility stems from static Top-$k$ routing: routers tend to favor high-frequency patterns over rare factual associations. Consequently, “specialist experts” possessing critical long-tail knowledge are often assigned low gating scores and remain “dormant” — under-prioritized for specific tokens despite their proven causal importance on other inputs. To address this, we propose […]
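To make the static Top-$k$ routing described in the abstract concrete, here is a minimal sketch of how a generic MoE router picks experts for one token and how the remaining experts end up "dormant". It is not the paper's proposed method, which the truncated abstract does not show. The function names, the example logits, and the choice to apply softmax over all experts before renormalizing over the selected ones are assumptions made for illustration; real MoE implementations differ on these details.

```python
import math

def softmax(logits):
    # Numerically stable softmax over the router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (selected expert indices, renormalized gate weights).

    Assumption: gating scores are softmax probabilities over all experts,
    and the weights of the k selected experts are renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected = ranked[:k]
    norm = sum(probs[i] for i in selected)
    gates = {i: probs[i] / norm for i in selected}
    return selected, gates

# Hypothetical router logits for a single token over 4 experts.
logits = [2.0, 1.5, 0.3, -1.0]
selected, gates = top_k_route(logits, k=2)

# Experts outside the Top-k receive no weight for this token, whatever
# knowledge they hold; these are the "dormant" experts in the abstract's terms.
dormant = [i for i in range(len(logits)) if i not in selected]
print("selected:", selected, "dormant:", dormant)
```

In this toy setting, an expert with a low gating score contributes nothing to the token's output regardless of how useful it might be, which is the failure mode the abstract attributes to static Top-$k$ routing.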