Spectral-Temporal MoE: An RL-Driven Dual-Domain Transformer for Efficient Multi-Horizon Electrical Load Forecasting
Accurate long-term electrical load forecasting is required for stable smart grid operation, yet remains difficult due to multi-scale periodic patterns and non-stationary temporal shifts across different prediction horizons. This work presents MoE-Transformer, a reinforcement learning-driven dual-domain framework that integrates frequency-domain processing with sparse expert networks for adaptive forecasting. An Extended Discrete Fourier Transform (Extended DFT) is introduced to address spectral misalignment by aligning the input spectrum with the frequency grid of the full prediction window. The model employs parallel Mixture-of-Experts (MoE) modules in the time and frequency domains (T-MoE and F-MoE), where domain-specific experts capture complementary temporal and spectral structures. Expert selection is formulated as a dual Markov Decision Process and optimized through a reinforcement learning routing mechanism that balances prediction accuracy, routing stability, and expert utilization diversity. Experiments on five benchmark datasets, including ETTh1, Electricity, and Traffic, across four forecasting horizons show that MoE-Transformer consistently outperforms state-of-the-art baselines, with Mean Squared Error (MSE) reductions of 50.9–56.9%. Sparse expert activation lowers memory usage by 40% and reduces inference latency by 60%, supporting deployment in real-time forecasting settings. Ablation studies attribute performance gains of 5.8%, 4.6%, and up to 47.2% to Extended DFT, dual-domain modeling, and reinforcement-driven routing, respectively.
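The abstract does not define the Extended DFT or the routing reward precisely, so the sketch below illustrates one plausible reading of each under stated assumptions; it is not the authors' implementation. We assume the Extended DFT evaluates the spectrum of a length-L lookback window on the frequency grid of the full window of length L + H (lookback plus horizon H), so bin k corresponds to frequency k / (L + H). Under this assumption it coincides with a zero-padded real FFT. The function `routing_reward` and its weights `lam_stab` and `lam_div` are hypothetical: they show one way a scalar reward could trade off accuracy (negative MSE), stability (penalizing expert switches between steps), and diversity (entropy of expert usage).

```python
import numpy as np


def extended_dft(x: np.ndarray, horizon: int) -> np.ndarray:
    """Assumed reading of "Extended DFT": spectrum of a length-L input
    evaluated on the frequency grid of a window of length L + horizon.

    x has shape (..., L); the result has shape (..., (L + horizon) // 2 + 1).
    """
    L = x.shape[-1]
    N = L + horizon                      # length of the full window
    n = np.arange(L)                     # observed time indices
    k = np.arange(N // 2 + 1)            # non-negative frequency bins on the N-point grid
    # Explicit DFT sum over the L observed samples, using the N-point grid.
    basis = np.exp(-2j * np.pi * np.outer(k, n) / N)
    return x @ basis.T


def routing_reward(y_true, y_pred, prev_experts, experts, usage_counts,
                   lam_stab=0.1, lam_div=0.1):
    """Hypothetical routing reward: accuracy, stability, and diversity terms."""
    mse = np.mean((y_true - y_pred) ** 2)             # accuracy term
    switch_rate = np.mean(prev_experts != experts)    # fraction of changed routes
    p = usage_counts / usage_counts.sum()             # empirical expert usage
    p = p[p > 0]
    entropy = -np.sum(p * np.log(p))                  # higher when usage is spread out
    return -mse - lam_stab * switch_rate + lam_div * entropy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(96)                       # lookback L = 96
    X = extended_dft(x, horizon=24)                   # grid of N = 120 points
    print(X.shape)                                    # (61,)
```

Under this reading, `extended_dft(x, H)` equals `np.fft.rfft(x, n=L + H)`, which zero-pads the input to the full window length. The explicit sum is kept only to make the shared frequency grid visible; in practice the FFT form is preferable.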