Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
arXiv:2509.19189v4
Abstract: Scaling laws have emerged as a unifying lens for understanding and guiding the training of large language models (LLMs). However, existing studies predominantly focus on the final-step loss, leaving open whether the entire loss dynamics obey similar laws and, crucially, how the learning rate schedule (LRS) shapes them. We address these gaps in a controlled theoretical setting by analyzing stochastic gradient descent (SGD) on a power-law kernel regression model. The key insight is […]
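
To make the setting named in the abstract concrete (SGD on a power-law kernel regression model, run under a learning rate schedule), here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's construction: the spectrum exponent `beta`, the target coefficients `1/j`, the noise level, the dimension, and the two schedules (`constant_lr`, `cosine_lr`) are all hypothetical choices made only for this example. It records the excess risk at every step, so the whole loss trajectory can be inspected rather than only the final-step loss.

```python
import numpy as np


def constant_lr(t, total, peak=0.1):
    """Constant schedule: the same step size at every step (assumed peak)."""
    return peak


def cosine_lr(t, total, peak=0.1):
    """Cosine decay from `peak` at t=0 toward 0 at t=total (assumed shape)."""
    return 0.5 * peak * (1.0 + np.cos(np.pi * t / total))


def run_sgd(schedule, dim=200, steps=2000, beta=1.5, noise_std=0.1, seed=0):
    """One-pass SGD on linear regression with a power-law feature spectrum.

    All default values are illustrative assumptions, not taken from the paper.
    Returns the excess risk after each of the `steps` updates.
    """
    rng = np.random.default_rng(seed)
    j = np.arange(1, dim + 1, dtype=float)
    lam = j ** (-beta)          # assumed power-law eigenvalues: lambda_j = j^(-beta)
    theta_star = 1.0 / j        # assumed target coefficients
    theta = np.zeros(dim)
    risks = np.empty(steps)
    for t in range(steps):
        # Fresh sample each step: Gaussian features with covariance diag(lam).
        x = np.sqrt(lam) * rng.standard_normal(dim)
        y = x @ theta_star + noise_std * rng.standard_normal()
        # Stochastic gradient of the squared loss 0.5 * (x @ theta - y)^2.
        grad = (x @ theta - y) * x
        theta -= schedule(t, steps) * grad
        # Excess risk: 0.5 * sum_j lam_j * (theta_j - theta_star_j)^2.
        risks[t] = 0.5 * np.sum(lam * (theta - theta_star) ** 2)
    return risks


risks_const = run_sgd(constant_lr)
risks_cos = run_sgd(cosine_lr)
print("constant schedule, mean risk over last 100 steps:", risks_const[-100:].mean())
print("cosine schedule,   mean risk over last 100 steps:", risks_cos[-100:].mean())
```

Both runs share the same seed, so they see the same data stream and differ only in the schedule. Plotting `risks_const` and `risks_cos` against the step index on log axes is one way to compare how the schedule shapes the full loss curve in this toy setup.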