Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers
arXiv:2601.15014v1 Announce Type: new Abstract: We study in-context learning for nonparametric regression with $\alpha$-Hölder smooth regression functions, for some $\alpha>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $\Theta(\log n)$ parameters and $\Omega\bigl(n^{2\alpha/(2\alpha+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal rate of convergence $O\bigl(n^{-2\alpha/(2\alpha+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers […]
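As a quick numerical illustration of the stated bounds (not taken from the paper), the helper below evaluates the minimax-optimal error rate $n^{-2\alpha/(2\alpha+d)}$ and the corresponding pretraining-sequence requirement $n^{2\alpha/(2\alpha+d)}\log^3 n$ for given smoothness $\alpha$, dimension $d$, and number of in-context examples $n$; the function names are hypothetical.

```python
import math

def minimax_rate(n: int, alpha: float, d: int) -> float:
    """Minimax-optimal MSE rate n^{-2*alpha/(2*alpha+d)} for
    alpha-Holder regression with d-dimensional covariates."""
    return n ** (-2.0 * alpha / (2.0 * alpha + d))

def pretraining_sequences(n: int, alpha: float, d: int) -> float:
    """Order of pretraining sequences n^{2*alpha/(2*alpha+d)} * log^3 n
    sufficient per the abstract (constants suppressed)."""
    return n ** (2.0 * alpha / (2.0 * alpha + d)) * math.log(n) ** 3

# Example: smoother functions (larger alpha) give a faster rate,
# while higher dimension d slows it (curse of dimensionality).
print(minimax_rate(1000, alpha=2.0, d=1))   # faster decay
print(minimax_rate(1000, alpha=2.0, d=10))  # slower decay
```

Note how the exponent $2\alpha/(2\alpha+d)$ approaches 1 as $\alpha \to \infty$ and 0 as $d \to \infty$, matching the classical nonparametric picture.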