From Grid Search to Modern Deterministic Ideas: A Brief Review of Hyper-Parameter Optimisation in the Deep and Foundation Model Era
As machine learning and deep learning systems grow in scale and are deployed in sensitive, high-stakes environments, the need for reliable and fully reproducible model tuning has never been greater. This paper presents a clear and structured overview of Deterministic Hyper-parameter Optimisation (DHPO), covering four key families of methods: Direct Search, Surrogate-based, Hyper-gradient, and Multi-Fidelity approaches. We frame DHPO within a reproducible bilevel optimisation setting and discuss how each family performs in terms of efficiency, scalability, and practical applicability. Our analysis shows that Hyper-gradient and Multi-Fidelity techniques generally provide the best balance of speed and scalability for deep learning, while Surrogate-based methods are strong options when compute resources are limited. Direct Search remains appealing for its simplicity and guaranteed repeatability, but faces challenges in high-dimensional or expensive training scenarios. Using visual comparisons—including heatmaps, radar profiles, and cost–performance plots—we show that no single DHPO approach is universally superior, and that method selection should depend on task constraints and reproducibility needs. Reported results in the literature indicate that hyper-gradient and multi-fidelity DHPO methods can reduce training costs by 40–70% in deep-learning settings while achieving near-baseline performance, with accuracy deviations of only 1–3%. We conclude by outlining key gaps and future research opportunities, including scalable DHPO for foundation models, hybrid deterministic–stochastic designs, differentiable architecture and data optimisation, and the path toward fully deterministic AutoML.
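The bilevel optimisation setting mentioned above can be sketched in standard notation (the symbols here are generic, not tied to any particular DHPO method): hyper-parameters $\lambda$ are chosen to minimise a validation loss evaluated at model weights that themselves minimise the training loss.

```latex
% Outer (upper-level) problem: choose hyper-parameters \lambda
\min_{\lambda \in \Lambda} \; \mathcal{L}_{\mathrm{val}}\!\left(\theta^{*}(\lambda)\right)
% Inner (lower-level) problem: fit weights \theta given \lambda
\quad \text{s.t.} \quad
\theta^{*}(\lambda) \in \arg\min_{\theta} \; \mathcal{L}_{\mathrm{train}}(\theta, \lambda)
```

Under this framing, hyper-gradient methods differentiate $\mathcal{L}_{\mathrm{val}}(\theta^{*}(\lambda))$ with respect to $\lambda$ through the inner optimisation, while direct search, surrogate-based, and multi-fidelity methods treat the outer objective as a (deterministically evaluated) black box.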