Gaussian Process Bandit Optimization with Machine Learning Predictions and Application to Hypothesis Generation
arXiv:2601.22315v1
Abstract: Many real-world optimization problems involve an expensive ground-truth oracle (e.g., human evaluation, physical experiments) and a cheap, low-fidelity prediction oracle (e.g., machine learning models, simulations). Meanwhile, abundant offline data (e.g., past experiments and predictions) are often available and can be used both to pretrain powerful predictive models and to provide an informative prior. We propose Prediction-Augmented Gaussian Process Upper Confidence Bound (PA-GP-UCB), a novel Bayesian optimization algorithm that leverages both oracles and the offline data to achieve provable gains in sample efficiency with respect to ground-truth oracle queries. PA-GP-UCB employs a control-variates estimator, derived from a joint Gaussian process posterior, to correct prediction bias and reduce posterior uncertainty. We prove that PA-GP-UCB preserves the standard regret rate of GP-UCB while achieving a strictly smaller leading constant that is explicitly controlled by prediction quality and offline data coverage. Empirically, PA-GP-UCB converges faster than vanilla GP-UCB and naive prediction-augmented GP-UCB baselines on synthetic benchmarks and on a real-world hypothesis evaluation task grounded in human behavioral data, where predictions are provided by large language models. These results establish PA-GP-UCB as a general and sample-efficient framework for hypothesis generation under expensive feedback.
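To make the two-oracle setup concrete, here is a minimal sketch of a prediction-augmented GP-UCB loop. It is not the paper's PA-GP-UCB algorithm: instead of the control-variates estimator over a joint Gaussian process posterior described in the abstract, it uses the simpler (and related) idea of treating the cheap prediction oracle g as a prior mean and fitting a GP to the residual f - g, so ground-truth queries correct the predictor's bias. The test functions f and g, the RBF kernel settings, and the constant beta are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-||a - b||^2 / (2 l^2))."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xstar, noise=1e-4):
    """Standard zero-mean GP regression posterior (mean, std) at test points Xstar."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf_kernel(Xstar, Xstar)) - np.sum(v**2, 0), 1e-12, None)
    return mu, np.sqrt(var)

# Hypothetical stand-ins for the two oracles (not from the paper):
f = lambda x: np.sin(6 * x[:, 0]) + 0.5 * x[:, 0]   # expensive ground-truth oracle
g = lambda x: np.sin(6 * x[:, 0]) + 0.15            # cheap, systematically biased predictor

rng = np.random.default_rng(0)
Xcand = np.linspace(0, 1, 200)[:, None]             # fixed candidate pool
X = rng.uniform(0, 1, (3, 1))                       # initial ground-truth queries
y = f(X)

beta = 2.0  # UCB exploration weight
for t in range(15):
    # Fit a GP to the residual f - g: the cheap predictions act as a biased
    # prior mean, and the GP learns the correction from ground-truth data.
    r = y - g(X)
    mu_r, sd_r = gp_posterior(X, r, Xcand)
    ucb = g(Xcand) + mu_r + beta * sd_r             # prediction-augmented UCB score
    xn = Xcand[np.argmax(ucb)][None, :]             # query the ground truth here
    X = np.vstack([X, xn])
    y = np.append(y, f(xn))

print("best observed ground-truth value:", y.max())
```

Because g captures most of f's shape, the residual GP starts with near-zero uncertainty about the bulk of the landscape and only needs a handful of expensive queries to correct the constant bias; this mirrors, in simplified form, the abstract's claim that prediction quality and data coverage shrink the leading constant of the regret bound.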