In-Context Learning for Pure Exploration in Continuous Spaces

In active sequential testing, also termed pure exploration, a learner is tasked with the goal to adaptively acquire information so as to identify an unknown ground-truth hypothesis with as few queries as possible. This problem, originally studied by Chernoff in 1959, has several applications: classical formulations include Best-Arm Identification (BAI) in bandits, where actions index hypotheses, and generalized search problems, where strategically chosen queries reveal partial information about a hidden label. In many modern settings, however, the hypothesis space is continuous and naturally coincides with the query/action space: for example, identifying an optimal action in a continuous-armed bandit, localizing an $ε$-ball contained in a target region, or estimating the minimizer of an unknown function from a sequence of observations. In this work, we study pure exploration in such continuous spaces and introduce Continuous In-Context Pure Exploration for this regime. We introduce C-ICPE-TS, an algorithm that meta-trains deep neural policies to map observation histories to (i) the next continuous query action and (ii) a predicted hypothesis, thereby learning transferable sequential testing strategies directly from data. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models. We validate C-ICPE-TS across a range of benchmarks, spanning continuous best-arm identification, region localization, and function minimizer identification.

Liked Liked