Expert-Aided Causal Discovery of Ancestral Graphs
arXiv:2309.12032v4 Announce Type: replace-cross
Abstract: Causal discovery (CD) is an important component of many scientific applications, yet most techniques produce unreliable point estimates that often contradict expert knowledge. To mitigate this, recent research has focused on ex-ante incorporation of background knowledge into the CD process, typically under an unrealistic causal sufficiency assumption. When probing experts is costly (e.g., hidden behind expensive LLM APIs), however, ex-post model refinement that maximizes query utility is preferable. Also, when independent experts provide conflicting but better-than-random feedback, a principled aggregation method is required. In this context, we introduce the first CD algorithm that enables (i) distributional inference over ancestral graphs (AGs), which represent causal systems under latent confounding, and (ii) integration of both ex-ante and uncertain ex-post expert knowledge. Briefly, our method is a diversity-seeking reinforcement learning algorithm, termed Ancestral GFlowNet (AGFN), whose policy we iteratively refine based on a Bayesian model of the noisy expert feedback. Importantly, we prove convergence to the true AG given sufficiently accurate responses. Through validation on synthetic and realistic datasets using simulated humans and LLMs, we show AGFN is competitive with or superior to strong baselines in terms of structural Hamming distance and Bayesian Information Criterion.