Bayesian Joint Additive Factor Models for Multiview Learning

arXiv:2406.00778v4 Announce Type: replace
Abstract: It is increasingly common to collect data of multiple different types on the same set of samples. Our focus is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. To address these challenges, we introduce two complementary factor regression models. A baseline Joint Factor Regression (JFR) captures combined variation across views via a single factor set, and a more nuanced Joint Additive FActor Regression (JAFAR) decomposes variation into shared and view-specific components. For JFR, we use independent cumulative shrinkage process (I-CUSP) priors, while for JAFAR we develop a dependent version (D-CUSP) designed to ensure identifiability of the components. We develop Gibbs samplers that exploit the model structure and accommodate flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.
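The additive decomposition described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' implementation and not the API of the jafar R package: the function name, the Gaussian loadings and factors, and the choice to let the response depend only on the shared factors are all hypothetical choices made for illustration. Each view is generated as a shared-factor term plus a view-specific-factor term plus noise, with a per-view noise scale to mimic differing signal-to-noise ratios.

```python
import numpy as np

def simulate_additive_factor_views(n, dims, k_shared, k_specific, noise_sd, seed=0):
    """Simulate multiview data with shared and view-specific latent factors.

    Hypothetical illustration only: all distributional choices are assumptions.
    n          : number of samples
    dims       : list of feature counts, one per view
    k_shared   : number of factors shared by all views
    k_specific : list of view-specific factor counts, one per view
    noise_sd   : list of noise standard deviations, one per view
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n, k_shared))  # shared factors, common to all views
    views = []
    for p_m, k_m, sd in zip(dims, k_specific, noise_sd):
        shared_loadings = rng.standard_normal((p_m, k_shared))
        specific_loadings = rng.standard_normal((p_m, k_m))
        phi_m = rng.standard_normal((n, k_m))  # factors private to this view
        noise = sd * rng.standard_normal((n, p_m))
        # Additive structure: shared part + view-specific part + noise
        views.append(eta @ shared_loadings.T + phi_m @ specific_loadings.T + noise)
    # Assumption of this sketch: the response loads on the shared factors only
    theta = rng.standard_normal(k_shared)
    y = eta @ theta + rng.standard_normal(n)
    return views, y, eta

views, y, eta = simulate_additive_factor_views(
    n=50, dims=[10, 20, 5], k_shared=3, k_specific=[2, 1, 4], noise_sd=[0.5, 1.0, 2.0]
)
```

Dropping the view-specific term and keeping a single factor set for all views gives a data-generating process in the spirit of the simpler joint factor model.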
