Variable Selection Using Relative Importance Rankings
arXiv:2509.10853v2 Announce Type: replace
Abstract: Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for variable (feature) ranking and filter-based selection before model creation. Specifically, we anticipate strong performance from RI measures because they incorporate both the direct and combined effects of predictors, addressing a key limitation of marginal correlation, which ignores dependencies among predictors. We implement and evaluate RI-based variable ranking and selection methods, including a newly proposed RI measure, CRI.Z, which offers improved computational efficiency over conventional RI measures.
Through extensive simulations, we first demonstrate that RI measures rank variables more accurately than marginal correlation, especially in the presence of suppressed or weak predictors. We then show that predictive models built on these rankings are highly competitive, often outperforming state-of-the-art linear-model methods such as the lasso and the relaxed lasso. The proposed RI-based methods are particularly effective in challenging cases involving clusters of highly correlated predictors, a setting known to cause failures in many benchmark methods. Their practical utility and efficiency are further demonstrated on two high-dimensional gene expression datasets. Although lasso methods have dominated the recent variable selection literature, our study reveals that RI-based methods are a powerful and competitive alternative. We believe these underutilized tools deserve greater attention in the statistics and machine learning communities. Code is available at: https://github.com/tien-endotchang/RI-variable-selection.
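To make the suppressor-variable point concrete, here is a minimal sketch contrasting ranking by marginal correlation with ranking by a simple relative-importance measure. The LMG/Shapley-style averaging of R-squared increments over predictor orderings used below is a standard RI measure chosen for illustration only; it is not the paper's CRI.Z, and the simulated predictors x1, x2, x3 are hypothetical.

```python
# Sketch: filter-based ranking via marginal correlation vs. an RI measure
# (LMG: average R^2 increment of a predictor over all orderings).
# Illustrative only -- not the paper's CRI.Z measure.
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the predictors indexed by cols."""
    if not cols:
        return 0.0
    Xs = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

def lmg_importance(X, y):
    """Average each predictor's R^2 increment over all entry orderings."""
    p = X.shape[1]
    imp = np.zeros(p)
    perms = list(itertools.permutations(range(p)))
    for perm in perms:
        used = []
        for j in perm:
            before = r_squared(X, y, used)
            used.append(j)
            imp[j] += r_squared(X, y, used) - before
    return imp / len(perms)

# Simulated data with a classic suppressor: x2 has near-zero marginal
# correlation with y but contributes strongly jointly with x1.
rng = np.random.default_rng(0)
n = 2000
x2 = rng.normal(size=n)                  # suppressor
x1 = x2 + rng.normal(size=n)             # shares variance with x2
x3 = rng.normal(size=n)                  # pure noise predictor
y = x1 - x2 + 0.1 * rng.normal(size=n)   # x2's effect is masked marginally
X = np.column_stack([x1, x2, x3])

marginal = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(3)])
ri = lmg_importance(X, y)
```

Marginal correlation leaves the suppressor x2 indistinguishable from the noise variable x3, whereas the RI measure assigns x2 substantial importance because it accounts for combined effects with x1.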