Data-driven configuration tuning of glmnet for balancing accuracy and computational efficiency
arXiv:2602.17922v2 Announce Type: replace-cross
Abstract: The glmnet package in R is widely used for lasso estimation because of its computational efficiency. Despite its popularity, glmnet occasionally yields solutions that deviate substantially from the true solutions because the algorithm's default configuration is inappropriate. Tuning the configuration can improve the accuracy of the obtained solutions, but such improvements typically increase computation time, creating a tradeoff between accuracy and computational efficiency. A systematic approach to choosing the configuration is therefore required. To address this need, we propose a unified data-driven framework that optimizes the configuration by balancing solution path accuracy against computational cost. First, we generate a large-scale training dataset by measuring the accuracy and computation time of glmnet. Using this dataset, we then construct neural networks that predict accuracy and computation time from data characteristics and the configuration. For a new dataset, the proposed framework uses the trained networks to explore the configuration space and derive a Pareto front that represents the tradeoff between accuracy and computational cost. This front enables automatic selection of the configuration that maximizes accuracy under a user-specified time constraint. The proposed method is implemented in the R package glmnetconf, available at https://github.com/Shuhei-Muroya/glmnetconf.git.
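
The final selection step described in the abstract (derive a Pareto front over predicted accuracy and time, then pick the most accurate configuration within a time budget) can be illustrated with a minimal sketch. This is not the glmnetconf implementation or its API: the `Candidate` type, the function names, the configuration labels, and all numeric values are hypothetical stand-ins for what the trained networks might predict, and the sketch is in Python rather than R purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One candidate configuration with (assumed) predicted metrics."""
    name: str       # hypothetical configuration label
    error: float    # predicted solution-path error (lower is more accurate)
    seconds: float  # predicted computation time in seconds


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates not dominated in (seconds, error).

    Sweep candidates in order of increasing time and keep one only if it
    is strictly more accurate than everything that is at least as fast.
    """
    front: List[Candidate] = []
    best_error = float("inf")
    for c in sorted(cands, key=lambda c: (c.seconds, c.error)):
        if c.error < best_error:
            front.append(c)
            best_error = c.error
    return front


def select_under_budget(front: List[Candidate],
                        budget_seconds: float) -> Optional[Candidate]:
    """Pick the most accurate front member within the time budget."""
    feasible = [c for c in front if c.seconds <= budget_seconds]
    if not feasible:
        return None  # no configuration is predicted to finish in time
    return min(feasible, key=lambda c: c.error)


# Made-up predictions for five hypothetical configurations.
CANDIDATES = [
    Candidate("config_a", error=1e-2, seconds=0.5),
    Candidate("config_b", error=1e-4, seconds=2.0),
    Candidate("config_c", error=5e-3, seconds=3.0),  # slower and worse than config_b
    Candidate("config_d", error=1e-6, seconds=8.0),
    Candidate("config_e", error=1e-2, seconds=0.7),  # slower than config_a, no gain
]

FRONT = pareto_front(CANDIDATES)
CHOICE = select_under_budget(FRONT, budget_seconds=5.0)
print([c.name for c in FRONT], CHOICE.name if CHOICE else None)
```

In this sketch accuracy is represented as an error to be minimized, so "maximizing accuracy under a time constraint" becomes choosing the lowest-error point on the front whose predicted time does not exceed the budget. Returning `None` when nothing fits the budget is a design choice of the sketch, not a behavior described in the source.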