Leakage-Free One-Year-Ahead Prediction of Corporate Tax Avoidance Proxy Measures in Korea
Since 2011, the mandatory adoption of Korean International Financial Reporting Standards (K-IFRS) by listed Korean firms has improved the consistency of financial reporting and enhanced comparability across firms and over time. This institutional change has made it more feasible to construct long-horizon firm–year panel datasets and to apply quantitative predictive analyses. The KoTaP dataset provides standardized firm–year panel data for Korean listed non-financial firms over 2011–2024. Using KoTaP, this study empirically evaluates the feasibility of risk screening based on one-year-ahead (t→t+1) forecasting of four tax-avoidance proxies (CETR, GETR, TSTA, TSDA). Specifically, we define an ex-ante setting in which only information observable in year t is used to predict tax-avoidance indicators at t+1. We then propose a leakage-free evaluation protocol that enforces chronological splits and fits all preprocessing steps on the training data only. We further partition input features into raw and derived variables and compare three configurations (Raw-only, Derived-only, and Raw+Derived) to quantify the contribution of derived feature construction. Finally, we compare three machine-learning models and one deep-learning model under the same evaluation procedure and derive practical implications for model selection and deployment in terms of performance and stability.
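The ex-ante setting and the leakage-free protocol described above can be illustrated with a minimal sketch. It is not the study's implementation: the toy panel, the single feature, the cutoff year, and the helper names (`make_pairs`, `chronological_split`, `fit_scaler`, `apply_scaler`) are all hypothetical, and standardization stands in for whatever preprocessing is actually used. The sketch pairs year-t features with the year-t+1 proxy for the same firm, splits by year, and estimates scaling statistics from the training rows only.

```python
from statistics import mean, pstdev

# Toy firm-year panel: (firm_id, year, feature_value, proxy_value).
# All numbers are made up for illustration; they are not KoTaP data.
panel = [
    ("A", 2011, 1.0, 0.20), ("A", 2012, 2.0, 0.22),
    ("A", 2013, 3.0, 0.25), ("A", 2014, 4.0, 0.21),
    ("B", 2011, 10.0, 0.10), ("B", 2012, 12.0, 0.12),
    ("B", 2013, 14.0, 0.15), ("B", 2014, 16.0, 0.11),
]


def make_pairs(rows):
    """Pair the feature observed in year t with the proxy observed in t+1."""
    by_key = {(firm, year): (x, y) for firm, year, x, y in rows}
    pairs = []
    for (firm, year), (x, _) in sorted(by_key.items()):
        nxt = by_key.get((firm, year + 1))
        if nxt is not None:  # drop firm-years that have no t+1 observation
            pairs.append({"firm": firm, "year": year, "x": x, "target": nxt[1]})
    return pairs


def chronological_split(pairs, last_train_year):
    """Train on feature years <= last_train_year, test on strictly later years."""
    train = [p for p in pairs if p["year"] <= last_train_year]
    test = [p for p in pairs if p["year"] > last_train_year]
    return train, test


def fit_scaler(train):
    """Estimate standardization statistics from the training rows only."""
    xs = [p["x"] for p in train]
    mu = mean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        sigma = 1.0  # guard against a constant feature
    return mu, sigma


def apply_scaler(rows, mu, sigma):
    """Apply previously fitted statistics; nothing is re-estimated here."""
    return [(p["x"] - mu) / sigma for p in rows]


pairs = make_pairs(panel)
train, test = chronological_split(pairs, last_train_year=2012)  # hypothetical cutoff
mu, sigma = fit_scaler(train)
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)  # test rows reuse training statistics

print(len(train), len(test), mu)
```

In this sketch the test rows never influence `mu` or `sigma`, and every training feature year precedes every test feature year, which is the property the protocol is meant to guarantee. A fuller version would handle many features and would typically wrap preprocessing and the model in a single fitted pipeline.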