Comparative Performance of Deep Learning Models for Financial Statement Fraud Detection in an Imbalanced Classification Setting
Financial statement fraud continues to pose a significant challenge to audit effectiveness, investor confidence, and the integrity of financial markets. Fraud detection is particularly complex because financial reporting data are highly imbalanced: fraudulent observations constitute only a small fraction of the total sample. In such settings, conventional accuracy-based evaluation often produces misleading conclusions, since a model that labels nearly every observation as non-fraudulent can still achieve high accuracy, and it fails to reflect practical audit value. This study conducts a comparative evaluation of four deep learning models, namely LSTM, GRU, CNN1D, and Transformer, for financial statement fraud detection under class-imbalanced conditions. The analysis is based on a dataset of 805 firm-year observations. Precision–Recall Area Under the Curve (PR-AUC) serves as the primary performance metric, complemented by ROC-AUC, precision, recall, F1 score, and specificity. To assess practical usability, Decision Curve Analysis is employed to evaluate the decision-level net benefit of each model across different threshold probabilities, and bootstrap resampling is used to assess performance stability under random data partitioning. The empirical results show that the Transformer model consistently outperforms the other architectures in terms of discriminative ability, robustness, and decision-level utility. Its attention-based structure enables effective modeling of global relationships among financial indicators, leading to stable performance across varying thresholds and data splits. The CNN1D model demonstrates relatively high specificity and a balanced error structure, suggesting its suitability in audit environments where minimizing false positives and controlling verification costs are critical. In contrast, although the LSTM and GRU models exhibit higher sensitivity to fraudulent cases, their lower precision and stability limit their effectiveness as standalone solutions.
Overall, the findings emphasize the importance of imbalance-aware, decision-oriented evaluation frameworks for detecting financial statement fraud. The study offers practical insights for auditors and regulators by identifying deep learning models that combine statistical reliability with operational relevance in real-world auditing contexts.
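The evaluation quantities named above can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic labels and scores, the 10% fraud prevalence, the 0.2 threshold, and the function names `average_precision`, `net_benefit`, and `bootstrap_ci` are all assumptions made for illustration. PR-AUC is approximated here by average precision, net benefit follows the standard Decision Curve Analysis definition TP/N - (FP/N) * pt/(1 - pt), and the bootstrap resamples a fixed set of predictions rather than re-partitioning and retraining.

```python
import numpy as np

def average_precision(y_true, scores):
    """Approximate PR-AUC as the mean precision at the rank of each positive.
    Assumes no tied scores; ties are broken by input order."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits == 1].mean())

def net_benefit(y_true, scores, threshold):
    """Net benefit at threshold probability pt: TP/N - FP/N * pt / (1 - pt)."""
    y_true = np.asarray(y_true)
    flagged = np.asarray(scores, dtype=float) >= threshold
    n = len(y_true)
    tp = np.sum(flagged & (y_true == 1))
    fp = np.sum(flagged & (y_true == 0))
    return float(tp / n - fp / n * threshold / (1.0 - threshold))

def bootstrap_ci(y_true, scores, metric, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric on fixed predictions."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if y_true[idx].sum() == 0:
            continue  # metric is undefined without any positive case
        stats.append(metric(y_true[idx], scores[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Synthetic stand-in data: 805 observations as in the study; the 10%
# prevalence and the score distribution are hypothetical.
rng = np.random.default_rng(42)
y = (rng.random(805) < 0.10).astype(int)
s = np.clip(0.15 + 0.35 * y + rng.normal(0.0, 0.15, size=805), 0.001, 0.999)

print("average precision:", average_precision(y, s))
print("net benefit at pt=0.2:", net_benefit(y, s, 0.2))
print("95% bootstrap interval:", bootstrap_ci(y, s, average_precision))
```

A model adds decision-level value at a given threshold only when its net benefit exceeds both zero (investigating no firm) and the net benefit of flagging every firm, which `net_benefit` returns when all scores are set to 1.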