Comparative Study of Natural Language Processing Models for Malware Detection Using API Call Sequences
In the evolving landscape of cybersecurity, the manual, time-consuming process of identifying malware remains a major bottleneck in security analysis. This study addresses that challenge by leveraging Natural Language Processing (NLP) techniques. It presents a comparative analysis of two neural network architectures, a Long Short-Term Memory (LSTM) model and a Transformer model, that analyze API call sequences and capture the relationships between API calls. Using a publicly available dataset, both models perform binary malware detection (malicious vs. benign). The experimental findings demonstrate that the NLP-based paradigm is highly effective: the Transformer model consistently and significantly outperformed the LSTM model, achieving 95.54% accuracy in distinguishing malicious from benign samples. This result highlights the advantage of the attention mechanism in capturing long-range dependencies and deciphering complex malicious patterns in behavioral sequences. By representing system-level API calls as a linguistic structure, this approach establishes an efficient, dynamic framework for malware detection that supports cybersecurity threat response.
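The central idea of treating API call traces as language can be sketched as a simple tokenization step, analogous to building a vocabulary in NLP. The API names and sequence length below are illustrative assumptions, not details of the study's dataset or preprocessing:

```python
# Encode API call traces as fixed-length integer sequences,
# the same way NLP models tokenize sentences before embedding.
# API names here are illustrative, not taken from the dataset.

PAD, UNK = 0, 1  # reserved IDs for padding and unseen API calls

def build_vocab(traces):
    """Map each distinct API call name to an integer ID (2 and up)."""
    vocab = {}
    for trace in traces:
        for call in trace:
            vocab.setdefault(call, len(vocab) + 2)
    return vocab

def encode(trace, vocab, max_len=8):
    """Encode one trace; truncate or right-pad to max_len tokens."""
    ids = [vocab.get(call, UNK) for call in trace[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "CreateRemoteThread"],
]
vocab = build_vocab(traces)
encoded = [encode(t, vocab) for t in traces]
```

The resulting integer sequences can then be fed to an embedding layer of either an LSTM or a Transformer classifier, which is where the two architectures compared in the study diverge.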