TransForge: A Genetic Algorithm Framework for Cross-Category Evaluation of Endpoint Detection Robustness to Code Transformations

Endpoint protection systems increasingly combine signature-based and behavioral detection mechanisms, yet their robustness to systematic code transformation remains poorly understood. This paper presents a multi-category evaluation of endpoint detection robustness under automated, semantics-preserving code transformations. We introduce TransForge, a generalized transformation framework that generates functionally equivalent execution variants for controlled robustness assessment across heterogeneous artifact categories and programming environments. TransForge builds on our prior work, ShellForge, which focused on a single artifact class. It extends that approach to multi-category analysis through a modular transformation pipeline and a genetic-algorithm-based evolutionary strategy that enables non-deterministic variant generation. Using a dataset of 75 base samples spanning six execution categories and four programming languages, we conduct controlled experiments to evaluate how endpoint detection systems respond to systematically generated variants under consistent conditions. The results reveal measurable variability in detection responses across categories and transformation strategies. They expose coverage gaps in both signature-based and behavioral detection pipelines when these pipelines face semantics-preserving transformations. These findings motivate robustness-aware evaluation frameworks, as well as detection pipelines that rely on behavioral correlation and adaptive analysis rather than on static signature matching alone.