Proof-of-Exploit: Cryptographically Verified LLM Cybersecurity Evaluation via Tiered Risk Metrics in the Operational Risk Framework
Existing cybersecurity evaluations of large language models (LLMs) rely on text-based plausibility scoring, which does not validate whether a generated exploit is operationally viable. In this paper, we present the Operational Risk Framework (ORF), which advances beyond our prior MalcodeEval work through three innovations: (1) ECDSA-P384 cryptographic execution validation that provides non-repudiable proof-of-exploit; (2) MITRE ATT&CK-aligned tiered scoring with CVSS v4.0-derived severity weights; and (3) six-phase progressive validation that tracks 217 Indicators of Compromise within isolated VM environments. We demonstrate the framework's utility through detailed case studies, which reveal granular capability disparities and multi-stage attack progression that binary pass/fail metrics often obscure. This work contributes a systematic LLM-to-CVSS mapping and open cryptographic protocols toward NIST AI RMF 2.0 development.