KA-PROMPT: Thermodynamic Early Warning of Prompt Injection via Entropic Drift in Large Language Models

Prompt injection attacks represent a fundamental and unresolved threat to the safe deployment of large language models (LLMs) in agentic and multi-turn settings. Existing defences rely predominantly on pattern-matching classifiers trained on known attack templates; they fail systematically against adaptive, obfuscated, and semantically camouflaged adversarial inputs. We propose KA-PROMPT, a thermodynamic framework that reconceptualises prompt injection not as a syntactic anomaly to be detected by surface features, but as an entropic phase transition within a non-equilibrium conversational information system. Drawing on the Karimov–Alakbarli (KA) model of thermodynamic early warning signals, we formalise the prompt state space P_t = (I_t, C_t, R_t, M_t), define a conditional entropy drift operator ∆S_t, and derive a suite of precursor statistics — variance inflation, autocorrelation rise, and entropy acceleration — that are theoretically guaranteed to surface before the adversarial attractor is reached. We construct a formal threat model covering eight attack classes (A1–A8), characterise each by its entropic signature across all four state dimensions, and demonstrate via agent-based simulation that KA-PROMPT achieves a mean precursor detection offset of k = 5.11 turns prior to successful injection across all attack classes, with a mean AUROC of 0.937 — a +17.2 percentage-point improvement over the next-best baseline. KA-PROMPT is explicitly positioned not as a universal prompt-attack detector, but as a dynamical-systems framework for precursor analysis of conversational state transitions that complements, rather than replaces, existing guardrail systems.
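The three precursor statistics named in the abstract can be sketched as a sliding-window computation over a per-turn conditional-entropy trace. The sketch below is illustrative only: the window size, the sample-variance and lag-1 autocorrelation estimators, and the second-difference definition of entropy acceleration are our assumptions, not the paper's exact operators.

```python
import numpy as np

def precursor_stats(entropy_trace, window=8):
    """Illustrative sliding-window precursor statistics over a
    conversational entropy trace: variance inflation, lag-1
    autocorrelation rise, and entropy acceleration (second
    difference). Estimators and window size are assumptions,
    not the KA-PROMPT definitions."""
    s = np.asarray(entropy_trace, dtype=float)
    n = len(s)
    var = np.full(n, np.nan)   # variance inflation signal
    ac1 = np.full(n, np.nan)   # lag-1 autocorrelation signal
    for t in range(window, n):
        w = s[t - window:t]            # trailing window of entropies
        var[t] = w.var()
        d = w - w.mean()
        denom = (d * d).sum()
        ac1[t] = (d[:-1] * d[1:]).sum() / denom if denom > 0 else 0.0
    accel = np.full(n, np.nan)
    accel[2:] = np.diff(s, 2)          # discrete second difference
    return var, ac1, accel
```

A rising trend in all three signals over successive turns would, under the abstract's framing, flag an approaching entropic phase transition several turns before the adversarial attractor is reached.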
