Real User Instruction: Black-Box Instruction Authentication Middleware Against Indirect Prompt Injection
Large Language Model (LLM) agents are increasingly deployed to interact with untrusted external data, exposing them to Indirect Prompt Injection (IPI) attacks. While current black-box defenses (i.e., model-agnostic methods) such as “Sandwich Defense” and “Spotlighting” provide baseline protection, they remain brittle against adaptive attacks like Actor-Critic (where injections evolve to better evade LLM’s internal defense). In this paper, we introduce Real User Instruction (RUI), a lightweight, black-box middleware that enforces strict instruction-data separation without model fine-tuning. RUI operates on three novel mechanisms: (1) a Privileged Channel that encapsulates user instructions within a cryptographic-style schema; (2) Explicit Adversarial Identification, a cognitive forcing strategy that compels the model to detect and list potential injections before response generation; and (3) Dynamic Key Rotation, a moving target defense that re-encrypts the conversation state at every turn, rendering historical injection attempts obsolete. We evaluate RUI against a suite of adaptive attacks, including Context-Aware Injection, Token Obfuscation, and Delimitation Spoofing. Our experiments demonstrate that RUI reduces the Attack Success Rate (ASR) from 100% (undefended baseline) to less than 8.1% against cutting-edge adaptive attacks, while maintaining a Benign Performance Preservation (BPP) rate of over 88.8%. These findings suggest that RUI is an effective and practical solution for securing agentic workflows against sophisticated, context-aware adversaries.