Real User Instruction: Black-Box Instruction Authentication Middleware Against Indirect Prompt Injection

digitado ⋅ 13 de March de 2026

Large Language Model (LLM) agents are increasingly deployed to interact with untrusted external data, exposing them to Indirect Prompt Injection (IPI) attacks. While current black-box defenses (i.e., model-agnostic methods) such as “Sandwich Defense” and “Spotlighting” provide baseline protection, they remain brittle against adaptive attacks like Actor-Critic (where injections evolve to better evade LLM’s internal defense). In this paper, we introduce Real User Instruction (RUI), a lightweight, black-box middleware that enforces strict instruction-data separation without model fine-tuning. RUI operates on three novel mechanisms: (1) a Privileged Channel that encapsulates user instructions within a cryptographic-style schema; (2) Explicit Adversarial Identification, a cognitive forcing strategy that compels the model to detect and list potential injections before response generation; and (3) Dynamic Key Rotation, a moving target defense that re-encrypts the conversation state at every turn, rendering historical injection attempts obsolete. We evaluate RUI against a suite of adaptive attacks, including Context-Aware Injection, Token Obfuscation, and Delimitation Spoofing. Our experiments demonstrate that RUI reduces the Attack Success Rate (ASR) from 100% (undefended baseline) to less than 8.1% against cutting-edge adaptive attacks, while maintaining a Benign Performance Preservation (BPP) rate of over 88.8%. These findings suggest that RUI is an effective and practical solution for securing agentic workflows against sophisticated, context-aware adversaries.

Like 0

Liked Liked