Eval-Driven Memory (EDM): A Persistence Governance Layer for Reliable Agentic AI via Metric-Guided Selective Consolidation

Reliable agentic AI requires not only accurate reasoning and adaptive control but also mechanisms that preserve reliability over time. Recent work has introduced system-level evaluation frameworks (e.g., HB-Eval) and real-time control architectures (e.g., Adapt-Plan), yet the question of how reliability is retained across an agent's operational lifespan remains largely unaddressed. Existing memory mechanisms typically store experiences based on recency or salience, which inadvertently allows low-quality behaviors to accumulate and degrade long-term performance. This paper introduces Evaluation-Driven Memory (EDM), a persistence governance layer that regulates long-term memory through certified evaluation metrics. EDM enforces selective consolidation: it persists only those trajectories that satisfy predefined reliability thresholds (e.g., Planning Efficiency Index, Trust Index), thereby preventing reliability regression. Conceptually, EDM reframes memory from a passive data store into an active governance mechanism situated between episodic execution and long-term knowledge accumulation. Empirical results show that EDM retains 50% fewer experiences while achieving 2× higher memory precision, reduces reasoning burden by 25% ($\text{CER}=0.75$), and maintains long-term stability ($\text{MRS}=0.08$) across repeated operational cycles. In contrast, flat memory architectures exhibit reliability degradation and increased cognitive load. We further position EDM within a coherent three-layer architecture, comprising Evaluation (HB-Eval), Control (Adapt-Plan), and Persistence (EDM), that forms a closed trust loop for reliable agentic AI. These findings establish persistence governance as a necessary architectural principle for cumulative reliability, with implications for safety-critical systems, multi-agent collaboration, and human-AI interaction.
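To make the selective-consolidation rule concrete, the following is a minimal Python sketch of threshold-gated persistence. It assumes each trajectory already carries scalar scores produced by the evaluation layer. The class names, the metric keys `PEI` and `TI`, and the threshold values are hypothetical illustrations rather than EDM's actual interface. The abstract does not say how multiple thresholds are combined, so requiring every metric to clear its own floor is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    """One episodic execution trace with its evaluation scores (hypothetical schema)."""
    task_id: str
    metrics: Dict[str, float]  # e.g. {"PEI": 0.82, "TI": 0.91}; keys are illustrative


@dataclass
class EvalGatedMemory:
    """Hypothetical persistence gate: a trajectory is consolidated into long-term
    memory only if every tracked metric meets its threshold (assumed AND rule)."""
    thresholds: Dict[str, float]
    store: List[Trajectory] = field(default_factory=list)

    def passes(self, traj: Trajectory) -> bool:
        # A missing metric is treated as a failure (a conservative assumption).
        return all(
            traj.metrics.get(name, float("-inf")) >= floor
            for name, floor in self.thresholds.items()
        )

    def consolidate(self, traj: Trajectory) -> bool:
        # Persist only trajectories that clear the gate; others are discarded.
        if self.passes(traj):
            self.store.append(traj)
            return True
        return False


# Illustrative thresholds and toy scores, not values from the paper.
mem = EvalGatedMemory(thresholds={"PEI": 0.7, "TI": 0.8})
candidates = [
    Trajectory("t1", {"PEI": 0.85, "TI": 0.90}),
    Trajectory("t2", {"PEI": 0.60, "TI": 0.95}),  # fails the PEI floor
    Trajectory("t3", {"PEI": 0.75, "TI": 0.70}),  # fails the TI floor
    Trajectory("t4", {"PEI": 0.90, "TI": 0.88}),
]
kept = [t.task_id for t in candidates if mem.consolidate(t)]
print(kept)  # ['t1', 't4']
```

A flat memory would append all four trajectories regardless of their scores. The gate instead keeps only those that meet the reliability criteria, which is the contrast the abstract draws between recency/salience-based storage and evaluation-driven consolidation.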
