Have you tried doing some self-improvement for agents?
I’ve been reading up on self-improving agents, and I’m specifically interested in systems where the agent detects that it has failed (wrong tool calls, bad retrieval, hallucination, failing to find the right answer) and then automatically adjusts its own prompts or strategy so the same class of failure doesn’t recur. I’m not talking about weight updates, but about the prompt/instructions/orchestration logic evolving based on observed errors.
I’m aware of work in this space like Reflexion (verbal self-reinforcement from failures), APO (using LLM-generated “textual gradients” to edit prompts via beam search), ProTeGi (structured prompt optimization loops), MemAPO (dual-memory that accumulates successful strategies and failure signals to guide future prompt construction), AutoPDL (framing prompt + pattern selection as an AutoML problem with successive halving), Self-Challenging Agents (self-generated tasks with test code as the reward signal), the AGENTS.md pattern for persistent repo-level memory, and Karpathy’s AutoResearch loop.
But I’m curious what else is out there, especially anything that closes the full loop: attempt → detect failure → diagnose root cause → rewrite prompt → persist the fix → verify no regression. Are there frameworks or production systems doing this well? How do you handle prompt drift where fixing one failure breaks something else? Is anyone combining this with RL-based reward signals (GRPO, PPO) rather than purely LLM-based self-reflection? Would love to hear what people are building or reading.
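For concreteness, here’s a minimal sketch of the full loop I mean (attempt → detect failure → diagnose → rewrite prompt → persist → verify no regression). Everything in it is hypothetical scaffolding: `run_agent` is a stand-in for a real LLM call, the diagnosis is a hard-coded rule rather than an LLM-written textual gradient, and the regression check is just a growing task suite:

```python
# Hypothetical sketch of a prompt self-improvement loop. None of these
# functions come from a real framework; they stand in for an LLM agent,
# a failure detector, and an LLM-based prompt editor.

def run_agent(prompt: str, task: str) -> str:
    """Stand-in for the real agent. It has a learnable failure mode:
    it formats dates ambiguously unless the prompt says otherwise."""
    if "date" in task and "ISO 8601" not in prompt:
        return "03/04/2025"          # ambiguous format -> failure
    if "date" in task:
        return "2025-04-03"
    return "ok"

def detect_failure(task: str, output: str) -> bool:
    # Cheap programmatic check; in practice this could be a verifier
    # model, output-schema validation, or tests over tool calls.
    return "date" in task and not output.startswith("20")

def diagnose_and_rewrite(prompt: str, task: str) -> str:
    # Placeholder for "LLM diagnoses the root cause and edits the prompt"
    # (a Reflexion/APO-style textual gradient step).
    return prompt + "\nAlways emit dates in ISO 8601 (YYYY-MM-DD)."

def verify_no_regression(prompt: str, suite: list[str]) -> bool:
    # Re-run previously seen tasks with the candidate prompt before
    # persisting it; this is the guard against prompt drift.
    return all(not detect_failure(t, run_agent(prompt, t)) for t in suite)

def improvement_loop(prompt: str, tasks: list[str], store: dict) -> str:
    suite: list[str] = []            # regression suite grows with each task
    for task in tasks:
        out = run_agent(prompt, task)
        if detect_failure(task, out):
            candidate = diagnose_and_rewrite(prompt, task)
            if verify_no_regression(candidate, suite + [task]):
                prompt = candidate   # persist only verified fixes
                store["prompt"] = prompt
        suite.append(task)
    return prompt

store: dict = {}
final_prompt = improvement_loop(
    "You are a helpful assistant.",
    ["greet the user", "what is the date?", "format this date"],
    store,
)
```

The interesting design decisions all live in the parts this sketch stubs out: how `detect_failure` gets a trustworthy signal without a human, whether `diagnose_and_rewrite` appends rules (which bloats the prompt) or edits existing ones, and how large the regression suite must be before `verify_no_regression` actually prevents one fix from breaking another.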
submitted by /u/Imaginary-Bee-8770