Building Self-Correcting RAG Systems
Author(s): Kushal Banda. Originally published on Towards AI.

Self-correcting RAG systems

Standard RAG pipelines have a fatal flaw: they retrieve once and hope for the best. When the retrieved documents don't match the user's intent, the system generates confident nonsense. No feedback loop. No self-correction. No second chances.

Agentic RAG changes this. Instead of blindly generating answers from whatever documents come back, an agent evaluates relevance first. If the retrieved content doesn't cut it, the system rewrites the query and tries again. This creates a self-healing retrieval pipeline that handles edge cases gracefully.

This article walks through building a production-grade Agentic RAG system using LangGraph for orchestration and Redis as the vector store. We'll cover the architecture, the decision logic, and the state machine wiring that makes it all work.

The problem with "retrieve and pray"

Picture this: your knowledge base contains detailed documentation titled "Parameter-Efficient Training Methods for Large Language Models." A user asks, "What's the best way to fine-tune LLMs?" The semantic similarity is there, but it's not strong enough. Your retriever pulls back tangentially related chunks about model architecture instead.

The LLM, having no way to know the context is wrong, generates a plausible-sounding but incorrect answer. The user loses trust. Your RAG system looks broken.

Traditional RAG has no mechanism to detect this failure mode. It treats retrieval as a one-shot operation: query in, documents out, answer generated. Done.

Agentic RAG introduces checkpoints. An agent decides whether to retrieve at all. A grading step evaluates whether retrieved documents are relevant. A rewrite step reformulates failed queries. The system loops until it gets relevant context or exhausts its retry budget.

Architectural Flow

The system breaks down into six distinct components, each with a single responsibility:

- Configuration layer handles environment variables and API client setup. Redis connection strings, OpenAI keys, model names; all centralized in one place.
- Retriever setup downloads source documents (in this case, Lilian Weng's blog posts on agents), splits them into chunks, embeds them with OpenAI's embedding model, and stores everything in Redis via RedisVectorStore. The retriever then gets wrapped as a tool the agent can call.
- Agent node receives the user's question and makes the first decision: should I call the retriever tool, or can I answer this directly? If the question requires external knowledge, the agent invokes retrieval.
- Grade edge evaluates whether retrieved documents are relevant to the original question. This is the critical checkpoint. Relevant documents flow to generation. Irrelevant documents trigger a rewrite.
- Rewrite node transforms the original question into a better search query. Maybe the user's phrasing was too colloquial, or key terms were missing. The rewriter reformulates the question and sends the new query back to the agent for another retrieval attempt.
- Generate node takes relevant documents and produces the final answer. This only runs after the grading step confirms the context is appropriate.

The decision flow

Here's how a query moves through the system:

```
User Question
      ↓
    Agent ─────────────────────────┐
      ↓                            │
[Calls retriever tool]             │
      ↓                            │
Retrieve documents                 │
      ↓                            │
Grade documents                    │
      ↓                            │
┌─────────────────┐                │
│    Relevant?    │                │
└───┬─────────┬───┘                │
    │ Yes     │ No                 │
    ↓         └──→ Rewrite query ──┘
 Generate
    ↓
  Answer
```

The feedback loop from "Rewrite" back to "Agent" is what makes this agentic. The system doesn't fail silently; it adapts and retries.
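To make the loop concrete, here is a minimal sketch of how this state machine might be wired with LangGraph. The node names come straight from the flow above; the state schema (MessagesState), the import paths, and names such as grade_documents and retriever_tool are illustrative assumptions rather than the project's exact code:

```python
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition

# Assumed imports based on the project layout described in the next sections.
from agents.nodes import agent, rewrite, generate
from agents.edges import grade_documents   # assumed name; returns "generate" or "rewrite"
from retriever import retriever_tool       # assumed name; the wrapped retriever tool

workflow = StateGraph(MessagesState)

# One node per responsibility: decide, retrieve, rewrite, generate.
workflow.add_node("agent", agent)
workflow.add_node("retrieve", ToolNode([retriever_tool]))
workflow.add_node("rewrite", rewrite)
workflow.add_node("generate", generate)

workflow.add_edge(START, "agent")

# The agent either calls the retriever tool or answers directly and finishes.
workflow.add_conditional_edges(
    "agent",
    tools_condition,
    {"tools": "retrieve", END: END},
)

# The grading edge routes relevant context to generation, everything else to rewrite.
workflow.add_conditional_edges("retrieve", grade_documents)

workflow.add_edge("generate", END)
workflow.add_edge("rewrite", "agent")  # the feedback loop from the diagram

graph = workflow.compile()
```

The two conditional edges are where the agentic behavior lives: tools_condition lets the agent skip retrieval entirely, and the grading edge plus the "rewrite" to "agent" edge close the retry loop shown in the diagram.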
Project structure

The codebase follows a clean separation of concerns:

```
src/
├── config/
│   ├── settings.py   # Environment variables
│   └── openai.py     # Model names and API clients
├── retriever.py      # Document ingestion and Redis vector store
├── agents/
│   ├── nodes.py      # Agent, rewrite, and generate functions
│   ├── edges.py      # Document grading logic
│   └── graph.py      # LangGraph state machine
└── main.py           # Entry point
```

Each file does one thing. Configuration stays in config/. Agent logic stays in agents/. The retriever handles all vector store operations. This makes testing and debugging straightforward.

Configuration: centralizing secrets and clients

The configuration layer serves two purposes: loading environment variables and providing consistent API clients across the codebase.

settings.py loads Redis connection strings, OpenAI API keys, and the index name from environment variables. All configuration lives here, not scattered across files.

openai.py creates the embedding model and LLM client instances. When you need to switch from gpt-4o-mini to a different model, you change one file. When you need to adjust embedding dimensions, you change one file. No hunting through the codebase.

This pattern matters more than it seems. Production systems evolve. Models get deprecated. API keys rotate. Centralizing configuration means these changes happen in one place.

Retriever: building the knowledge base with Redis

The retriever handles the ingestion pipeline: fetching documents, splitting them into chunks, generating embeddings, and storing everything in Redis for fast similarity search.

The source documents are Lilian Weng's blog posts on agents and prompt engineering. These get loaded via WebBaseLoader, split into manageable chunks using RecursiveCharacterTextSplitter, and embedded with OpenAI's embedding model. Redis stores the vectors via RedisVectorStore.

The retriever gets wrapped as a LangChain tool using create_retriever_tool. This wrapping is important: it lets the agent call retrieval as a tool, which means the agent can decide whether to retrieve at all.

Why Redis? Speed and simplicity. Redis handles vector similarity search without the operational overhead of a dedicated vector database. For systems that already run Redis, this adds RAG capabilities without new infrastructure.

Agent nodes: the decision makers

Three functions in nodes.py handle the core logic:

The agent function receives the current state (including the user's question and any message history) and decides what to do next. It has access to tools, including the retriever. If the question requires external knowledge, the agent calls the retriever tool. If not, it answers directly.

The rewrite function takes a question that failed retrieval grading and reformulates it. The rewriter prompts the LLM to generate a better search query, one that's more likely to pull back relevant documents. This reformulated question gets passed back to the agent for another attempt.

The generate function produces the final answer. It receives the original question and the relevant documents […]
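To give a feel for the shape of these functions, here is a simplified sketch of what nodes.py could look like. The prompts are abbreviated, the LLM client is assumed to come from config/openai.py in the real project, and retriever_tool is the wrapped retriever from retriever.py (its exact exported name is an assumption):

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

from retriever import retriever_tool  # assumed export of the wrapped retriever

# The real project builds this client once in config/openai.py.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def agent(state):
    """Decide whether to call the retriever tool or answer directly."""
    response = llm.bind_tools([retriever_tool]).invoke(state["messages"])
    return {"messages": [response]}


def rewrite(state):
    """Reformulate a question that failed the relevance check."""
    question = state["messages"][0].content
    prompt = (
        "Look at the input and reason about the underlying semantic intent.\n"
        f"Original question: {question}\n"
        "Formulate an improved question for document retrieval:"
    )
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"messages": [response]}


def generate(state):
    """Produce the final answer from the question and the graded documents."""
    question = state["messages"][0].content
    docs = state["messages"][-1].content  # retrieved context from the tool message
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{docs}\n\nQuestion: {question}"
    )
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"messages": [response]}
```

Each function returns a partial state update ({"messages": [...]}), which is how LangGraph nodes append to the shared message history.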