Limits of Self-Correction in LLMs: An Information-Theoretic Analysis of Correlated Errors

Recent empirical work shows that large language models struggle to self-correct their reasoning without external feedback. We propose a possible explanation: correlated errors between the generator and the evaluator. When both components share failure modes, self-evaluation may provide weak evidence of correctness, and repeated self-critique may amplify confidence without adding information. We formalize this with two information-theoretic bounds. We then describe a practical architecture that pairs high-entropy proposal generation with low-entropy external selection. This suggests an alternative to extended chain-of-thought in a single context: separate generation from evaluation in fresh contexts, restoring the external feedback loop that human reasoning relies on. Importantly, this can be implemented with the same model, reducing error correlation without additional computational cost. The architecture does not replace human judgment; it provides a filter that surfaces, for human review, the candidates that survive external scrutiny.
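
The sketch below illustrates one way the generate-then-select loop described above could look in code. It assumes a generic `model(prompt, temperature)` callable wrapping a single LLM; the prompts, the scoring scheme, and the `toy_model` stand-in are illustrative assumptions, not the architecture's actual implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: (prompt, temperature) -> text.
# The same model serves both roles; only the context and temperature differ.
ModelFn = Callable[[str, float], str]


@dataclass
class Candidate:
    solution: str
    score: float


def propose_and_select(
    model: ModelFn,
    problem: str,
    n_candidates: int = 8,
    propose_temperature: float = 1.0,
    select_temperature: float = 0.0,
) -> List[Candidate]:
    """Generate diverse candidates, then score each one in a fresh context.

    The generator runs at high temperature (high-entropy proposals); the
    evaluator runs at low temperature in a separate call that never sees the
    generator's chain of thought, which is what reduces error correlation
    relative to in-context self-critique.
    """
    candidates = []
    for _ in range(n_candidates):
        # High-entropy proposal: same model, sampling-heavy settings.
        proposal = model(f"Solve the following problem:\n{problem}", propose_temperature)

        # Low-entropy selection: a fresh prompt containing only the problem and
        # the proposed solution, not the reasoning that produced it.
        verdict = model(
            "You are verifying a proposed solution. Reply with a number in "
            f"[0, 1] for how likely it is to be correct.\nProblem:\n{problem}\n"
            f"Proposed solution:\n{proposal}",
            select_temperature,
        )
        try:
            score = float(verdict.strip())
        except ValueError:
            score = 0.0
        candidates.append(Candidate(solution=proposal, score=score))

    # Surface the top-scoring candidates for human review rather than auto-accepting.
    return sorted(candidates, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    # Toy stand-in so the sketch runs without an API; replace with a real model call.
    def toy_model(prompt: str, temperature: float) -> str:
        if "verifying" in prompt:
            return f"{random.random():.2f}"
        return f"candidate answer (T={temperature})"

    for cand in propose_and_select(toy_model, "What is 17 * 24?")[:3]:
        print(f"{cand.score:.2f}  {cand.solution}")
```

The key design choice is that the evaluation call receives only the problem and the candidate answer, so the selector cannot simply ratify the generator's reasoning; the final ranking is a filter for human review, not an automatic accept.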
