Is the “new” EBM architecture essentially just Model Predictive Control (MPC) in disguise?
We often treat current LLMs as massive “reflex” engines (model-free policies): they react to the context based on training habits, but they don’t really “plan” or verify their moves unless we force them to with Chain-of-Thought. I’ve been looking at the architecture behind the new Logical Intelligence lab, which is pivoting to Energy-Based Models (EBMs). From an RL perspective, this feels exactly like the shift from “habit-based” action to “planning-based” control. Instead of just guessing the next token, the […]
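To make the analogy concrete, here is a toy sketch of the two modes. It is my own illustration, not anything taken from Logical Intelligence's actual architecture. Everything in it is an assumption made for the example: a point moving on an integer line, a hand-written `energy` function (squared distance to a goal), a `reflex_policy` that stands in for a trained habit, and an `mpc_action` planner that brute-forces every short action sequence and keeps the lowest-energy one. Exhaustive search is only a stand-in for whatever inference procedure a real EBM would use.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def step(x, a):
    """Assumed known dynamics: a point moving on the integer line."""
    return x + a


def energy(xs, goal):
    """Toy energy: low when the visited states stay close to the goal."""
    return sum((x - goal) ** 2 for x in xs)


def reflex_policy(x):
    """'Habit': a fixed state -> action map with no lookahead.

    Assume it was trained in a world where the goal was always to the right.
    """
    return 1


def mpc_action(x, goal, horizon=3):
    """Score every action sequence by energy; return the first action of the best."""
    best_seq, best_e = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        xs, cur = [], x
        for a in seq:
            cur = step(cur, a)
            xs.append(cur)
        e = energy(xs, goal)
        if e < best_e:
            best_seq, best_e = seq, e
    return best_seq[0]


def rollout(policy, x0, steps):
    """Apply a policy for a fixed number of steps and return the final state."""
    x = x0
    for _ in range(steps):
        x = step(x, policy(x))
    return x


goal = -4  # the goal is now on the left, unlike in the assumed 'training' world
reflex_final = rollout(reflex_policy, 0, 6)
mpc_final = rollout(lambda x: mpc_action(x, goal), 0, 6)  # replans at every step
print(reflex_final, mpc_final)  # prints: 6 -4
```

The reflex policy keeps walking right because that is what it always did, while the planner re-scores candidate futures at every step and ends up at the goal. That receding-horizon loop (score candidates, commit to the first action, replan) is the MPC pattern the question is about.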