Is the “new” EBM architecture essentially just Model Predictive Control (MPC) in disguise?

We often treat current LLMs as massive “reflex” engines (Model-Free policies) – they react to the context based on training habits, but they don’t really “plan” or verify their moves unless we force them with Chain-of-Thought.

I’ve been looking at the architecture behind the new Logical Intelligence lab, which is pivoting to Energy-Based Models.

From an RL perspective, this feels exactly like the shift from “habit-based” action to “planning-based” control. Instead of just guessing the next token, the model actively tries to minimize an “energy” (i.e., cost) function at inference time so the output actually satisfies the constraints.
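To make that concrete, here's a minimal Python sketch of the difference, with hypothetical `policy` and `energy_model` callables. This is just the generic EBM idea, not Logical Intelligence's actual architecture (which hasn't been published in this detail):

```python
# Minimal sketch: "reflex" decoding vs. picking the lowest-energy candidate.
# `policy`, `energy_model`, and `candidates` are hypothetical placeholders.

def reflex_decode(policy, context):
    """Model-free / habit style: commit to whatever the policy emits."""
    return policy(context)

def energy_decode(energy_model, context, candidates):
    """Planning style: score whole candidate outputs, keep the lowest-energy one."""
    return min(candidates, key=lambda y: energy_model(context, y))

# Toy usage: energy = number of repeated items, so only outputs that "fit the rules" win.
if __name__ == "__main__":
    candidates = [["1", "2", "3"], ["1", "1", "3"], ["3", "2", "1"]]
    energy = lambda ctx, y: len(y) - len(set(y))
    print(energy_decode(energy, "fill the row with distinct digits", candidates))
```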

The Sudoku Demo: They released a benchmark (https://sudoku.logicalintelligence.com/) where the model solves Sudoku. Sudoku is a perfect example of a sparse-reward environment where one wrong move ruins everything. The fact that the EBM solves it suggests it’s effectively doing a search or optimization at inference time, rather than just behavior cloning.
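Here's a toy illustration of what "search or optimization at inference time" could look like on Sudoku: define an energy that counts rule violations, then run local search until it reaches zero. This is my own sketch on a 4x4 grid, not how their demo actually works under the hood:

```python
import random

# Toy 4x4 Sudoku solved by minimizing an energy = number of constraint violations.
# Purely illustrative of "inference-time optimization"; not the demo's actual method.

def energy(grid):
    """Count duplicated digits in rows, columns, and 2x2 boxes; 0 means solved."""
    dups = lambda cells: len(cells) - len(set(cells))
    e = 0
    for i in range(4):
        e += dups([grid[i][j] for j in range(4)])   # row i
        e += dups([grid[j][i] for j in range(4)])   # column i
    for bi in (0, 2):
        for bj in (0, 2):
            e += dups([grid[bi + di][bj + dj] for di in (0, 1) for dj in (0, 1)])
    return e

def solve(grid, givens, steps=50_000):
    """Stochastic local search over the free cells; no convergence guarantee, but fine for a toy."""
    free = [(i, j) for i in range(4) for j in range(4) if (i, j) not in givens]
    for i, j in free:
        grid[i][j] = random.randint(1, 4)
    for _ in range(steps):
        if energy(grid) == 0:
            return grid
        i, j = random.choice(free)
        before, old = energy(grid), grid[i][j]
        grid[i][j] = random.randint(1, 4)
        if energy(grid) > before and random.random() > 0.1:
            grid[i][j] = old   # revert bad moves most of the time
    return None

if __name__ == "__main__":
    puzzle = [[1, 0, 0, 4],
              [0, 0, 1, 0],
              [0, 3, 0, 0],
              [2, 0, 0, 3]]       # 0 = empty cell
    givens = {(i, j) for i in range(4) for j in range(4) if puzzle[i][j] != 0}
    print(solve(puzzle, givens))
```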

Do you see EBMs as a distinct new thing, or is this just generative AI adopting standard RL planning techniques? If “energy” is basically just “negative reward”, are we finally seeing the merger of LLMs and classical control theory?
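On the "energy is just negative reward" point, the mapping really is that direct for a one-shot decision: if E(x, y) = -R(x, y), minimizing energy and maximizing reward pick the same output, and MPC is the same idea rolled out over a horizon (minimize summed cost, execute the first action, replan). A tiny sketch of the equivalence, my framing rather than anyone's published formulation:

```python
# If E(y) = -R(y), then argmin_y E(y) == argmax_y R(y): the "planner" is the same,
# only the sign convention differs. MPC just applies this over a lookahead horizon.

def argmin_energy(candidates, E):
    return min(candidates, key=E)

def argmax_reward(candidates, R):
    return max(candidates, key=R)

R = {"a": 1.0, "b": 3.0, "c": 2.0}.get   # toy reward over three candidate outputs
E = lambda y: -R(y)                      # energy defined as negative reward
assert argmin_energy(["a", "b", "c"], E) == argmax_reward(["a", "b", "c"], R) == "b"
```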

submitted by /u/redpaul72
