[D] If reasoning requires optimization rather than generation, what does that mean for the scaling paradigm?
Been digging into the architectural differences between autoregressive LLMs and Energy-Based Models (EBMs) for reasoning tasks, especially with LeCun's recent push toward optimization-based architectures. The premise is that true reasoning should be an optimization problem (finding a state that minimizes an energy function which encodes the task's constraints), rather than next-token prediction.
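To make the contrast concrete, here is one way to write the two inference rules side by side. The notation is my own shorthand rather than anything from a specific paper: $x$ is the input, $y$ the output, $p_\theta$ the autoregressive model, and $E_\theta$ a learned energy function.

```latex
% Autoregressive generation: the output is built left to right,
% each token sampled from a conditional given the prefix.
p_\theta(y \mid x) = \prod_{t=1}^{T} p_\theta(y_t \mid y_{<t}, x)

% Energy-based inference: the output is whatever state minimizes
% the energy, found by search or gradient-based optimization over y.
\hat{y} = \arg\min_{y \in \mathcal{Y}} E_\theta(x, y)
```

The first commits to each token as it goes; the second scores whole candidate states and can revise any part of $y$ until the energy stops dropping.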
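If reasoning inherently requires this optimization loop, does brute-force scaling of autoregressive models hit a hard wall regardless of compute? EBMs are computationally heavier per output at inference time, since each answer takes an iterative search instead of a single forward pass per token, but the claim is that they could avoid hallucinations by design, because an output that violates the constraints is assigned high energy.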
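A toy sketch of what "inference as optimization" looks like, and why it costs more per output. Everything here is an assumption for illustration: the quadratic `energy`, the soft constraint that the entries of `y` sum to 1, the `target` vector standing in for whatever a learned model would propose, and the step size and step count. A real EBM would use a learned energy and a more careful optimizer.

```python
def energy(y, target, lam=10.0):
    """Toy energy: stay close to `target`, softly constrained so sum(y) == 1."""
    fit = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    constraint = lam * (sum(y) - 1.0) ** 2
    return fit + constraint


def energy_grad(y, target, lam=10.0):
    """Analytic gradient of `energy` with respect to y."""
    violation = sum(y) - 1.0
    return [(yi - ti) + 2.0 * lam * violation for yi, ti in zip(y, target)]


def infer(target, steps=1000, lr=0.02, lam=10.0):
    """Inference = gradient descent on the energy, starting from y = 0.

    Each output costs `steps` gradient evaluations, which is the
    per-output overhead compared with a single forward pass.
    """
    y = [0.0] * len(target)
    trace = [energy(y, target, lam)]
    for _ in range(steps):
        g = energy_grad(y, target, lam)
        y = [yi - lr * gi for yi, gi in zip(y, g)]
        trace.append(energy(y, target, lam))
    return y, trace


if __name__ == "__main__":
    y_hat, trace = infer([0.9, 0.6, 0.3])
    print("initial energy:", trace[0])
    print("final energy:  ", trace[-1])
    print("sum of y_hat:  ", sum(y_hat))
```

The point of the toy is only the shape of the computation: the answer is the end state of a loop that keeps lowering the energy, so the constraint is enforced by the objective itself and not by hoping the sampled tokens happen to respect it.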
Mathematically, do you see reasoning requiring something beyond autoregressive prediction, or can LLMs approximate optimization if scaled enough?
submitted by /u/Effective-Addition44