[D] Is it possible to create a benchmark that can measure human-like intelligence?

So I just watched this wonderful talk from Francois Chollet about how current benchmarks (as of 2024) cannot capture the ability to generalize knowledge and solve novel problems. So he created ARC-AGI, which is designed to measure exactly that.

Then I went and checked how the latest frontier models are doing on this benchmark: Gemini 3.1 Pro is doing very well on both ARC-AGI-1 and ARC-AGI-2. However, I have been using Gemini 3.1 Pro for the last few days, and even though it’s great, it doesn’t feel like the model has human-like intelligence. One would think that abstract generalization is key to human intelligence, but maybe there’s more to it than that. Do you think it is possible to create a benchmark such that, if a model passes it, we can confidently say it possesses human-like intelligence?

submitted by /u/samsarainfinity
