[D] Quantified analysis of 2,218 Gary Marcus claims – two independent LLM pipelines, scored against evidence
I built a dataset scoring every testable claim from Marcus’s 474 Substack posts. Two independent pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed the corpus, then a reconciliation layer compared their outputs.
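The reconciliation step could look something like the sketch below: each pipeline emits a verdict per claim, and agreement keeps the label while disagreement gets flagged. The label names, the `reconcile`/`summarize` helpers, and the `disputed` flag are all illustrative assumptions, not the author's actual schema.

```python
# Hypothetical sketch of a two-pipeline reconciliation layer.
# Labels and function names are illustrative, not from the repo.
from collections import Counter

LABELS = {"supported", "mixed", "contradicted", "not_assessable"}

def reconcile(label_a: str, label_b: str) -> str:
    """If both pipelines agree, keep the label; otherwise flag it."""
    assert label_a in LABELS and label_b in LABELS
    return label_a if label_a == label_b else "disputed"

def summarize(pairs):
    """Tally reconciled verdicts over (pipeline_a, pipeline_b) pairs."""
    return Counter(reconcile(a, b) for a, b in pairs)

pairs = [("supported", "supported"),
         ("mixed", "contradicted"),
         ("supported", "supported")]
print(summarize(pairs))  # Counter({'supported': 2, 'disputed': 1})
```

Disputed items would then go to a second pass or be dropped from the "assessable" denominator; the post doesn't say which, so treat this as one plausible design.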
Of the assessable claims: 52% supported, 34% mixed, 6.4% contradicted. The distribution is more interesting than the topline: specific technical observations (LLM security vulnerabilities, Sora quality, agent readiness) score 88-100% supported with zero contradictions, while his bubble/scam predictions are the single worst of the 54 topic clusters.
Falsifiability drives the split. Nearly a fifth of his claims can’t be proven wrong by any outcome; those unfalsifiable claims accumulate, while his accurate calls resolve and disappear.
All LLM-scored, not human-verified. Full methodology and data in the repo. Built in a single session.
submitted by /u/davegoldblatt