[D] Quantified analysis of 2,218 Gary Marcus claims – two independent LLM pipelines, scored against evidence
I built a dataset scoring every testable claim from Marcus’s 474 Substack posts. Two independent pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed the corpus, then a reconciliation layer compared their outputs.
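The reconciliation step could look something like the sketch below: each pipeline emits a verdict per claim, and agreement keeps the label while disagreement gets flagged. The label names, the `reconcile`/`summarize` helpers, and the `disputed` flag are all illustrative assumptions, not the author's actual schema.

```python
# Hypothetical sketch of a two-pipeline reconciliation layer.
# Labels and function names are illustrative, not from the repo.
from collections import Counter

LABELS = {"supported", "mixed", "contradicted", "not_assessable"}

def reconcile(label_a: str, label_b: str) -> str:
    """If both pipelines agree, keep the label; otherwise flag it."""
    assert label_a in LABELS and label_b in LABELS
    return label_a if label_a == label_b else "disputed"

def summarize(pairs):
    """Tally reconciled verdicts over (pipeline_a, pipeline_b) pairs."""
    return Counter(reconcile(a, b) for a, b in pairs)

pairs = [("supported", "supported"),
         ("mixed", "contradicted"),
         ("supported", "supported")]
print(summarize(pairs))  # Counter({'supported': 2, 'disputed': 1})
```

Disputed items would then go to a second pass or be dropped from the "assessable" denominator; the post doesn't say which, so treat this as one plausible design.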
Of the assessable claims: 52% supported, 34% mixed, 6.4% contradicted. The distribution is more interesting than the topline: specific technical observations (LLM security vulnerabilities, Sora quality, agent readiness) score 88-100% supported with zero contradictions, while his bubble/scam predictions are the single worst of the 54 topic clusters.
Falsifiability drives the split. Nearly a fifth of his claims can’t be proven wrong by any outcome; those unfalsifiable claims accumulate, while his accurate calls resolve and disappear.
All LLM-scored, not human-verified. Full methodology and data in the repo. Built in a single session.
submitted by /u/davegoldblatt