[R] We ran 1,150 LLM trading sessions to test emergent financial strategies. Claude dominates with +38.5% avg returns.
We built an isolated trading environment and ran 50 games with 23 LLMs competing head-to-head (Claude, GPT-5, Grok, Gemini, and DeepSeek variants), i.e. 50 × 23 = 1,150 individual sessions. Each model started with $10k and had 5 minutes per session to maximize returns.
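For anyone curious what a harness like this looks like, here's a minimal sketch of one session loop. Everything in it (the `market` object, the `query_model` helper, the order dict shape) is a hypothetical stand-in, not the actual combat.trading implementation:

```python
import time
from dataclasses import dataclass, field

STARTING_CAPITAL = 10_000.0   # each model starts with $10k
SESSION_SECONDS = 5 * 60      # 5-minute session

@dataclass
class Portfolio:
    cash: float = STARTING_CAPITAL
    positions: dict = field(default_factory=dict)  # ticker -> shares held

    def value(self, prices: dict) -> float:
        return self.cash + sum(q * prices[t] for t, q in self.positions.items())

def run_session(model_name: str, market, query_model) -> float:
    """Run one 5-minute session for one model; return final P&L in dollars.

    `market` and `query_model` are assumed interfaces: the former streams
    quotes and the latter wraps an LLM API call that returns an order dict
    like {"action": "buy", "ticker": "ACME", "qty": 10}.
    """
    pf = Portfolio()
    deadline = time.monotonic() + SESSION_SECONDS
    while time.monotonic() < deadline:
        prices = market.snapshot()                   # current quotes
        order = query_model(model_name, prices, pf)  # LLM decides next move
        ticker, qty = order["ticker"], order["qty"]
        if order["action"] == "buy":
            cost = qty * prices[ticker]
            if cost <= pf.cash:                      # cash-only in this sketch
                pf.cash -= cost
                pf.positions[ticker] = pf.positions.get(ticker, 0) + qty
        elif order["action"] == "sell":
            sellable = min(qty, pf.positions.get(ticker, 0))
            pf.cash += sellable * prices[ticker]
            pf.positions[ticker] = pf.positions.get(ticker, 0) - sellable
    return pf.value(market.snapshot()) - STARTING_CAPITAL
```

Note this sketch disallows margin; the leverage behavior described in the findings below would need a margin-aware fill model on top of it.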
Key findings:
• Claude Sonnet 4.5 averaged +$3,847 (+38.5%) across all games (quick aggregation sketch after this list)
• Models developed distinct “trading personalities” without fine-tuning
• Claude models explicitly mentioned “front-running opponents” and “manipulating market with size”
• Faster ≠ better: Grok “Fast Non-Reasoning” variants consistently lost money
• Humans finished in the bottom third 68% of the time
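To ground the headline number: average return per model is just mean dollar P&L over the $10k starting stake, taken across that model's 50 sessions. A quick sketch, assuming session results arrive as (model, pnl) tuples (the figures are the post's, not recomputed):

```python
from collections import defaultdict

def average_returns(results):
    """results: list of (model_name, pnl_dollars) tuples, one per session."""
    by_model = defaultdict(list)
    for model, pnl in results:
        by_model[model].append(pnl)
    # -> {model: (avg dollar P&L, percent return on the $10k stake)}
    return {m: (sum(p) / len(p), sum(p) / len(p) / 10_000 * 100)
            for m, p in by_model.items()}

# e.g. an average P&L of +$3,847 on $10,000 works out to +38.47% ≈ +38.5%
```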
Emergent behaviors included meta-game awareness, leaderboard manipulation, and weaponized leverage. No finance-specific training was involved, just general pre-training.
Full analysis + methodology: https://combat.trading/blog/ai-trading-showdown
Thoughts on the implications for AI in adversarial environments?