We’ve been exploring Evolution Strategies as an alternative to RL for LLM fine-tuning — would love feedback

Figure: Performance of Evolution Strategies (ES) compared with established RL baselines across multiple math reasoning benchmarks. ES achieves competitive results, demonstrating strong generalization beyond the original proof-of-concept tasks.

submitted by /u/Signal_Spirit5934
