We’ve been exploring Evolution Strategies as an alternative to RL for LLM fine-tuning — would love feedback
Figure: Performance of ES compared to established RL baselines across multiple math reasoning benchmarks. ES achieves competitive results, demonstrating strong generalization beyond the original proof-of-concept tasks.

submitted by /u/Signal_Spirit5934