Built an RL framework for training LLMs where you can actually understand what is going on!
RL is a weird creature. It is hard to make work, and even when the implementation looks correct, training can still go sideways for some random reason.
Training LLMs with RL makes this even messier. Now you have the RL algorithm, distributed training, rollout engines, reward computation, weight syncing, orchestration, and a bunch of small implementation details that can quietly break everything.
That was the motivation behind FeynRL (pronounced “FineRL”), a framework I built and recently released.
The main idea is simple: algorithms should stay algorithms, systems should stay systems, and you should still be able to scale training from a single GPU to multiple GPUs or a full cluster.
I tried to make the code easy to follow end-to-end, from loading the data to rollout generation to the actual training loop. I also included a lot of practical RL post-training tricks that are usually scattered across papers and repos, or known only to a few people.
Links:
GitHub: https://github.com/FeynRL-project/FeynRL
Blog: https://feynrl-project.github.io/blogs/episode_one.html
Examples: https://github.com/FeynRL-project/FeynRL/tree/main/examples
Would love to hear feedback. And if you find it useful, a GitHub star would be appreciated.
submitted by /u/summerday10