Built a reward-function debugger for RL. Looking for feedback.

digitado ⋅ June 26, 2026

While experimenting with GRPO training, I kept running into the same problem: when the reward goes up, it is hard to tell whether the policy is genuinely improving or just exploiting the reward function. So I built a small library called rewardspy that wraps an existing reward function and continuously monitors indicators that often precede reward hacking.

It currently tracks things like:

rolling reward statistics
reward variance collapse
reward component imbalance
response length drift
reward slope changes
GRPO group collapse
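
To make a few of these indicators concrete, here is a minimal sketch of the general idea in plain Python. It is not rewardspy's actual API: the `RewardMonitor` class, its parameters, the warning names, and the thresholds are all hypothetical and chosen only for illustration. It assumes the reward function scores one response string at a time and that each call receives one GRPO group of sampled responses.

```python
import statistics
from collections import deque
from typing import Callable, List, Optional, Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (needs >= 2 points)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


class RewardMonitor:
    """Hypothetical wrapper around a reward function; NOT the rewardspy API."""

    def __init__(
        self,
        reward_fn: Callable[[str], float],
        window: int = 50,                 # illustrative rolling-window size
        var_floor: float = 1e-6,          # illustrative collapse threshold
        length_drift_ratio: float = 1.5,  # illustrative drift threshold
    ) -> None:
        self.reward_fn = reward_fn
        self.var_floor = var_floor
        self.length_drift_ratio = length_drift_ratio
        self.mean_rewards: deque = deque(maxlen=window)
        self.baseline_length: Optional[float] = None
        self.last_warnings: List[str] = []

    def __call__(self, group: Sequence[str]) -> List[float]:
        """Score one group of responses and record warnings as a side effect."""
        rewards = [float(self.reward_fn(r)) for r in group]
        self.last_warnings = self._check(group, rewards)
        return rewards

    def _check(self, group: Sequence[str], rewards: List[float]) -> List[str]:
        warnings: List[str] = []

        # Group collapse: if every sample in the group gets (almost) the same
        # reward, group-relative advantages carry no learning signal.
        if len(rewards) > 1 and statistics.pvariance(rewards) < self.var_floor:
            warnings.append("group_collapse")

        # Length drift: compare mean response length to the first group seen.
        mean_len = statistics.fmean([len(r) for r in group])
        if self.baseline_length is None:
            self.baseline_length = mean_len
        elif mean_len > self.length_drift_ratio * self.baseline_length:
            warnings.append("length_drift")

        # Rolling reward statistics: keep the per-group mean in a window.
        self.mean_rewards.append(statistics.fmean(rewards))
        return warnings

    def reward_slope(self) -> Optional[float]:
        """Trend of the rolling mean reward, or None with fewer than 2 groups."""
        if len(self.mean_rewards) < 2:
            return None
        return slope(list(self.mean_rewards))


# Usage with a deliberately hackable toy reward: longer responses score higher.
monitor = RewardMonitor(lambda response: float(len(response)))
first_rewards = monitor(["a", "bb"])
first_warnings = list(monitor.last_warnings)
monitor(["aaaa", "aaaa"])
second_warnings = list(monitor.last_warnings)
```

In this toy run the second group scores higher than the first, but it also triggers both the group-collapse and length-drift checks, which is the kind of pattern where a rising reward deserves a closer look.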

Check it out: https://github.com/AvAdiii/rewardspy

I’d love some technical feedback.

submitted by /u/Oranoleo12