Is my GRPO LLM training on my ETL-Doctor-Pipeline-Env working?

https://preview.redd.it/hg6sw1ps6qwg1.png?width=897&format=png&auto=webp&s=ffbc86307eb7f8ab88a7fbb132cd69c20fe62c33

I am training Qwen3-0.6B with GRPO on an RL environment for LLMs that I built myself, and I am feeling lost and confused. Here is the HF Space: https://huggingface.co/spaces/Atharva1232/etl_pipeline_doctor and here is the GitHub repo: https://github.com/Its-Atharva-Gupta/EPL-Pipeline-Doctor-Env I did use Claude Code to build the environment, since this is for a hackathon and the time limit is really short. Is my training going well, or should I refactor something?

submitted by /u/Full_Promotion4522