If you train RL agents seriously, where does your pipeline actually bottleneck?

I did my MEng at Imperial building a massively GPU-parallelized sim for drone RL, with thousands of episodes stepping at once on the GPU. The thing that surprised me most was that simulation throughput dominated almost everything (wall-clock time, iteration speed, and cost), far more than the algorithm work.

Now I want to know whether that is universal or just my niche. Genuine question to anyone running real RL training (robotics, embodied, games, whatever).

What is the single most expensive or time-wasting part of your RL training pipeline right now?

A few things I am curious about:

– Is sim throughput your bottleneck, or is it something else (reward design, infra and orchestration, debugging, sim-to-real, GPU cost)?

– What is your stack: Isaac Gym or Isaac Lab, Brax, MuJoCo (MJX), Genesis, or a custom engine?

– If you could wave a wand and make one part 10x faster or cheaper, which part?

– Roughly how much wall-clock time or money does a single training run eat?

Not selling anything. I am trying to understand where the real pain is before building anything. Happy to share what I learned making my drone sim fast. War stories welcome.

submitted by /u/Ok-Kaleidoscope2186