[Project Showcase] ML-Agents in Python through TorchRL

Hi everyone,

I wanted to share a project I’ve been working on: ML-Agents with TorchRL. This is the first project I’ve tried to make presentable, so I would really appreciate feedback on it.

https://reddit.com/link/1q15ykj/video/u8zvsyfi2rag1/player

Summary

Train Unity ML-Agents environments using TorchRL. This bypasses the default mlagents-learn CLI, replacing it with TorchRL training templates that are powerful, modular, debuggable, and easy to customize.

Motivation

  • The default ML-Agents trainer was not easy for me to customize; it felt like a black box when I wanted to implement custom algorithms or research ideas. I wanted to combine the high-fidelity environments of Unity with the composability of PyTorch/TorchRL.

TorchRL Algorithms

The nice thing about TorchRL is that once you have the environments in the right format, you can use its powerful modular parts to construct an algorithm.
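To give a sense of what that format looks like, here’s a minimal sketch of an environment factory. It assumes TorchRL’s UnityMLAgentsEnv wrapper and a registered environment name; the project’s own wrapper may differ.

```python
# Hedged sketch of an environment factory, assuming TorchRL's built-in
# Unity ML-Agents wrapper and a registered environment name ("3DBall").
from torchrl.envs import TransformedEnv, UnityMLAgentsEnv
from torchrl.envs.transforms import StepCounter

def create_env():
    env = UnityMLAgentsEnv(registered_name="3DBall")
    # Wrap with a transform, e.g. to track episode step counts.
    return TransformedEnv(env, StepCounter())
```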

For example, one really convenient component for PPO is the MultiSyncDataCollector which uses multiprocessing to collect data in parallel:

```python
from torchrl.collectors import MultiSyncDataCollector

# One environment-creation function per worker process.
collector = MultiSyncDataCollector(
    [create_env] * WORKERS,
    policy,
    frames_per_batch=...,
    total_frames=-1,  # -1: collect indefinitely
)
data = collector.next()
```

This is then combined with many other modular parts like replay buffers, value estimators (GAE), and loss modules.
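For instance, here’s a minimal sketch of how those parts compose in TorchRL, assuming `policy_module` and `value_module` are already-built TensorDict modules and `FRAMES_PER_BATCH` matches the collector:

```python
from torchrl.data import LazyTensorStorage, ReplayBuffer
from torchrl.data.replay_buffers import SamplerWithoutReplacement
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

# Value estimator: writes "advantage" and "value_target" entries into the batch.
advantage_module = GAE(gamma=0.99, lmbda=0.95, value_network=value_module)

# On-policy buffer: holds one collected batch, sampled without replacement.
replay_buffer = ReplayBuffer(
    storage=LazyTensorStorage(FRAMES_PER_BATCH),
    sampler=SamplerWithoutReplacement(),
)

# Clipped-surrogate PPO loss built from the same actor/critic modules.
loss_module = ClipPPOLoss(
    actor_network=policy_module,
    critic_network=value_module,
    clip_epsilon=0.2,
)
```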

This makes setting up an algorithm both straightforward and highly customizable. Here’s an example of PPO; the sketch below shows roughly how such a template fits together. To introduce a new algorithm or variant, just create another training template.
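As a rough illustration (not the project’s actual template), the pieces above wire together into a training loop along these lines; `NUM_EPOCHS` and `MINIBATCH_SIZE` are hypothetical hyperparameters:

```python
import torch

# Hypothetical PPO update loop, assuming the collector, buffer,
# advantage_module, and loss_module from the snippets above.
optim = torch.optim.Adam(loss_module.parameters(), lr=3e-4)

for batch in collector:
    advantage_module(batch)                  # compute GAE advantages in-place
    replay_buffer.extend(batch.reshape(-1))  # flatten (workers, time) into frames
    for _ in range(NUM_EPOCHS):
        for _ in range(FRAMES_PER_BATCH // MINIBATCH_SIZE):
            minibatch = replay_buffer.sample(MINIBATCH_SIZE)
            losses = loss_module(minibatch)
            loss = losses["loss_objective"] + losses["loss_critic"] + losses["loss_entropy"]
            loss.backward()
            optim.step()
            optim.zero_grad()
    collector.update_policy_weights_()       # push new weights to worker processes
```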

Python Workflow

Working in Python is also really nice. For example, I set up a simple experiment runner using Hydra which takes in a config like configs/crawler_ppo.yaml. Configs look something like this:

```yaml
defaults:
  - env: crawler

algo:
  name: ppo
  _target_: runners.ppo.PPORunner
  params:
    epsilon: 0.2
    gamma: 0.99

trainer:
  _target_: rlkit.templates.PPOBasic
  params:
    generations: 5000
    workers: 8

model:
  _target_: rlkit.models.MLP
  params:
    in_features: "${env.observation.dim}"
    out_features: "${env.action.dim}"
    n_blocks: 1
    hidden_dim: 128
...
```
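On the runner side, a Hydra entry point to consume such a config could look roughly like this; the `run()` method and how the nested `params` block is unpacked are assumptions about the project’s conventions:

```python
# Hypothetical entry point; rlkit.templates.PPOBasic and its interface
# come from the project, and the run() call here is an assumption.
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="crawler_ppo", version_base=None)
def main(cfg: DictConfig) -> None:
    trainer = instantiate(cfg.trainer)  # builds the class named by _target_
    trainer.run()

if __name__ == "__main__":
    main()
```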

It’s also integrated with a lot of common utilities like TensorBoard and Hugging Face (logs, checkpoints, models), which makes it really nice to work with at a user level even if you don’t care about customizability.

https://preview.redd.it/x39oemq74rag1.png?width=2032&format=png&auto=webp&s=929a685a5de03510ea781fa4669b082b4eb6ad5e

Discussion

I think having this TorchRL trainer option can make Unity more accessible for research, or more generally point to a direction for expanding the trainer stack with more features.

I’m going to continue working on this project, and I would really appreciate discussion, feedback (I’m new to making this sort of thing), and contributions.
