EvoPPO: Modular Vision & Audio Reinforcement Learning Framework

A multi-modal Reinforcement Learning (RL) framework written in Python. The repository provides a pipeline for training Proximal Policy Optimization (PPO) agents on decoupled vision (RGB or Grayscale) and audio inputs. Training is managed through a real-time web interface served locally.

Key Features

  • Multi-Modal Inputs: Seamlessly train agents using visual data, acoustic data, or a combination of both.
  • Dynamic Vision Toggle: Switch instantly between full RGB color processing and memory-efficient Grayscale mode.
  • Integrated Audio Processing: Process environment audio streams alongside visual states for complex multi-sensory tasks.
  • Local Web Dashboard: A built-in web interface running on localhost:2000 for complete, real-time orchestration.
  • Live Hyperparameter Tweaking: Modify variables, toggle input streams, and adjust reward functions on-the-fly without restarting the training loop.
  • On-Premises Execution: Training runs entirely on your own hardware; no cloud service is required.
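
To make the input toggles concrete, here is a minimal sketch of how an observation could be assembled from whichever modalities are enabled. Everything named here (InputConfig, to_grayscale, build_observation) is hypothetical and not taken from the EvoPPO code, and the grayscale conversion assumes the common ITU-R BT.601 luma weights. Plain nested lists are used to keep the example dependency-free; a real pipeline would most likely work on array or tensor types.

```python
from dataclasses import dataclass

# Hypothetical names throughout; the post does not show EvoPPO's real API.
Frame = list[list[tuple[int, int, int]]]  # H x W grid of (R, G, B) pixels


@dataclass
class InputConfig:
    use_vision: bool = True
    grayscale: bool = False
    use_audio: bool = True


def to_grayscale(frame: Frame) -> list[list[float]]:
    """Collapse each RGB pixel to one luma value (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


def build_observation(frame: Frame, audio: list[float], cfg: InputConfig) -> dict:
    """Return a dict holding only the modalities enabled in cfg."""
    obs = {}
    if cfg.use_vision:
        obs["vision"] = to_grayscale(frame) if cfg.grayscale else frame
    if cfg.use_audio:
        obs["audio"] = list(audio)
    return obs
```

Grayscale mode stores one value per pixel instead of three, which is where the memory saving comes from.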

System Architecture

The project consists of two core layers that communicate asynchronously:

  1. The RL Engine (Python): Handles the PPO training loop, environment interaction, rollout buffer management, and tensor computations.
  2. The Control Dashboard (Port 2000): A lightweight web server providing a visual interface to monitor metrics and send real-time configuration changes back to the training loop.
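
One way the two layers could talk asynchronously is a thread-safe queue that the training loop drains between iterations. This is only a sketch under that assumption; the post does not say which mechanism EvoPPO uses, and drain_updates, training_loop, and the config keys are made-up names. A plain thread stands in for the web server here.

```python
import queue
import threading


def drain_updates(updates: queue.Queue, config: dict) -> int:
    """Apply every pending (key, value) message without blocking the loop."""
    applied = 0
    while True:
        try:
            key, value = updates.get_nowait()
        except queue.Empty:
            return applied
        config[key] = value
        applied += 1


def training_loop(updates: queue.Queue, config: dict, iterations: int) -> dict:
    for _ in range(iterations):
        drain_updates(updates, config)
        # ... collect a rollout and run a PPO update using `config` here ...
    return config


updates = queue.Queue()
config = {"learning_rate": 3e-4, "grayscale": False}

# Stand-in for the dashboard: another thread pushes a configuration change.
dashboard = threading.Thread(target=updates.put, args=(("grayscale", True),))
dashboard.start()
dashboard.join()

training_loop(updates, config, iterations=3)
```

Because the loop only reads the queue at iteration boundaries, a change never lands in the middle of a gradient update.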

Dashboard & Configuration

Through the interface at http://localhost:2000, users can monitor training performance and adjust parameters while training is running:

  • Input Streams: Toggle the Vision (RGB), Vision (Grayscale), and Audio inputs on or off.
  • Reward Shaping: Tweak reward multipliers and update the reward function setup live.
  • Training State: Start, pause, or save model weights instantly via UI buttons.
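
As an illustration of live reward multipliers, the sketch below computes the reward as a weighted sum of named components and lets another thread change the weights safely. The RewardShaper class and the component names "progress" and "collision" are invented for this example; the actual reward setup in EvoPPO is not described in the post.

```python
import threading


class RewardShaper:
    """Hypothetical helper: weighted sum of raw reward components."""

    def __init__(self, multipliers: dict):
        self._lock = threading.Lock()
        self._multipliers = dict(multipliers)

    def update(self, name: str, value: float) -> None:
        """Called from the dashboard side; rejects unknown components."""
        with self._lock:
            if name not in self._multipliers:
                raise KeyError(f"unknown reward component: {name}")
            self._multipliers[name] = float(value)

    def shape(self, components: dict) -> float:
        """Called from the training loop for every environment step."""
        with self._lock:
            return sum(self._multipliers[k] * v for k, v in components.items())


shaper = RewardShaper({"progress": 1.0, "collision": 1.0})
raw = {"progress": 2.0, "collision": -1.0}
print(shaper.shape(raw))   # 1.0
shaper.update("collision", 3.0)
print(shaper.shape(raw))   # -1.0
```

Rejecting unknown component names keeps a typo in the dashboard from silently adding a multiplier that nothing uses.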

Roadmap

  • Implement advanced vectorization for parallel environment processing.
  • Integrate Recurrent PPO (LSTM/GRU layers) for enhanced audio-sequence memory.
  • Cloud Scalability: Migrate from purely local training to a cloud-based server infrastructure for distributed GPU workloads.

submitted by /u/Fang310