EvoPPO: Modular Vision & Audio Reinforcement Learning Framework

digitado ⋅ June 1, 2026

A multi-modal Reinforcement Learning (RL) framework built in Python. This repository provides a complete pipeline for training Proximal Policy Optimization (PPO) agents on decoupled vision (RGB or grayscale) and audio inputs. The entire training process is managed through a real-time local web interface.
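For readers new to PPO, the clipped surrogate loss at the heart of the algorithm can be sketched in a few lines of plain Python. This is a minimal illustration of the standard PPO-Clip objective, not code taken from this repository; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (to be minimized) over a batch.

    Hypothetical helper for illustration; inputs are equal-length lists
    of per-sample log-probabilities and advantage estimates.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize while PPO maximizes the surrogate.
    return -total / len(advantages)
```

With a positive advantage, the clip caps how much a single update can reward a large policy shift; with a negative advantage, the unclipped term is kept so the penalty is not softened.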

Key Features

Multi-Modal Inputs: Seamlessly train agents using visual data, acoustic data, or a combination of both.
Dynamic Vision Toggle: Switch instantly between full RGB color processing and memory-efficient Grayscale mode.
Integrated Audio Processing: Process environment audio streams alongside visual states for complex multi-sensory tasks.
Local Web Dashboard: A built-in web interface running on localhost:2000 for complete, real-time orchestration.
Live Hyperparameter Tweaking: Modify variables, toggle input streams, and adjust reward functions on-the-fly without restarting the training loop.
On-Premises Execution: Designed to run training workloads locally, directly on your own hardware.
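To make the memory saving of the grayscale mode concrete, here is a minimal sketch of an RGB-to-grayscale conversion. It assumes frames are nested lists of `(r, g, b)` integer tuples and uses the common ITU-R BT.601 luma weights; the function name `rgb_to_grayscale` is hypothetical, and the framework's actual preprocessing may differ.

```python
def rgb_to_grayscale(frame):
    """Convert an H x W frame of (r, g, b) ints (0..255) to H x W ints.

    Illustrative only: uses BT.601 luma weights, which is an assumption
    about the conversion, not a statement of what EvoPPO does.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]
```

A grayscale frame stores one value per pixel instead of three, so the observation buffer for the vision stream shrinks to roughly a third of its RGB size.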

System Architecture

The project consists of two core layers that communicate asynchronously:

The RL Engine (Python): Handles the PPO training loop, environment interaction, rollout buffer management (PPO is on-policy, so it trains on freshly collected rollouts rather than a replay buffer), and tensor computations.
The Control Dashboard (Port 2000): A lightweight web server providing a visual interface to monitor metrics and send real-time configuration changes back to the training loop.
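One common way to wire two such layers together is a thread-safe queue: the dashboard side enqueues configuration changes, and the training loop drains them between iterations without blocking. The sketch below uses only the Python standard library. The names `dashboard_handler` and `apply_pending_updates` are hypothetical, and the post does not say which mechanism the project actually uses.

```python
import queue
import threading

def dashboard_handler(updates, change):
    """Stand-in for a web request handler: enqueue a config change."""
    updates.put(change)

def apply_pending_updates(config, updates):
    """Merge all queued changes into `config` without blocking.

    Returns the number of updates applied. Intended to be called once
    per training iteration, so changes take effect at a safe boundary.
    """
    applied = 0
    while True:
        try:
            change = updates.get_nowait()
        except queue.Empty:
            break
        config.update(change)
        applied += 1
    return applied

def train(config, updates, iterations):
    """Skeleton loop: pick up dashboard changes, then do one PPO step."""
    for _ in range(iterations):
        apply_pending_updates(config, updates)
        # ... collect rollouts and run a PPO update using `config` ...
    return config
```

Applying changes only at iteration boundaries keeps each PPO update internally consistent, which is one reasonable reading of "asynchronous" here.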

Dashboard & Configuration

Through the interface at http://localhost:2000, users can monitor training performance and dynamically adjust parameters during runtime:

Input Streams: Toggle Vision (RGB), Vision (Grayscale), and Audio fields dynamically.
Reward Shaping: Tweak reward multipliers and update the reward function configuration live.
Training State: Start, pause, or save model weights instantly via UI buttons.
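The controls listed above could be modeled as a small state object plus a weighted reward sum. This is a sketch under stated assumptions: the `RuntimeControls` fields, the `shaped_reward` function, and the component names in the usage note are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field

@dataclass
class RuntimeControls:
    """Hypothetical snapshot of what the dashboard can change at runtime."""
    use_vision: bool = True
    grayscale: bool = False
    use_audio: bool = True
    paused: bool = False
    reward_multipliers: dict = field(default_factory=dict)

def shaped_reward(components, multipliers):
    """Weighted sum of named reward components.

    Components without an explicit multiplier keep a weight of 1.0.
    """
    return sum(
        multipliers.get(name, 1.0) * value
        for name, value in components.items()
    )
```

For example, with components `{"progress": 2.0, "collision": -1.0}` and a multiplier of `0.5` on `"collision"`, the shaped reward is `2.0 + 0.5 * (-1.0) = 1.5`. Editing the multipliers changes the reward on the next environment step without touching the training loop itself.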

Roadmap

Implement advanced vectorization for parallel environment processing.
Integrate Recurrent PPO (LSTM/GRU layers) for enhanced audio-sequence memory.
Cloud Scalability: Migrate from purely local training to a cloud-based server infrastructure for distributed GPU workloads.

submitted by /u/Fang310