EvoPPO: Modular Vision & Audio Reinforcement Learning Framework
A modular, multi-modal Reinforcement Learning (RL) framework built in Python. This repository provides a complete pipeline for training Proximal Policy Optimization (PPO) agents on decoupled vision (RGB/Grayscale) and audio inputs. The entire training process is managed through a real-time local web interface.
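PPO limits how far each update can move the policy by clipping the probability ratio between the new and old policies. The following minimal NumPy sketch shows the standard clipped surrogate loss. It is a generic illustration rather than this repository's implementation, and the function name ppo_clip_loss and the default clip_eps=0.2 are assumptions.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the PPO clipped surrogate loss (to be minimized).

    L = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r = exp(new_log_prob - old_log_prob) is the probability ratio.
    """
    new_log_probs = np.asarray(new_log_probs, dtype=np.float64)
    old_log_probs = np.asarray(old_log_probs, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)

    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective per sample, then negate for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage. When the ratio drifts outside [1 - clip_eps, 1 + clip_eps] in the direction the advantage rewards, the gradient through that sample is cut off.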
Key Features
- Multi-Modal Inputs: Seamlessly train agents using visual data, acoustic data, or a combination of both.
- Dynamic Vision Toggle: Switch instantly between full RGB color processing and memory-efficient Grayscale mode.
- Integrated Audio Processing: Process environment audio streams alongside visual states for complex multi-sensory tasks.
- Local Web Dashboard: A built-in web interface running on localhost:2000 for real-time control of training.
- Live Hyperparameter Tweaking: Modify variables, toggle input streams, and adjust reward functions on the fly without restarting the training loop.
- On-Premises Execution: Designed to run training workloads locally, directly on your own hardware.
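The vision toggle and multi-modal inputs can be sketched as a preprocessing step that builds one observation vector from whichever streams are enabled. This is a hypothetical NumPy illustration, not the framework's code. The function names, the flags use_vision, grayscale and use_audio, the BT.601 luma weights, and the assumption that audio arrives as a precomputed feature vector are all assumptions.

```python
import numpy as np

# ITU-R BT.601 luma weights, one common choice for RGB -> grayscale conversion.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def preprocess_frame(frame_rgb, grayscale):
    """Scale an HxWx3 uint8 frame to [0, 1]; in grayscale mode return HxWx1."""
    frame = frame_rgb.astype(np.float32) / 255.0
    if grayscale:
        # Contract the channel axis with the luma weights, then restore a channel dim.
        frame = (frame @ LUMA_WEIGHTS)[..., np.newaxis]
    return frame


def build_observation(frame_rgb, audio_features, use_vision=True,
                      grayscale=False, use_audio=True):
    """Flatten each enabled stream and concatenate them into one float32 vector."""
    parts = []
    if use_vision:
        parts.append(preprocess_frame(frame_rgb, grayscale).ravel())
    if use_audio:
        parts.append(np.asarray(audio_features, dtype=np.float32).ravel())
    if not parts:
        raise ValueError("at least one input stream must be enabled")
    return np.concatenate(parts)


if __name__ == "__main__":
    frame = np.random.randint(0, 256, size=(84, 84, 3), dtype=np.uint8)
    audio = np.zeros(64, dtype=np.float32)  # e.g. a precomputed spectral feature vector
    print(build_observation(frame, audio).shape)                  # (21232,)
    print(build_observation(frame, audio, grayscale=True).shape)  # (7120,)
```

For an 84x84 frame, the grayscale path yields 7,056 visual values instead of 21,168, which is where the memory saving comes from. Flattening and concatenating raw inputs keeps the sketch short. With decoupled encoders, each stream would more typically pass through its own network (for example, a CNN for frames) before the embeddings are concatenated.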
System Architecture
The project consists of two core layers that communicate asynchronously:
- The RL Engine (Python): Handles the PPO training loop, environment interaction, rollout buffer management, and tensor computations.
- The Control Dashboard (Port 2000): A lightweight web server providing a visual interface to monitor metrics and send real-time configuration changes back to the training loop.
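Because PPO is on-policy, the buffer the RL Engine manages typically holds the rewards, value estimates, and episode-end flags from the latest rollout so that advantages can be computed before each update. The post does not say how advantages are estimated. The sketch below assumes Generalized Advantage Estimation (GAE), a common choice for PPO, with hypothetical defaults gamma=0.99 and lam=0.95.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages and value targets for one rollout.

    rewards, values, dones: length-T arrays from the rollout buffer.
    dones[t] = 1.0 if the episode ended after step t, else 0.0.
    last_value: value estimate of the state that follows the final step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    T = len(rewards)

    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # TD error r_t + gamma * V(s_{t+1}) - V(s_t); no bootstrapping past an episode end.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (gamma * lam) sum of future TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae

    returns = advantages + values  # targets for the value-function loss
    return advantages, returns
```

Setting lam=0 reduces the estimate to one-step TD errors, and lam=1 gives Monte Carlo returns minus the value baseline. Intermediate values trade bias against variance.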
Dashboard & Configuration
Through the interface at http://localhost:2000, users can monitor training performance and dynamically adjust parameters during runtime:
- Input Streams: Toggle the Vision (RGB), Vision (Grayscale), and Audio fields dynamically.
- Reward Sculpting: Tweak reward multipliers and update the reward function live.
- Training State: Start, pause, or save model weights instantly via UI buttons.
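The dashboard sends changes to a training loop that is already running, so the web server and the RL Engine need some way to hand off updates. One minimal approach, assuming both run in the same Python process on separate threads, is a thread-safe queue that the training loop drains once per iteration. The LiveConfig class, its update format, and the reward-component names are hypothetical. The post does not describe the actual mechanism, and a separate process talking over a socket or HTTP API could serve the same purpose.

```python
import queue
import threading


class LiveConfig:
    """Runtime settings shared between a dashboard thread and the training loop.

    Hypothetical sketch: the dashboard side calls submit(); the training loop
    calls apply_pending() once per iteration, so it never blocks on the UI.
    """

    def __init__(self, reward_multipliers):
        self._updates = queue.Queue()  # thread-safe hand-off between the two layers
        self.reward_multipliers = dict(reward_multipliers)
        self.paused = False

    def submit(self, update):
        """Dashboard side: enqueue a dict such as {"paused": True}."""
        self._updates.put(update)

    def apply_pending(self):
        """Training side: apply all queued updates without blocking; return the count."""
        applied = 0
        while True:
            try:
                update = self._updates.get_nowait()
            except queue.Empty:
                return applied
            self.reward_multipliers.update(update.get("reward_multipliers", {}))
            self.paused = update.get("paused", self.paused)
            applied += 1

    def shaped_reward(self, components):
        """Weighted sum of named reward components; unknown names use a multiplier of 1.0."""
        return sum(
            self.reward_multipliers.get(name, 1.0) * value
            for name, value in components.items()
        )


if __name__ == "__main__":
    config = LiveConfig({"progress": 1.0, "collision": -5.0})

    # Simulate the dashboard changing a multiplier while training runs.
    dashboard = threading.Thread(
        target=config.submit,
        args=({"reward_multipliers": {"collision": -10.0}},),
    )
    dashboard.start()
    dashboard.join()

    # At the top of the next training iteration, pick up the change.
    config.apply_pending()
    if not config.paused:
        print(config.shaped_reward({"progress": 0.5, "collision": 1.0}))  # prints -9.5
```

Draining with get_nowait keeps the training loop from ever waiting on the UI. Applying updates only at iteration boundaries means a single PPO update never mixes two reward configurations.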
Roadmap
- Implement advanced vectorization for parallel environment processing.
- Integrate Recurrent PPO (LSTM/GRU layers) for enhanced audio-sequence memory.
- Cloud Scalability: Migrate from purely local training to a cloud-based server infrastructure for distributed GPU workloads.
submitted by /u/Fang310