Oak: A Python package for high performance RL in Pokemon RBY OU
Tutorial (WIP)
I’ve written a program suite and Python library that combines an ultra-fast simulator with small Stockfish-style neural networks (with policy priors) to attack perfect-information search in the first generation of Pokemon battling.
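To give a sense of how policy priors plug into search: the standard way a network's policy head steers a tree search is PUCT-style selection, where each move's exploration bonus is weighted by its prior. This is a generic sketch of that technique, not Oak's actual API; all names here are illustrative.

```python
import math

class Node:
    """One search-tree node per (state, action) edge."""
    def __init__(self, prior):
        self.prior = prior      # policy prior P(s, a) from the network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # cumulative backed-up value W(s, a)
        self.children = {}      # action -> Node

    def q(self):
        # Mean value Q(s, a); zero for unvisited edges
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing Q + U, where U is the prior-weighted
    exploration bonus: c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total = sum(child.visits for child in node.children.values())
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        score = child.q() + u
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With no visit statistics yet, selection is driven entirely by the priors, which is what makes a good policy head so valuable early in the search.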
The goal of this library is to train a network and optimize search hyper-parameters that together serve as the evaluation function for an Information-Set MCTS approach to the full game. At this point in development, it is simple to swap this eval into Foul-Play – the strongest 6v6 Singles AI.
It includes the following programs:
generate: Self-play data generation that saves multiple value and policy targets in an efficient serialized format
vs: A tool for comparing two eval/search parameter sets head to head
chall: A CLI for analyzing arbitrary positions
battle: Train value/policy networks
build: Train team-building networks
evo: Search hyper-parameter optimization using evolution
rl: Reinforcement learning using generate/battle/build simultaneously
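For readers unfamiliar with evolutionary hyper-parameter optimization of the kind an `evo`-style tool performs, here is a toy sketch of the core loop: mutate a population of parameter sets, score them, and keep the fittest. The fitness function and the `c_puct` parameter below are placeholders; in practice fitness would come from head-to-head match results.

```python
import random

def mutate(params, scale=0.2):
    """Perturb each (positive) hyper-parameter multiplicatively by up to ±scale."""
    return {k: max(1e-6, v * random.uniform(1 - scale, 1 + scale))
            for k, v in params.items()}

def evolve(fitness, seed_params, generations=20, pop_size=16, elite=4):
    """Simple elitist evolution: keep the top `elite` each generation and
    refill the population with mutated copies of them."""
    population = [mutate(seed_params) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:elite]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - elite)]
    return max(population, key=fitness)

# Example with a synthetic fitness peaking at c_puct = 1.5
random.seed(0)
best = evolve(lambda p: -abs(p["c_puct"] - 1.5), {"c_puct": 1.0})
```

Elitism makes the best fitness monotonically non-decreasing across generations, which is why even this crude scheme reliably climbs toward the optimum; real tools add noise schedules and parallel match evaluation on top.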
I will answer questions in the comments. It’s all very fast and you can train a SOTA eval in a few hours on a laptop. It just needs users xd
submitted by /u/open_cover_dev