**Title:** PowerShell implementations of DQN, PPO and A3C — faithful to the original papers, benchmarkable head to head
Sharing an unusual implementation: three RL algorithms in PowerShell 5.1,
all benchmarkable against each other on the same environments.
**Algorithms:**
- DQN (Mnih 2013/2015): experience replay, target network, epsilon-greedy
- PPO (Schulman 2017): GAE lambda=0.95, clip epsilon=0.2, entropy bonus
- A3C (Mnih 2016): shared actor-critic network, n-step returns, simulated workers
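For anyone who wants to see what a few of those terms boil down to, here is a minimal sketch of epsilon-greedy action selection, GAE, and the PPO clipped objective. These helpers are hypothetical and written just for this post; the function names, the array layout, and the default `Gamma = 0.99` are my assumptions and are not taken from the VBAF code. Only `Lambda = 0.95` and `ClipEpsilon = 0.2` come from the list above.

```powershell
# Hypothetical helpers for illustration only; not the VBAF implementation.

function Select-EpsilonGreedyAction {
    param(
        [double[]]$QValues,                              # Q(s, a) for every action a
        [double]$Epsilon,                                # exploration probability
        [System.Random]$Rng = (New-Object System.Random)
    )
    if ($Rng.NextDouble() -lt $Epsilon) {
        return $Rng.Next($QValues.Length)                # explore: uniform random action
    }
    $best = 0                                            # exploit: index of the largest Q-value
    for ($i = 1; $i -lt $QValues.Length; $i++) {
        if ($QValues[$i] -gt $QValues[$best]) { $best = $i }
    }
    return $best
}

function Get-GaeAdvantage {
    param(
        [double[]]$Rewards,       # r_t for t = 0..T-1
        [double[]]$Values,        # V(s_t) for t = 0..T (last entry is the bootstrap value)
        [double]$Gamma = 0.99,    # assumed discount factor
        [double]$Lambda = 0.95
    )
    $count = $Rewards.Length
    $advantages = New-Object 'double[]' $count
    $running = 0.0
    for ($t = $count - 1; $t -ge 0; $t--) {
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        $delta = $Rewards[$t] + $Gamma * $Values[$t + 1] - $Values[$t]
        # A_t = delta_t + gamma * lambda * A_{t+1}
        $running = $delta + $Gamma * $Lambda * $running
        $advantages[$t] = $running
    }
    return ,$advantages           # leading comma keeps the array from being unrolled
}

function Get-PpoClippedObjective {
    param(
        [double]$Ratio,           # pi_new(a|s) / pi_old(a|s)
        [double]$Advantage,
        [double]$ClipEpsilon = 0.2
    )
    $clipped = [Math]::Max(1.0 - $ClipEpsilon, [Math]::Min(1.0 + $ClipEpsilon, $Ratio))
    # Pessimistic choice between the unclipped and clipped surrogate terms
    return [Math]::Min($Ratio * $Advantage, $clipped * $Advantage)
}
```

With a ratio of 1.5 and an advantage of 2, `Get-PpoClippedObjective` returns about 2.4 instead of 3, because the ratio is capped at 1.2 before it multiplies the advantage.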
**Environments:**
- CartPole (standard), GridWorld (5×5), RandomWalk (1D sanity check)
**Benchmark all three:**
```powershell
$dqn = (Invoke-DQNTraining -Episodes 100 -FastMode -Quiet)[-1]
$ppo = (Invoke-PPOTraining -Episodes 100 -FastMode -Quiet)[-1]
$a3c = (Invoke-A3CTraining -Episodes 100 -FastMode -Quiet)[-1]
$env = New-VBAFEnvironment -Name "CartPole"
Invoke-VBAFBenchmark -Agent $dqn -Environment $env -Episodes 20 -Label "DQN"
Invoke-VBAFBenchmark -Agent $ppo -Environment $env -Episodes 20 -Label "PPO"
Invoke-VBAFBenchmark -Agent $a3c -Environment $env -Episodes 20 -Label "A3C"
Invoke-VBAFBenchmark -Agent $null -Environment $env -Episodes 20 -Label "Random"
```
**PS 5.1 note:** True async threading is not available, so the A3C workers run
sequentially. The update math is the same, but there is no parallelism speedup.
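To make "simulated workers" concrete, here is a rough sketch of the pattern: each worker takes its turn in a plain `foreach`, computes n-step returns for its rollout, and pushes an update into one shared set of parameters. Everything here is a stand-in I made up for illustration (the `Get-NStepReturn` name, the fixed reward rollout, the single-weight "network", the toy gradient); it is not the VBAF A3C code.

```powershell
# Hypothetical sketch of sequential "workers"; not the VBAF implementation.

function Get-NStepReturn {
    param(
        [double[]]$Rewards,              # rewards collected over the n-step rollout
        [double]$BootstrapValue = 0.0,   # V(s_n), or 0 if the episode ended
        [double]$Gamma = 0.99            # assumed discount factor
    )
    $returns = New-Object 'double[]' $Rewards.Length
    $running = $BootstrapValue
    for ($t = $Rewards.Length - 1; $t -ge 0; $t--) {
        # R_t = r_t + gamma * R_{t+1}
        $running = $Rewards[$t] + $Gamma * $running
        $returns[$t] = $running
    }
    return ,$returns
}

# One shared parameter store, updated by every worker in turn.
$shared = @{ Weight = 0.0 }
$learningRate = 0.1

foreach ($workerId in 1..4) {
    $rewards = @(1.0, 1.0, 1.0)          # stand-in for a real environment rollout
    $returns = Get-NStepReturn -Rewards $rewards -BootstrapValue 0.0 -Gamma 0.9
    # Toy "gradient": error between the first n-step return and the shared estimate
    $gradient = $returns[0] - $shared.Weight
    $shared.Weight += $learningRate * $gradient
}
```

Because the loop is sequential, every worker sees the parameters left by the previous one, which is where the single-threaded version differs in timing (but not in the per-update arithmetic) from a truly asynchronous run.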
Dependency injection is used throughout, so no file references another file's types at parse time.
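The reason this matters: PowerShell 5.1 resolves type names inside a `class` body when the file is parsed, so a class that names a type from a file that has not been loaded yet fails before anything runs. A minimal sketch of the workaround, with made-up class names (`ToyEnvironment`, `ToyAgent`) that are not from the VBAF repo, is to accept the collaborator as `[object]` and pass it in through the constructor:

```powershell
# Hypothetical classes for illustration only; not the VBAF types.

# Imagine this lives in its own file (say, an environment script).
class ToyEnvironment {
    [int]$State = 0
    [double] Step([int]$action) {
        $this.State += $action
        return [double]$this.State
    }
}

# Imagine this lives in a different file. It never names ToyEnvironment,
# so it parses fine even if the environment file has not been loaded yet.
class ToyAgent {
    [object]$Environment                 # untyped on purpose

    ToyAgent([object]$environment) {
        $this.Environment = $environment # dependency is injected, not constructed here
    }

    [double] Act([int]$action) {
        # Resolved dynamically at run time, not at parse time
        return $this.Environment.Step($action)
    }
}

$toyEnv = [ToyEnvironment]::new()
$agent  = [ToyAgent]::new($toyEnv)
$first  = $agent.Act(2)
```

The trade-off is losing type checking on the injected object: a typo in a method name only shows up when that line actually executes.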
Performance is slow compared with Python: DQN takes ~2 minutes where PyTorch takes seconds.
For learning what the algorithm is doing step by step, though, the slow version teaches more.
GitHub: https://github.com/JupyterPS/VBAF
Curious if anyone has compared convergence behaviour against reference
Python implementations on CartPole.
submitted by /u/ChanceSwimming3976