**Title:** PowerShell implementations of DQN, PPO and A3C — faithful to the original papers, benchmarkable head to head

Sharing an unusual implementation: three RL algorithms in PowerShell 5.1, all benchmarkable against each other on the same environments.

**Algorithms:**

- DQN (Mnih 2013/2015): experience replay, target network, epsilon-greedy
- PPO (Schulman 2017): GAE lambda=0.95, clip epsilon=0.2, entropy bonus
- A3C (Mnih 2016): shared actor-critic network, n-step returns, simulated workers
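For readers who want to see what the PPO settings above control, here is a minimal sketch of GAE and the clipped surrogate for a single trajectory, in plain PowerShell arrays. It is not VBAF's code: `Get-GaeAdvantage` and `Get-PpoClipObjective` are hypothetical names, the discount gamma=0.99 is an assumption (the post only gives lambda and epsilon), and the entropy bonus is left out.

```powershell
# Hypothetical helpers (not VBAF's code): GAE advantages and the PPO clipped
# surrogate for one trajectory, using plain arrays (runs on PowerShell 5.1).
function Get-GaeAdvantage {
    param(
        [double[]]$Rewards,      # r_t for t = 0..n-1
        [double[]]$Values,       # V(s_t) for t = 0..n (extra entry bootstraps the last step)
        [bool[]]$Dones,          # $true if the episode ended at step t
        [double]$Gamma = 0.99,   # assumed discount; the post only gives lambda
        [double]$Lambda = 0.95
    )
    # PowerShell variable names are case-insensitive, so $n is used instead of $T next to $t
    $n = $Rewards.Length
    $adv = New-Object 'double[]' $n
    $gae = 0.0
    for ($t = $n - 1; $t -ge 0; $t--) {
        $notDone = if ($Dones[$t]) { 0.0 } else { 1.0 }
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        $delta = $Rewards[$t] + $Gamma * $Values[$t + 1] * $notDone - $Values[$t]
        # Backward recursion: A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        $gae = $delta + $Gamma * $Lambda * $notDone * $gae
        $adv[$t] = $gae
    }
    return ,$adv   # leading comma keeps the array from being unrolled
}

function Get-PpoClipObjective {
    param(
        [double[]]$NewLogProbs,   # log pi_theta(a_t|s_t) under the policy being updated
        [double[]]$OldLogProbs,   # log pi_theta_old(a_t|s_t) stored at rollout time
        [double[]]$Advantages,
        [double]$ClipEpsilon = 0.2
    )
    $sum = 0.0
    for ($i = 0; $i -lt $Advantages.Length; $i++) {
        $ratio = [Math]::Exp($NewLogProbs[$i] - $OldLogProbs[$i])
        $clipped = [Math]::Min([Math]::Max($ratio, 1.0 - $ClipEpsilon), 1.0 + $ClipEpsilon)
        # Pessimistic bound: min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)
        $sum += [Math]::Min($ratio * $Advantages[$i], $clipped * $Advantages[$i])
    }
    return $sum / $Advantages.Length   # maximise this; a loss would be its negative
}

# Usage: a 3-step episode that terminates at the last step
$adv = Get-GaeAdvantage -Rewards @(1.0, 1.0, 1.0) -Values @(0.5, 0.5, 0.5, 0.0) -Dones @($false, $false, $true)
# A ratio of 2 is clipped to 1.2 for the positive advantage but not for the negative one
$obj = Get-PpoClipObjective -NewLogProbs @([Math]::Log(2), [Math]::Log(2)) -OldLogProbs @(0.0, 0.0) -Advantages @(1.0, -1.0)
```

The `min` in the surrogate is what makes the clip one-sided in effect: a good action cannot push the ratio past 1+epsilon for extra credit, while a bad action still pays the full unclipped penalty.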

**Environments:**

- CartPole (standard), GridWorld (5×5), RandomWalk (1D sanity check)
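The post does not define RandomWalk beyond "1D sanity check". If it follows the classic five-state random walk from Sutton and Barto's textbook (an assumption), its value as a sanity check is that the right answer is known in closed form: under a 50/50 left/right policy with +1 for exiting on the right, the value of state s is s/6. A hypothetical sketch (`Get-RandomWalkTrueValues` is not part of VBAF) that recovers those values by iterative policy evaluation, so a learned value function can be compared against them:

```powershell
# Hypothetical check (not VBAF's RandomWalk): exact state values of the classic
# random walk under a 50/50 left/right policy, by iterative policy evaluation.
# States 0 and NumStates+1 are terminal; entering the right end pays +1, no discount.
function Get-RandomWalkTrueValues {
    param([int]$NumStates = 5, [double]$Tolerance = 1e-10)
    $last = $NumStates + 1                     # index of the right-hand terminal
    $v = New-Object 'double[]' ($last + 1)     # terminal entries stay 0
    do {
        $maxChange = 0.0
        for ($s = 1; $s -le $NumStates; $s++) {
            $rightReward = if (($s + 1) -eq $last) { 1.0 } else { 0.0 }
            # Bellman expectation: half the time step left (reward 0), half step right
            $updated = 0.5 * $v[$s - 1] + 0.5 * ($rightReward + $v[$s + 1])
            $maxChange = [Math]::Max($maxChange, [Math]::Abs($updated - $v[$s]))
            $v[$s] = $updated                  # in-place (Gauss-Seidel) sweep
        }
    } while ($maxChange -gt $Tolerance)
    return ,$v
}

$trueV = Get-RandomWalkTrueValues
# $trueV[1..5] should be approximately 1/6, 2/6, 3/6, 4/6, 5/6
```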

**Benchmark all three:**

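```powershell
# Train each agent; [-1] takes the last object the training function emits (the trained agent)
$dqn = (Invoke-DQNTraining -Episodes 100 -FastMode -Quiet)[-1]
$ppo = (Invoke-PPOTraining -Episodes 100 -FastMode -Quiet)[-1]
$a3c = (Invoke-A3CTraining -Episodes 100 -FastMode -Quiet)[-1]

# Named $cartPole rather than $env, which is easily confused with the $env: drive
$cartPole = New-VBAFEnvironment -Name "CartPole"

# Evaluate each agent for 20 episodes; -Agent $null with the "Random" label is the baseline
Invoke-VBAFBenchmark -Agent $dqn -Environment $cartPole -Episodes 20 -Label "DQN"
Invoke-VBAFBenchmark -Agent $ppo -Environment $cartPole -Episodes 20 -Label "PPO"
Invoke-VBAFBenchmark -Agent $a3c -Environment $cartPole -Episodes 20 -Label "A3C"
Invoke-VBAFBenchmark -Agent $null -Environment $cartPole -Episodes 20 -Label "Random"
```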

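**PS 5.1 note:** True async threading is not available, so A3C workers run sequentially. The per-worker math is the same as in the paper, but there is no parallelism speedup; running workers one after another also removes the asynchronous, stale-parameter updates of the original, so the training dynamics are not strictly identical.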
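In A3C each worker rolls out a short segment, computes n-step returns, and uses the advantage R_t - V(s_t) for the actor and R_t as the critic target. With sequential workers, each segment is bootstrapped from the shared critic as it stands after the previous worker's update. A minimal sketch of the return computation, assuming a default gamma of 0.99; `Get-NStepReturns` is a hypothetical name, not VBAF's API.

```powershell
# Hypothetical helper (not VBAF's code): n-step returns for one worker's rollout
# segment, R_t = r_t + gamma * R_{t+1}, seeded with V(s_{t+n}) unless terminal.
function Get-NStepReturns {
    param(
        [double[]]$Rewards,        # rewards collected by one worker in this segment
        [double]$BootstrapValue,   # shared critic's V(s_{t+n}) for the state after the segment
        [switch]$Terminal,         # segment ended the episode: no bootstrap
        [double]$Gamma = 0.99      # assumed discount
    )
    $returns = New-Object 'double[]' ($Rewards.Length)
    $running = if ($Terminal) { 0.0 } else { $BootstrapValue }
    for ($i = $Rewards.Length - 1; $i -ge 0; $i--) {
        $running = $Rewards[$i] + $Gamma * $running
        $returns[$i] = $running
    }
    return ,$returns   # leading comma keeps the array intact
}

# Same rewards, with and without bootstrapping from the critic
$bootstrapped = Get-NStepReturns -Rewards @(1.0, 1.0, 1.0) -BootstrapValue 4.0 -Gamma 0.5
$terminal     = Get-NStepReturns -Rewards @(1.0, 1.0, 1.0) -BootstrapValue 4.0 -Gamma 0.5 -Terminal
```

With gamma=0.5 this gives 2.25, 2.5, 3 when bootstrapping from V=4, and 1.75, 1.5, 1 when the segment ends the episode.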

Dependency injection is used throughout, so no file references another file's types at parse time.
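A likely motivation (an assumption, not stated in the post): PowerShell class definitions resolve type names when the file is parsed, so a class that names a type from a file that has not been loaded yet fails to parse. Passing collaborators in as untyped objects sidesteps that. A hypothetical sketch; `Invoke-Episode` and the toy environment are illustrative only, not VBAF's API.

```powershell
# Hypothetical sketch (not VBAF's API): the episode loop never names a concrete
# environment or agent type; both arrive as parameters and are used by duck typing.
function Invoke-Episode {
    param(
        [Parameter(Mandatory)] $Environment,           # any object with Reset() and Step($action)
        [Parameter(Mandatory)] [scriptblock]$Policy,   # maps a state to an action
        [int]$MaxSteps = 100
    )
    $state = $Environment.Reset()
    $total = 0.0
    for ($i = 0; $i -lt $MaxSteps; $i++) {
        $action = & $Policy $state
        $result = $Environment.Step($action)   # expects State, Reward and Done properties
        $total += $result.Reward
        $state = $result.State
        if ($result.Done) { break }
    }
    return $total
}

# A toy environment assembled at runtime, so no type definition is needed:
# the agent starts at 0, moves by the chosen action, and the episode ends at 3.
$toy = [pscustomobject]@{ Position = 0 }
$toy | Add-Member -MemberType ScriptMethod -Name Reset -Value { $this.Position = 0; $this.Position }
$toy | Add-Member -MemberType ScriptMethod -Name Step -Value {
    param($action)
    $this.Position += $action
    [pscustomobject]@{ State = $this.Position; Reward = 1.0; Done = ($this.Position -ge 3) }
}

$score = Invoke-Episode -Environment $toy -Policy { param($s) 1 }   # always step right
```

Because `Invoke-Episode` only calls `Reset()` and `Step()`, the same loop would accept any environment or agent object built elsewhere, with no parse-time dependency on where it was defined.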

Performance is slow compared with Python: DQN takes about 2 minutes where PyTorch takes seconds.

For learning what each algorithm is doing step by step, though, the slow version teaches more.

GitHub: https://github.com/JupyterPS/VBAF

Curious if anyone has compared convergence behaviour against reference Python implementations on CartPole.

submitted by /u/ChanceSwimming3976