Validating “Streaming Deep RL Finally Works” on 433k Observations of Real Attack Traffic
I’m learning the foundations of RL following the Alberta Plan for AI Research, and I’ve been working through sets of experiments to build that understanding. To that end, I spent the last month validating different methods for streaming deep RL on a non-stationary, adversarial dataset of real SSH honeypot observations. This work focuses on prediction, in line with steps 1 and 2 of the Alberta Plan (Sutton, Bowling, & Pilarski 2022). After implementing […]