From A/B to RL: A gentle bridge from A/B testing to reinforcement learning
I created a 3-part series called From A/B to RL. The goal is to start from A/B testing ideas and gradually introduce actions, rewards, policies, online learning, states, episodes, and delayed feedback, with a Bayesian decision-making thread running through it:
- Part 1 starts with Bayesian A/B testing: From A/B to RL (1/3): Bayesian A/B Testing
- Part 2 moves from fixed experiments to online learning with multi-armed bandits, probability matching, and Thompson sampling: From A/B to RL (2/3): Multi-Armed Bandits
- Part 3 adds state-dependent policies and delayed rewards using MENACE/tic-tac-toe: From A/B to RL (3/3): Continuous Learning to Delayed Rewards
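The bridge between Parts 1 and 2 can be sketched in a few lines. This is not code from the posts. It is a minimal illustration that assumes Bernoulli (convert / don't convert) rewards and uniform Beta(1, 1) priors, and the function names are hypothetical. The same Beta posterior serves two purposes. In an A/B test it answers a one-shot question ("how likely is B better than A?"). In Thompson sampling it becomes an online policy: draw one sample per arm and play the arm with the highest sample.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Bayesian A/B test (hypothetical helper): Monte Carlo estimate of P(rate_B > rate_A).

    Assumes a Beta(1, 1) prior, so each posterior is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        pa = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        pb = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += pb > pa
    return wins / draws

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Bernoulli Thompson sampling (hypothetical helper) on simulated arms with known true rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its current Beta posterior
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        # Probability matching: each arm is played with the posterior probability that it is best
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a simulated 0/1 reward and update that arm's posterior counts
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [successes[i] + failures[i] for i in range(k)]
    return pulls, successes

print(prob_b_beats_a(10, 100, 30, 100))
print(thompson_sampling([0.05, 0.5], 2000))
```

The A/B version collects all the data first and decides afterwards. The bandit version updates after every single reward, so traffic drifts toward the better arm while the experiment is still running.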
The posts came out of some old Jupyter notebook drafts from when I was teaching myself reinforcement learning. I finally cleaned them up into a more coherent series.
Feedback is welcome.
submitted by /u/Xochipilli