A/B testing to reinforcement learning

digitado ⋅ 24 June 2026

I created a 3-part series called From A/B to RL. The goal is to start from A/B testing ideas and gradually introduce actions, rewards, policies, online learning, states, episodes, and delayed feedback, with a Bayesian decision-making thread running through it:

Part 1 starts with Bayesian A/B testing: From A/B to RL (1/3): Bayesian A/B Testing
Part 2 moves from fixed experiments to online learning: multi-armed bandits, probability matching, and Thompson sampling: From A/B to RL (2/3): Multi-Armed Bandits
Part 3 adds state-dependent policies and delayed rewards using MENACE/tic-tac-toe: From A/B to RL (3/3): Continuous Learning to Delayed Rewards
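
For readers who have not seen Thompson sampling before, here is a minimal sketch of the idea for Bernoulli rewards. It is not taken from the posts themselves: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated `true_rates` are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_rates: hypothetical success probability of each arm (unknown to the agent).
    Returns per-arm success and failure counts after n_rounds pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a simulated 0/1 reward and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


successes, failures = thompson_sampling([0.05, 0.5], n_rounds=2000)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Unlike a fixed A/B split, the share of traffic each arm receives changes as evidence accumulates, so the weaker arm should be pulled less and less often.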

The posts came out of some old Jupyter notebook drafts from when I was teaching myself reinforcement learning. I finally cleaned them up into a more coherent series.

Feedback is welcome.

submitted by /u/Xochipilli