A tutorial on getting one of the most misunderstood strategies right: Exploration vs Exploitation
In this tutorial:
- You will understand that Exploration vs Exploitation is not a button and not just "epsilon", but a real data-collection strategy that decides what the agent can learn and how good it can become.
- You will see why the training reward can lie to you: an agent without exploration can look "better" on the graph yet be weaker in reality.
- You will learn where exploration actually occurs in a Markov Decision Process (MDP): not only in actions, but also in states and in the agent's policy, and why this matters enormously.
- You will understand what it means to exploit a wrong policy, how lock-in occurs, why exploiting too early can destroy learning, and what this looks like in practice.
- You will learn the different types of exploration in modern RL: epsilon-greedy, entropy, optimism, uncertainty, and curiosity, along with what each solves and where it falls short.
- You will learn to interpret data correctly: when reward means something and when it doesn't, and what entropy, action diversity, the state distribution, and seed sensitivity actually tell you.
- You will see everything in practice in a FrozenLake + DQN case study with three exploration regimes: no exploration, large exploration, and controlled exploration, and you will understand what is really happening and why (see the sketch after this list).
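As a rough illustration of the three regimes above, here is a minimal epsilon-greedy sketch. The function names, epsilon values, and decay schedule are my own illustrative assumptions, not the tutorial's exact settings:

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

# The three exploration regimes from the case study (values are illustrative):
def no_exploration(step):
    # pure exploitation from step 0: the agent only ever sees greedy trajectories
    return 0.0

def large_exploration(step):
    # heavy, never-decaying randomness: lots of coverage, noisy returns
    return 0.5

def controlled_exploration(step, start=1.0, end=0.05, decay_steps=10_000):
    # linear decay: explore broadly early, exploit the learned Q-values late
    return start + min(step / decay_steps, 1.0) * (end - start)

# Usage inside a DQN training loop (sketch, assuming q_network returns Q-values):
# action = epsilon_greedy(q_network(state), controlled_exploration(global_step))
```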
submitted by /u/Capable-Carpenter443