Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems
I’ve been working on a replay-buffer replacement inspired by how the hippocampus consolidates memories during sleep.

The problem: In sparse-reward tasks with long horizons (e.g., T-maze variants), the critical observation arrives at t=0 but the decision happens 30+ steps later. Uniform replay treats all transitions equally, so the rare successes get drowned out.

The approach: Hippotorch uses a dual encoder to embed experiences, stores them in an episodic memory with semantic indices, and periodically runs a “sleep” phase that consolidates memories using reward-weighted contrastive learning (InfoNCE). At sampling time, it mixes semantic retrieval with a uniform fallback. Simplified sketches of both pieces follow below.

Results: On a 30-step corridor benchmark (7 seeds, 300 episodes), hybrid sampling beats uniform replay by ~20% on average. Variance is still high (some seeds underperform); this is a known limitation we’re working on.
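Here’s a simplified sketch of the reward-weighted InfoNCE objective used in the sleep phase. Treat it as illustrative PyTorch, not the exact repo code: the function name, the temperature default, and the return-based weighting scheme are placeholders.

```python
import torch
import torch.nn.functional as F

def sleep_consolidation_loss(anchor_z, positive_z, returns, tau=0.1):
    """Reward-weighted InfoNCE over a batch of memory embeddings.

    anchor_z, positive_z: (B, D) embeddings of two views of the same
        episode segment (e.g., the two heads of the dual encoder).
    returns: (B,) episode returns used to up-weight rewarding memories.
    """
    anchor_z = F.normalize(anchor_z, dim=-1)
    positive_z = F.normalize(positive_z, dim=-1)

    # (B, B) similarity matrix; the diagonal holds the positive pairs.
    logits = anchor_z @ positive_z.t() / tau
    targets = torch.arange(anchor_z.size(0), device=anchor_z.device)

    # Per-sample InfoNCE, weighted by a shifted/normalized return so the
    # rare successful episodes dominate the consolidation gradient.
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = returns - returns.min() + 1e-6
    weights = weights / weights.sum()
    return (weights * per_sample).sum()
```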
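And here’s the shape of the hybrid sampler: nearest-neighbor retrieval over the stored embeddings, mixed with a uniform fallback. Again illustrative; the class and argument names (`EpisodicMemory`, `mix_ratio`) are stand-ins for whatever the repo actually calls them.

```python
import torch
import torch.nn.functional as F

class EpisodicMemory:
    def __init__(self, mix_ratio=0.5):
        self.keys = []              # one embedding per stored transition
        self.transitions = []       # the transitions themselves
        self.mix_ratio = mix_ratio  # fraction of each batch drawn semantically

    def add(self, key, transition):
        self.keys.append(key.detach())
        self.transitions.append(transition)

    def sample(self, query, batch_size):
        n = len(self.transitions)
        n_semantic = int(self.mix_ratio * batch_size)

        # Semantic part: cosine-nearest neighbors of the query embedding.
        keys = torch.stack(self.keys)
        sims = F.cosine_similarity(keys, query.unsqueeze(0))
        semantic_idx = sims.topk(min(n_semantic, n)).indices.tolist()

        # Uniform fallback: plain random indices for the rest of the batch.
        uniform_idx = torch.randint(n, (batch_size - len(semantic_idx),)).tolist()
        return [self.transitions[i] for i in semantic_idx + uniform_idx]
```

The uniform fallback is what keeps sampling from collapsing onto a handful of high-similarity memories; with `mix_ratio=0` you recover plain uniform replay.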
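Putting it together, the two main knobs show up as `consolidate_every` and `mix_ratio` in this rough training-loop sketch (the rollout is stubbed with random data just so the snippet runs end to end):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(8, 16)              # stand-in for one dual-encoder head
memory = EpisodicMemory(mix_ratio=0.5)  # semantic/uniform mixture ratio
consolidate_every = 50                  # "sleep" consolidation frequency

for episode in range(300):
    # Stand-in rollout: random observations, sparse terminal reward.
    obs = torch.randn(30, 8)
    episode_return = float(torch.rand(()) < 0.1)
    for o in obs:
        memory.add(encoder(o), {"obs": o, "ret": episode_return})

    # "Wake": sample a hybrid batch keyed on the latest observation.
    batch = memory.sample(encoder(obs[-1]), batch_size=64)
    # ... policy update on `batch` would go here ...

    # "Sleep": periodically consolidate with the reward-weighted loss.
    if episode % consolidate_every == 0:
        z = torch.stack([encoder(b["obs"]) for b in batch])
        rets = torch.tensor([b["ret"] for b in batch])
        loss = sleep_consolidation_loss(z, z + 0.01 * torch.randn_like(z), rets)
        loss.backward()  # then step an optimizer over the encoder params
```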
The components are PyTorch modules you can integrate into your own policies. The main knobs are the consolidation frequency and the semantic/uniform mixture ratio (both visible in the loop sketch above). Would love feedback, especially from anyone working on long-horizon credit assignment. Curious whether anyone has tried similar approaches or sees obvious failure modes I’m missing.

submitted by /u/Temporary-Oven6788