Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems
I’ve been working on a replay-buffer replacement inspired by how the hippocampus consolidates memories during sleep.

The problem: In sparse-reward tasks with long horizons (e.g., T-maze variants), the critical observation arrives at t=0 but the decision happens 30+ steps later. Uniform replay treats all transitions equally, so the rare successes get drowned out.

The approach: Hippotorch uses a dual encoder to embed experiences, stores them in an episodic memory with semantic indices, and periodically runs a “sleep” phase that consolidates memories using reward-weighted contrastive learning (InfoNCE). At sampling time, it mixes semantic retrieval with a uniform fallback. Simplified sketches of both pieces follow below.

Results: On a 30-step corridor benchmark (7 seeds, 300 episodes), hybrid sampling beats uniform replay by ~20% on average. Variance is still high (some seeds underperform); this is a known limitation we’re working on.
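Here’s a simplified sketch of the reward-weighted InfoNCE objective used in the sleep phase. Treat it as illustrative PyTorch, not the exact repo code: the function name, the temperature default, and the return-based weighting scheme are placeholders.

```python
import torch
import torch.nn.functional as F

def sleep_consolidation_loss(anchor_z, positive_z, returns, tau=0.1):
    """Reward-weighted InfoNCE over a batch of memory embeddings.

    anchor_z, positive_z: (B, D) embeddings of two views of the same
        episode segment (e.g., the two heads of the dual encoder).
    returns: (B,) episode returns used to up-weight rewarding memories.
    """
    anchor_z = F.normalize(anchor_z, dim=-1)
    positive_z = F.normalize(positive_z, dim=-1)

    # (B, B) similarity matrix; the diagonal holds the positive pairs.
    logits = anchor_z @ positive_z.t() / tau
    targets = torch.arange(anchor_z.size(0), device=anchor_z.device)

    # Per-sample InfoNCE, weighted by a shifted/normalized return so the
    # rare successful episodes dominate the consolidation gradient.
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = returns - returns.min() + 1e-6
    weights = weights / weights.sum()
    return (weights * per_sample).sum()
```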
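And here’s the shape of the hybrid sampler: nearest-neighbor retrieval over the stored embeddings, mixed with a uniform fallback. Again illustrative; the class and argument names (`EpisodicMemory`, `mix_ratio`) are stand-ins for whatever the repo actually calls them.

```python
import torch
import torch.nn.functional as F

class EpisodicMemory:
    def __init__(self, mix_ratio=0.5):
        self.keys = []              # one embedding per stored transition
        self.transitions = []       # the transitions themselves
        self.mix_ratio = mix_ratio  # fraction of each batch drawn semantically

    def add(self, key, transition):
        self.keys.append(key.detach())
        self.transitions.append(transition)

    def sample(self, query, batch_size):
        n = len(self.transitions)
        n_semantic = int(self.mix_ratio * batch_size)

        # Semantic part: cosine-nearest neighbors of the query embedding.
        keys = torch.stack(self.keys)
        sims = F.cosine_similarity(keys, query.unsqueeze(0))
        semantic_idx = sims.topk(min(n_semantic, n)).indices.tolist()

        # Uniform fallback: plain random indices for the rest of the batch.
        uniform_idx = torch.randint(n, (batch_size - len(semantic_idx),)).tolist()
        return [self.transitions[i] for i in semantic_idx + uniform_idx]
```

The uniform fallback is what keeps sampling from collapsing onto a handful of high-similarity memories; with `mix_ratio=0` you recover plain uniform replay.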
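Putting it together, the two main knobs show up as `consolidate_every` and `mix_ratio` in this rough training-loop sketch (the rollout is stubbed with random data just so the snippet runs end to end):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(8, 16)              # stand-in for one dual-encoder head
memory = EpisodicMemory(mix_ratio=0.5)  # semantic/uniform mixture ratio
consolidate_every = 50                  # "sleep" consolidation frequency

for episode in range(300):
    # Stand-in rollout: random observations, sparse terminal reward.
    obs = torch.randn(30, 8)
    episode_return = float(torch.rand(()) < 0.1)
    for o in obs:
        memory.add(encoder(o), {"obs": o, "ret": episode_return})

    # "Wake": sample a hybrid batch keyed on the latest observation.
    batch = memory.sample(encoder(obs[-1]), batch_size=64)
    # ... policy update on `batch` would go here ...

    # "Sleep": periodically consolidate with the reward-weighted loss.
    if episode % consolidate_every == 0:
        z = torch.stack([encoder(b["obs"]) for b in batch])
        rets = torch.tensor([b["ret"] for b in batch])
        loss = sleep_consolidation_loss(z, z + 0.01 * torch.randn_like(z), rets)
        loss.backward()  # then step an optimizer over the encoder params
```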
The components are PyTorch modules you can integrate into your own policies. The main knobs are the consolidation frequency and the semantic/uniform mixture ratio (both visible in the loop sketch above). Would love feedback, especially from anyone working on long-horizon credit assignment. Curious whether anyone has tried similar approaches or sees obvious failure modes I’m missing.

submitted by /u/Temporary-Oven6788