Our SNN-controlled robot dog learned to find and touch a ball using Free Energy, not reward shaping — 4.3cm minimum distance, 47 contact frames

I’m building MH-FLOCKE, an open embodied AI framework that replaces standard RL with biologically grounded learning. The goal: a reusable platform where spiking neural networks, cerebellar models, and predictive coding work together — not as isolated papers but as one integrated system.

The robot is a Unitree Go2 in MuJoCo, controlled by a spiking neural network (4,624 Izhikevich neurons, 93k synapses) with a Marr-Albus-Ito cerebellar forward model.
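
To make the neuron model concrete: a minimal sketch of a single Izhikevich neuron update, using the standard regular-spiking parameters from Izhikevich's 2003 model (not necessarily the parameters or integration scheme MH-FLOCKE actually uses):

```python
def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.25):
    """One Euler step of the Izhikevich model (regular-spiking defaults).

    v: membrane potential (mV), u: recovery variable, I: input current.
    Returns updated (v, u, spiked).
    """
    v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u = u + dt * a * (b * v - u)
    if v >= 30.0:          # spike threshold: reset v, bump recovery
        return c, u + d, True
    return v, u, False

# Drive one neuron with constant current for 1 s of simulated time.
v, u = -65.0, -13.0        # resting state, u = b * v
spike_count = 0
for _ in range(4000):      # 1000 ms at dt = 0.25 ms
    v, u, spiked = izhikevich_step(v, u, I=10.0)
    spike_count += spiked
print(spike_count)         # tonic spiking at constant suprathreshold input
```

Scaling this to 4,624 neurons and 93k synapses is just vectorizing the same two update equations and adding a weighted spike-propagation step.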

The learning signal is not a shaped reward. Instead:

  1. Task-specific Prediction Error (Free Energy): ball close = negative PE (calm), ball far = positive PE (chaos). The global world-model PE hovered around 0.004, effectively noise; the task-specific PE spans ±1.74.

  2. Vision stimulation: while the task is failing, the 16 vision input neurons receive extra input current, so the SNN can’t ignore the ball.

  3. Curriculum: Ball starts directly ahead. No steering needed first.

  4. Brain persistence: episodic memory and a knowledge graph are saved across runs, so the dog doesn’t start from scratch.
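
Points 1 and 2 could be sketched roughly like this. Only the ±1.74 bound and the 16 vision neurons come from the post; the 1 m expected distance, the current values, and all function names are my own assumptions:

```python
def task_prediction_error(ball_distance, expected_distance=1.0, bound=1.74):
    """Point 1: bounded task-specific PE from ball distance.

    Closer than expected -> negative PE ("calm"), farther -> positive
    PE ("chaos"). The expected_distance baseline is hypothetical; the
    ±1.74 bound matches the range quoted in the post.
    """
    pe = ball_distance - expected_distance
    return max(-bound, min(bound, pe))

def vision_input_current(pe, base_current=5.0, boost=4.0, n_vision=16):
    """Point 2: input currents for the 16 vision neurons.

    Extra drive is injected only while PE is positive (i.e. the task
    is failing), so the ball cannot be ignored. Current magnitudes
    are placeholders.
    """
    extra = boost * max(0.0, pe)
    return [base_current + extra] * n_vision

print(task_prediction_error(0.043))   # near-contact distance: negative PE
print(vision_input_current(1.5)[0])   # failing: boosted current per neuron
```

The key design choice is that the error signal is a function of task state, not a hand-tuned reward schedule; the gradient-free SNN plasticity then only needs a scalar that reliably separates success from failure.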

Results: 5 episodes, all with physical contact. Min distance 4.3cm. Ball displaced 83cm.

What doesn’t work: lateral steering, speed control, persistent pathways after 50k steps.

The framework currently has 65 cognitive modules (SNN, cerebellum, CPG, drives, episodic memory, dream consolidation, synaptogenesis, neuromodulation, etc.). I’m working toward making this available as an open platform for anyone who wants to build on biologically grounded robotics instead of pure RL.

Video: https://www.youtube.com/watch?v=7Dn9bKZ8zSc
Paper: https://aixiv.science/abs/aixiv.260301.000002

Has anyone else tried task-specific PE instead of reward shaping for navigation?

submitted by /u/mhflocke