7x Longer Context Reinforcement Learning in Unsloth
digitado ⋅ 15 January 2026
Submitted by /u/RecmacfonD