arXiv Endorsement Request – cs.LG/cs.AI – Identified two optimization pathologies in Multi-Timescale PPO

Hey guys,

I am an undergrad researcher finalizing a preprint on multi-timescale temporal credit assignment, and I am looking for an arXiv endorsement for cs.LG (or cs.AI).

Title: Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

TL;DR:

We investigated why dynamically routing multi-timescale advantages inside an Actor-Critic architecture often leads to policy collapse. We formally diagnosed two pathologies:

1. Surrogate Objective Hacking: Differentiable routing allows the PPO policy gradient to hijack attention weights, artificially minimizing the clipped surrogate loss while ignoring physical control.

2. Paradox of Temporal Uncertainty: Gradient-free routing via inverse-variance forces irreversible myopic degeneration, as Softmax disproportionately locks onto short-term horizons due to their naturally lower aleatoric uncertainty.
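A toy illustration of the second pathology, not taken from the linked scripts: with i.i.d. unit-variance rewards, the variance of the discounted return grows with the discount factor, so a softmax over inverse variances favors the shortest horizon. The discount factors, the temperature, and the synthetic reward model are all assumptions chosen for illustration, and the real effect on LunarLander-v2 may look different.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence, accumulated from the end."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
gammas = [0.5, 0.9, 0.99]   # hypothetical short, medium, long horizons
temperature = 0.1           # hypothetical softmax temperature

# Synthetic episodes: pure noise rewards, so all variance is aleatoric.
episodes = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(500)]

# Empirical return variance per horizon; it grows with gamma.
return_var = [
    variance([discounted_return(ep, g) for ep in episodes]) for g in gammas
]

# Inverse-variance routing: lower variance gets a higher score.
routing_weights = softmax([1.0 / (v * temperature) for v in return_var])

for g, v, w in zip(gammas, return_var, routing_weights):
    print(f"gamma={g:<5} return_var={v:8.3f} routing_weight={w:.4f}")
```

Even though the rewards carry no signal at all here, almost all of the routing weight lands on the smallest discount factor, which is the myopic bias described in point 2.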

Solution: We propose “Target Decoupling”, isolating the Actor to the purest long-term advantage while maintaining multi-timescale predictions purely for the Critic’s auxiliary representation.
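One possible reading of Target Decoupling as a loss computation, sketched in plain Python rather than copied from the preprint or the linked scripts. It assumes that "purest long-term advantage" means the advantage computed with the largest discount factor, and that the Critic has one value head per discount factor; the function names and the toy numbers are hypothetical.

```python
def ppo_clipped_loss(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize."""
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


def multi_horizon_value_loss(value_preds, return_targets):
    """Mean squared error, summed over one critic head per discount factor."""
    loss = 0.0
    for gamma, preds in value_preds.items():
        targets = return_targets[gamma]
        loss += sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return loss


def decoupled_losses(ratios, advantages_by_gamma, value_preds, return_targets):
    """Actor uses only the longest horizon; critic trains on every horizon."""
    long_gamma = max(advantages_by_gamma)  # assumption: largest gamma = long-term
    actor_loss = ppo_clipped_loss(ratios, advantages_by_gamma[long_gamma])
    critic_loss = multi_horizon_value_loss(value_preds, return_targets)
    return actor_loss, critic_loss


# Toy batch of three transitions with two horizons (made-up numbers).
ratios = [1.1, 0.7, 1.5]
advantages_by_gamma = {0.9: [5.0, -3.0, 2.0], 0.99: [1.0, -1.0, 0.5]}
value_preds = {0.9: [1.0, 2.0, 3.0], 0.99: [1.0, 2.0, 3.0]}
return_targets = {0.9: [1.0, 2.0, 4.0], 0.99: [2.0, 2.0, 3.0]}

actor_loss, critic_loss = decoupled_losses(
    ratios, advantages_by_gamma, value_preds, return_targets
)
print(actor_loss, critic_loss)
```

In this sketch the short-horizon advantages never reach the Actor's objective, so there is no routing weight for the policy gradient to exploit; the extra horizons only shape the Critic's loss. In an autodiff framework the same split would also depend on how gradients flow through any shared layers, which this sketch does not model.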

Code: I have prepared a strict Minimal Reproducible Example (MRE): 4 clean, standalone Python scripts (standard MLPs only) that reproduce the crashes and the final solution on LunarLander-v2.

Please check this link:

https://zenodo.org/records/19497907

(The GitHub repo is still being prepared.)

If your expertise aligns and you find this diagnosis interesting, I would be incredibly grateful for an endorsement. Please leave a comment or DM me if you can help. Thank you!

submitted by /u/dlwlrma_22