External normalization makes a big difference for Autostep on real-world data
I’m a D.Eng. student working through Step 1 of the Alberta Plan, implementing IDBD and Autostep in JAX. I’ve run into what I think is an interesting finding while testing Autostep on SSH honeypot data.
My tests: I’ve been running the algorithms against observations from a Cowrie SSH honeypot. The features I extract from the log data span about 8 orders of magnitude (everything from binary flags to byte counts in the millions).
What I found: Autostep’s internal normalization handles a lot, but it wasn’t enough for the scale shocks in my data. During a coordinated botnet surge, the variance shifts caused instability. Adding an external OnlineNormalizer (just running mean/variance standardization in front of the learner) dropped MAE from 11.01 to 0.73.
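For anyone curious what I mean by "external" normalization, it's roughly the sketch below: a Welford-style running mean/variance, updated per observation, with the standardized features fed to Autostep. The names (`NormalizerState`, `normalizer_update`) are illustrative here, not the exact API in my repo.

```python
# Minimal sketch of an online mean/variance normalizer (Welford-style update).
import jax.numpy as jnp
from typing import NamedTuple

class NormalizerState(NamedTuple):
    count: jnp.ndarray  # observations seen so far
    mean: jnp.ndarray   # running mean per feature
    m2: jnp.ndarray     # running sum of squared deviations per feature

def normalizer_init(num_features: int) -> NormalizerState:
    return NormalizerState(
        count=jnp.zeros(()),
        mean=jnp.zeros(num_features),
        m2=jnp.zeros(num_features),
    )

def normalizer_update(state: NormalizerState, x: jnp.ndarray):
    """Update running statistics with one observation and return the
    standardized feature vector alongside the new state."""
    count = state.count + 1.0
    delta = x - state.mean
    mean = state.mean + delta / count
    m2 = state.m2 + delta * (x - mean)
    # Guard against zero variance before the stats have warmed up.
    var = jnp.where(count > 1.0, m2 / count, jnp.ones_like(m2))
    x_norm = (x - mean) / jnp.sqrt(var + 1e-8)
    return NormalizerState(count, mean, m2), x_norm
```

The normalized vector is what Autostep sees on each step, so the learner itself is unchanged; only its inputs are rescaled online.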
IDBD fared worse (as expected): it diverged within the first few hundred observations, even with normalization. Autostep stayed stable through all ~300k observations either way, but the normalized version performed 15x better.
Why I’m posting: The Alberta Plan actually mentions that online normalization for these meta-learning algorithms hasn’t been formally tested and published yet. I’m not claiming this is groundbreaking; it’s probably expected, but I figured empirical results on real-world data might be useful to others working on similar problems.
Full writeup with learning curves and experimental details: https://blog.9600baud.net/autostep-normalization.html
The code implementing the algorithms and online normalization is in my [alberta-framework](https://github.com/j-klawson/alberta-framework).
Curious whether this has already been studied with adaptive step-size methods on production, non-stationary data, or if there are better normalization approaches I should look at.