External normalization makes a big difference for Autostep on real-world data

I’m a D.Eng. student working through Step 1 of the Alberta Plan, implementing IDBD and Autostep in JAX. I think I’ve come across an interesting finding while testing Autostep on SSH honeypot data.

My tests: I’ve been running the algorithms against observations from a Cowrie SSH honeypot. The features I extract from the log data span about 8 orders of magnitude (everything from binary flags to byte counts in the millions).

What I found: Autostep’s internal normalization handles a lot, but it wasn’t enough for the scale shocks in my data. During a coordinated botnet surge, the variance shifts caused instability. Adding an external OnlineNormalizer (just running mean/variance standardization) dropped MAE from 11.01 to 0.73.
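To make "running mean/variance standardization" concrete, here's a minimal sketch of that idea as a pure-functional JAX state update (Welford-style). The names (`NormState`, `normalize`) and details are placeholders of mine, not the actual OnlineNormalizer in the repo:

```python
import jax.numpy as jnp
from typing import NamedTuple

class NormState(NamedTuple):
    count: jnp.ndarray   # number of observations seen so far
    mean: jnp.ndarray    # running per-feature mean
    m2: jnp.ndarray      # running sum of squared deviations (Welford)

def init_norm(num_features: int) -> NormState:
    return NormState(
        count=jnp.zeros(()),
        mean=jnp.zeros(num_features),
        m2=jnp.zeros(num_features),
    )

def normalize(state: NormState, x: jnp.ndarray, eps: float = 1e-8):
    """Update the running stats with x, then return the standardized x."""
    count = state.count + 1.0
    delta = x - state.mean
    mean = state.mean + delta / count
    m2 = state.m2 + delta * (x - mean)          # Welford variance update
    var = m2 / jnp.maximum(count, 1.0)
    x_norm = (x - mean) / jnp.sqrt(var + eps)   # standardize with current stats
    return NormState(count, mean, m2), x_norm
```

The learner only ever sees `x_norm`, so byte counts in the millions and binary flags end up on roughly the same scale before Autostep's own step-size adaptation kicks in.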

IDBD fared worse (as expected): it diverged within the first few hundred observations, even with normalization. Autostep stayed stable through all ~300k observations either way, but the normalized version performed about 15x better.
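For anyone wondering why raw feature scale hurts IDBD so badly: the standard IDBD update (Sutton, 1992) multiplies the error by the raw feature value both when it moves the weights and when it moves the log step sizes, so features in the millions can blow up exp(beta) almost immediately. A rough sketch of that update in the same functional JAX style (my own illustration, not the framework's code):

```python
import jax.numpy as jnp
from typing import NamedTuple

class IDBDState(NamedTuple):
    w: jnp.ndarray     # weights
    beta: jnp.ndarray  # per-feature log step sizes
    h: jnp.ndarray     # memory trace

def idbd_update(state: IDBDState, x: jnp.ndarray, y: float, theta: float = 0.01):
    """One IDBD step for linear prediction (Sutton, 1992)."""
    delta = y - jnp.dot(state.w, x)                  # prediction error
    beta = state.beta + theta * delta * x * state.h  # meta-learning of log step sizes
    alpha = jnp.exp(beta)                            # per-feature step sizes
    w = state.w + alpha * delta * x                  # weight update; huge x => huge step
    h = state.h * jnp.maximum(1.0 - alpha * x * x, 0.0) + alpha * delta * x
    return IDBDState(w, beta, h), delta
```

Autostep adds its own normalization of the meta-gradient and a bound on the effective step size, which is why it survived unnormalized inputs here where IDBD did not.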

Why I’m posting: The Alberta Plan actually mentions that online normalization for these meta-learning algorithms hasn’t been formally tested and published yet. I’m not claiming this is groundbreaking; it’s probably expected, but I figured empirical results on real-world data might be useful to others working on similar problems.

Full writeup with learning curves and experimental details: https://blog.9600baud.net/autostep-normalization.html

The code implementing the algorithms and online normalization is in my [alberta-framework](https://github.com/j-klawson/alberta-framework).

Curious whether similar work has been done with adaptive step-size methods on production, non-stationary data, or if there are better normalization approaches I should look at.

submitted by /u/debian_grey_beard