[P] PerpetualBooster: A new gradient boosting library that enables O(n) continual learning and outperforms AutoGluon on tabular benchmarks.
Hi everyone,
I’m part of the team that developed PerpetualBooster, a gradient boosting algorithm designed to solve the “forgetting” and “retraining” bottlenecks in traditional GBDT frameworks like XGBoost or LightGBM.
We’ve just launched a serverless cloud platform to operationalize it, but I wanted to share the underlying tech and how we’re handling the ML lifecycle for tabular data.
The main bottleneck with most GBDT implementations is keeping models current: every time new data arrives, you retrain from scratch, so the total cost of repeated retrains grows roughly O(n^2) in the number of samples seen. PerpetualBooster supports continual learning at O(n) total cost: the model is updated incrementally on each new batch, so every sample is processed once and expensive full recomputes are avoided.
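To make the complexity claim concrete, here is a back-of-the-envelope sketch (under the simplifying assumption that a single fit costs time roughly linear in the rows it sees; the numbers are illustrative, not benchmarks):

```python
# Cost of keeping a model current over k batches of size b each
# (n = k * b samples total), counting "row-passes" per strategy.
k, b = 100, 10_000
n = k * b

# Refit on all accumulated data at every batch: b + 2b + ... + kb.
retrain_cost = sum(i * b for i in range(1, k + 1))

# Incremental update: each new batch is touched exactly once.
update_cost = k * b

print(f"retrain from scratch: {retrain_cost:,} row-passes (~O(n^2 / b))")
print(f"incremental updates:  {update_cost:,} row-passes (O(n))")
```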
In our internal benchmarks, it currently outperforms AutoGluon on several tabular datasets in both accuracy and training time: https://github.com/perpetual-ml/perpetual?tab=readme-ov-file#perpetualbooster-vs-autogluon
We’ve built a managed environment around this to remove the “Infra Tax” for small teams:
- Reactive Notebooks: We integrated Marimo as the primary IDE. It’s fully serverless, so you aren’t paying for idle kernels.
- Drift-Triggered Learning: Built-in automated data/concept drift monitoring that can natively trigger the O(n) continual-learning jobs (see the sketch after this list).
- Production Endpoints: Native serverless inference that scales to zero.
- Pipeline: Integrated data quality checks and a model registry that handles the transition from Marimo experiments to production APIs.
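As referenced in the drift-triggered learning bullet, here is an illustrative sketch of the general pattern, not our platform's actual implementation: a two-sample Kolmogorov-Smirnov test flags feature drift, and a flagged batch is what would kick off the incremental update.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

reference = rng.normal(loc=0.0, size=5_000)   # feature values at training time
production = rng.normal(loc=0.3, size=5_000)  # shifted values seen in production

# Two-sample KS test: a small p-value suggests the production
# distribution has drifted away from the training distribution.
result = ks_2samp(reference, production)

if result.pvalue < 0.01:
    print(f"drift detected (KS={result.statistic:.3f}, p={result.pvalue:.2e})")
    # This is the point where an O(n) continual-learning job on the
    # new batch would be scheduled (placeholder in this sketch).
else:
    print("no significant drift; model left as-is")
```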
You can find PerpetualBooster on GitHub (https://github.com/perpetual-ml/perpetual) and on PyPI.
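Basic usage of the open-source library looks roughly like this (a sketch based on the README example; the synthetic data is mine, and defaults may differ across versions):

```python
import numpy as np
from perpetual import PerpetualBooster  # pip install perpetual

# Synthetic regression data, just for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=1_000)

# No hyperparameter tuning: a single `budget` knob controls how long
# the booster keeps improving (1.0 is the README's example value).
model = PerpetualBooster(objective="SquaredLoss")
model.fit(X, y, budget=1.0)
preds = model.predict(X)
```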
If you want to try the managed environment (we’ve just moved it out of the Snowflake ecosystem to a standalone cloud), you can check it out here: https://app.perpetual-ml.com/signup
submitted by /u/mutlu_simsek