[P] I trained an XGBoost model with DuckLake and ADBC
I’ve been spending time with Apache ADBC (Arrow Database Connectivity) and DuckLake (a lakehouse architecture built on DuckDB) to read columnar data. I realized XGBoost accepts Arrow tables as input, so I was able to pass Arrow tables straight to it and train with little memory overhead. I also wanted to avoid scikit-learn, so I built a train/test split function with PyArrow instead. ADBC also lets you stream larger-than-memory data and, in the right circumstances, train a model on it.
submitted by /u/empty_cities