[P] Jerry Thomas — time-series pipeline runtime w/ stage-by-stage observability
Hi all,
I built an open-source time-series pipeline runtime (jerry-thomas).
It focuses on the time-consuming part of ML time-series prep: combining multiple sources, aligning them in time, cleaning, transforming, and producing model-ready vectors reproducibly.
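To make the "combining and aligning" step concrete, here is a minimal sketch in plain Python, not jerry-thomas code. It assumes each source is already sorted by timestamp; the `Record` tuple layout and the `merge_sorted_sources` name are made up for illustration.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (timestamp, source name, value).
Record = Tuple[datetime, str, float]


def merge_sorted_sources(*sources: Iterable[Record]) -> Iterator[Record]:
    """Lazily interleave time-sorted sources into one time-ordered stream."""
    # heapq.merge pulls one item at a time from each input, so nothing
    # is fully materialized in memory.
    return heapq.merge(*sources, key=lambda record: record[0])


temps = [
    (datetime(2024, 1, 1, 0), "temp", 1.5),
    (datetime(2024, 1, 1, 2), "temp", 2.5),
]
wind = [(datetime(2024, 1, 1, 1), "wind", 7.0)]

merged = list(merge_sorted_sources(iter(temps), iter(wind)))
```

The merged stream comes out ordered by timestamp regardless of which source each record came from, which is the precondition for windowing or resampling later on.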
The runtime is iterator-first (streaming), so it avoids loading full datasets into memory. It uses a contract-driven structure (DTO -> domain -> feature/vector), so you can swap sources by updating DTO/parser/mapper boundaries while keeping core pipeline operations on domain models.
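Roughly, the DTO -> domain -> vector idea looks like the sketch below. This is a generic illustration with generators, not the actual jerry-thomas API; every class and function name here (`RawRowDTO`, `Observation`, `parse`, `to_domain`, `to_vectors`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class RawRowDTO:
    """Hypothetical DTO: mirrors one source's raw row shape."""
    ts: str
    reading: str


@dataclass(frozen=True)
class Observation:
    """Hypothetical domain model: what the core pipeline operates on."""
    time: datetime
    value: float


def parse(rows: Iterable[Dict[str, str]]) -> Iterator[RawRowDTO]:
    # Source-specific boundary: swap this (and the mapper) to change sources.
    for row in rows:
        yield RawRowDTO(ts=row["ts"], reading=row["reading"])


def to_domain(dtos: Iterable[RawRowDTO]) -> Iterator[Observation]:
    # Mapper: DTO -> domain model, with type conversion.
    for dto in dtos:
        yield Observation(time=datetime.fromisoformat(dto.ts), value=float(dto.reading))


def to_vectors(observations: Iterable[Observation], window: int = 2) -> Iterator[List[float]]:
    # Sliding window over the stream; holds at most `window` values at once.
    buf: List[float] = []
    for obs in observations:
        buf.append(obs.value)
        if len(buf) == window:
            yield list(buf)
            buf.pop(0)


rows = [
    {"ts": "2024-01-01T00:00:00", "reading": "1.0"},
    {"ts": "2024-01-01T01:00:00", "reading": "2.0"},
    {"ts": "2024-01-01T02:00:00", "reading": "3.0"},
]
vectors = list(to_vectors(to_domain(parse(iter(rows)))))
```

Only `parse` and `to_domain` know about the source format, so a new source means new DTO/parser/mapper code while `to_vectors` stays untouched. Each stage is a generator, so its output can be inspected on its own.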
It also emphasizes observability, with 8 inspectable output stages for debugging and validation.
There’s plugin scaffolding for custom loaders/parsers/transforms, plus a demo package to get started quickly. Outputs support multiple formats, and there are built-in integrations for ML workflows (including PyTorch datasets).
Versioning story: tag project config + plugin code in Git, and pair with a data versioning tool (for example DVC) for raw sources. With those inputs pinned, interim datasets and artifacts can be regenerated rather than stored.
I’d appreciate feedback from people who’ve built similar pipelines, or anyone willing to try the docs and share where setup is unclear.
EDIT: The links are in the comments, since Reddit's filters wouldn't let me post with them for some reason.
submitted by /u/Cold_Committee_7252