Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting

arXiv:2509.24789v3 Announce Type: replace-cross
Abstract: The evaluation of time series forecasting models is hindered by a critical lack of high-quality benchmarks, which can create an illusion of progress. Existing datasets suffer from issues ranging from pre-training data contamination in the age of LLMs to the temporal and description leakage prevalent in early multimodal designs. To address these issues, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free and causally sound design, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from the ground up on these principles by sourcing data from live APIs. Our experiments reveal flaws in previous benchmarks and biases in model evaluation, providing new insights into multiple existing forecasting models and LLMs across various evaluation tasks.
