Stratified Fréchet Distance: A Stratified Evaluation Framework for Conditional Time Series Generation Models

The Fréchet Inception Distance (FID), the standard metric for evaluating deep generative models, aggregates all data into a single score and thereby masks quality degradation in safety-critical minority conditions and in specific temporal regions of generated time series. We trace this dilution problem to a single cause, the absence of stratification, and propose the Stratified Fréchet Distance (SFD), which partitions evaluation data into strata along a chosen axis and computes the Fréchet distance within each stratum. The choice of axis determines the diagnosis: stratifying by operating condition detects minority-condition failures (generalizing the existing Conditional FID), stratifying by temporal segment localizes late-cycle quality breakdown, and stratifying by their cross-product yields a two-dimensional condition×time quality map. Comparing SFD at different granularities further enables quantitative detection of inter-condition confounding. Experiments on four battery datasets (161 cells) with CVAE models show that SFD detects condition-dependent quality gaps of 1.97× where FID registers only 1.01×, with up to 79× higher sensitivity for minority conditions. Condition×time stratification reveals that the largest gap (8.69×) occurs in the latter half of 35 °C degradation curves, a physically interpretable failure to reproduce accelerated high-temperature degradation. Granularity comparison also detects confounding between temperature and C-rate (charge/discharge rate), with T/J = 1.72×, providing actionable guidance on which conditioning variables a generative model should include. These findings are robust across three feature extractors and four datasets.
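The dilution effect and the per-stratum computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `frechet_distance` and `stratified_frechet_distance`, the synthetic Gaussian features, and the 90/10 condition split are all assumptions made for illustration, and the sketch works directly on feature vectors rather than on the output of any of the feature extractors used in the paper.

```python
import numpy as np

def frechet_distance(real, fake):
    """Fréchet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(fake, rowvar=False))
    # Tr((cov_r cov_f)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # PSD inputs; clip small negative values caused by round-off.
    eig = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)

def stratified_frechet_distance(real, fake, real_strata, fake_strata):
    """Compute the Fréchet distance separately within each stratum label."""
    scores = {}
    for label in np.unique(real_strata):
        r = real[real_strata == label]
        f = fake[fake_strata == label]
        if len(r) < 2 or len(f) < 2:
            continue  # too few samples to estimate a covariance
        scores[label.item()] = frechet_distance(r, f)
    return scores

# Hypothetical data: condition 0 is the majority (900 samples) and is
# generated well; condition 1 is the minority (100 samples) and the
# generated features are shifted away from the real ones.
rng = np.random.default_rng(0)
dim = 4
strata = np.array([0] * 900 + [1] * 100)
real = rng.normal(size=(1000, dim))
fake = rng.normal(size=(1000, dim))
fake[strata == 1] += 2.0  # simulated minority-condition failure

pooled = frechet_distance(real, fake)  # single aggregated score
sfd = stratified_frechet_distance(real, fake, strata, strata)

print(f"pooled: {pooled:.3f}")
for label, score in sfd.items():
    print(f"stratum {label}: {score:.3f}")
```

In this toy setup the pooled score stays small because the well-generated majority dominates the fitted statistics, while the minority stratum's own score exposes the shift. The same function yields a temporal or a condition×time diagnosis if the stratum labels encode time segments or (condition, segment) pairs instead.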
