Learning with the Nash-Sutcliffe loss
The Nash-Sutcliffe efficiency ($\text{NSE}$) is a widely used, positively oriented relative measure for evaluating forecasts across multiple time series. However, it lacks a decision-theoretic foundation for this purpose. To address this, we examine its negatively oriented counterpart, which we refer to as the Nash-Sutcliffe loss, defined as $L_{\text{NS}} = 1 - \text{NSE}$. We prove that $L_{\text{NS}}$ is strictly consistent for an elicitable and identifiable multi-dimensional functional, which we name the Nash-Sutcliffe functional. This functional is a data-weighted component-wise mean. The common practice of maximizing the average $\text{NSE}$ across multiple series is the sample analog of minimizing the expected $L_{\text{NS}}$. Consequently, this operation implicitly assumes that all series originate from a single non-stationary stochastic process. We introduce Nash-Sutcliffe linear regression, a multi-dimensional model estimated by minimizing the average $L_{\text{NS}}$, which reduces to a data-weighted least squares formulation. By reorienting the sample average loss function, we extend the previously proposed evaluation and estimation framework to forecasting multiple stationary, dependent time series with differing stochastic properties. This constitutes a more natural empirical implementation of the $\text{NSE}$ than the earlier formulation. Our results establish a decision-theoretic foundation for $\text{NSE}$-based model estimation and forecast evaluation in large datasets, while further clarifying the benefits of global over local machine learning models.
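For reference, assuming the standard Nash and Sutcliffe (1970) definition for a single series with observations $y_t$, forecasts $\hat{y}_t$, and observation mean $\bar{y}$ (notation introduced here for illustration, not taken from the paper), the efficiency and its negatively oriented counterpart read

$$\text{NSE} = 1 - \frac{\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{T}\left(y_t - \bar{y}\right)^2}, \qquad L_{\text{NS}} = 1 - \text{NSE} = \frac{\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{T}\left(y_t - \bar{y}\right)^2},$$

so minimizing $L_{\text{NS}}$ amounts to minimizing the sum of squared forecast errors rescaled by the total variation of the observed series.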
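The stated reduction of Nash-Sutcliffe linear regression to data-weighted least squares can be sketched as follows. This is a minimal NumPy illustration under the assumption that each series $i$ contributes its squared residuals weighted by the inverse of its total variation $\sum_t (y_{it} - \bar{y}_i)^2$; the function name and interface are hypothetical, not the paper's estimator.

```python
import numpy as np

def nash_sutcliffe_regression(X_list, y_list):
    """Fit one linear model to several series by minimizing the average
    Nash-Sutcliffe loss, via its data-weighted least squares form.

    Each series i is rescaled by sqrt(w_i), where
    w_i = 1 / sum_t (y_it - mean(y_i))^2, so the pooled ordinary least
    squares solve minimizes sum_i w_i * ||y_i - X_i @ beta||^2, i.e. the
    sum of the per-series Nash-Sutcliffe losses.
    """
    rows, targets = [], []
    for X, y in zip(X_list, y_list):
        w = 1.0 / np.sum((y - y.mean()) ** 2)  # inverse total variation
        rows.append(np.sqrt(w) * X)            # rescale design matrix
        targets.append(np.sqrt(w) * y)         # rescale targets
    A = np.vstack(rows)
    b = np.concatenate(targets)
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return beta

# Hypothetical usage on synthetic series with very different noise scales.
rng = np.random.default_rng(0)
beta_true = np.array([1.5, -0.5])
X_list = [rng.normal(size=(50, 2)) for _ in range(3)]
y_list = [X @ beta_true + rng.normal(scale=s, size=50)
          for X, s in zip(X_list, (0.1, 1.0, 5.0))]
print(nash_sutcliffe_regression(X_list, y_list))
```

Because each series is normalized by its own variability, no single high-variance series dominates the fit, which reflects the sense in which $L_{\text{NS}}$ is a relative measure.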