
NSE - revisited #814

Open
4 tasks
nikeethr opened this issue Jan 28, 2025 · 15 comments
Assignees
Labels
new metric Adding a new metric or score

Comments

@nikeethr
Collaborator

nikeethr commented Jan 28, 2025

#217 is stale so I'm opening a new issue to document this metric. I will utilise the effort in #217 where possible to test and document the metric.

Metric description (theory/refresher)

NSE: Nash–Sutcliffe model efficiency coefficient.

Used for predictive skill assessment in hydrological models.

The basic metric is quite straightforward, see: NSE - wiki definition

The formula can be re-arranged to be written in terms of MSE:

$$ \begin{align*} &\text{NSE} = 1 - \frac{\text{MSE}}{\sigma_o^2} \\ &\text{where } \sigma_o^2 = \text{biased obs variance} \end{align*} $$

For consistency we would like to inherit the traits that the MSE implementation has, such as the ability to gather and apply the metric (i.e. reduce) over specific dimensions, and any other compatible features such as weighting (need to check).

As such, when we apply dimensionality to the problem the result will not be a scalar like the above equation. Instead we are computing $$NSE_i$$, where $$i$$ can be thought of as an N-dimensional indexer representing the extents of the remaining dimensions.

Note
dimensionality handling applies to most metrics in scores and usually goes without saying - I'm being explicit in this particular issue as a reminder, since it's my first in a while.

Furthermore, each index (of the remaining dimensions) is computed independently; therefore $$MSE_i$$ and $${\sigma_o}_i^2$$ can also be computed separately and then broadcast into the final NSE nd-array. So we can use the existing implementation of MSE, only needing to compute the biased obs variance (np.var with ddof=0, the default, set explicitly). The caveat being: if some aggregation options (e.g. weighting) are non-separable (need to check), then we may need to compute everything explicitly.
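A minimal sketch of that composition (purely illustrative: it assumes an xarray workflow and that `scores.continuous.mse` accepts `reduce_dims` as described above; `nse_sketch` is a placeholder name, not the proposed API):

```python
import xarray as xr
from scores.continuous import mse  # existing MSE implementation in scores (assumed signature)


def nse_sketch(fcst: xr.DataArray, obs: xr.DataArray, reduce_dims=None) -> xr.DataArray:
    """Illustrative only: NSE_i = 1 - MSE_i / sigma_o_i^2 over the reduced dims."""
    mse_i = mse(fcst, obs, reduce_dims=reduce_dims)
    # biased (population) obs variance, i.e. ddof=0 (np.var's default, set explicitly)
    var_obs_i = obs.var(dim=reduce_dims, ddof=0)
    return 1 - mse_i / var_obs_i
```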

Additional Notes
There are some variants of this metric that may be beneficial to add:

  • NNSE: an inverted relation more suitable for machine learning, since NSE is unbounded below and so cannot be used directly. Essentially it's 1 / (1 + MSE / sigma_obs^2) (see the short check after this list).
  • NSE1: an l1-norm version of NSE (which is l2-norm) - the parallel of MAE - though the same simplification as above may not be possible with the absolute value, and we may not be able to re-use MAE (need to check). Good for reducing the impact of extremes.
  • LNSE: logarithmic NSE, good for weighting/optimising over lower values.
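On the NNSE point: taking the commonly used definition $$\text{NNSE} = 1/(2 - \text{NSE})$$, the rearrangement above does check out:

$$ \text{NNSE} = \frac{1}{2 - \text{NSE}} = \frac{1}{2 - \left(1 - \dfrac{\text{MSE}}{\sigma_o^2}\right)} = \frac{1}{1 + \dfrac{\text{MSE}}{\sigma_o^2}} $$

which maps $$\text{NSE} \in (-\infty, 1]$$ onto the bounded range $$(0, 1]$$.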

🚧 TODO

  • make PR for this with initial commit (further details will be tracked in PR)
  • integrate stuff from (stale) add NSE function #217 where relevant.
  • make new issue(s) for above variants - so that they are tracked in a separate PR (and don't block the basic version)
  • Target version???

/cc @rob-taggart @durgals @tennlee

@nikeethr nikeethr added the new metric Adding a new metric or score label Jan 28, 2025
@nikeethr nikeethr self-assigned this Jan 28, 2025
@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

angular difference

MSE uses an "angular" difference option for certain computations where the data are in a circular (radial) coordinate system.

For the computation of the variance, which requires the mean observation, we would probably need to apply the same or a similar correction, so that the mean lands in the range [0, 360). I'm not confident that it is necessary (or whether any hydrological metrics will require it), but at the very least it protects against data sources that use inconsistent ranges, e.g. (-180, 180] or (-360, 0].
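For reference, one way to produce a mean direction that always lands in [0, 360) regardless of the source convention is the circular mean - a sketch only, not tied to whatever the existing angular MSE code does:

```python
import numpy as np


def circular_mean_deg(directions_deg: np.ndarray) -> float:
    """Circular mean in [0, 360), insensitive to whether the inputs use
    [0, 360), (-180, 180] or (-360, 0] conventions."""
    rad = np.deg2rad(directions_deg)
    mean_angle = np.angle(np.mean(np.exp(1j * rad)))  # result in (-pi, pi]
    return float(np.rad2deg(mean_angle) % 360.0)
```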

@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

weighting

Weighting in scores will follow the MSE example, since NSE is an adaptation of it.

  • Weighting is done on the squared error for each element
  • When computing the variance however, we have two options:
    1. ignore weighting (which will bias normalization towards the unweighted population variance)
    2. a similar weighting strategy, i.e. precompute difference from mean and weight it prior to summing

Alternatively we can disallow weighting for this metric altogether if it's not well defined (OR compute it anyway and throw a warning flagging it as "experimental"/"risky").

Note that if we are doing weighting on the variance, the question then is whether mean_obs needs to be weighted too.

For the weighted difference for MSE, we have $$\sum_{i}{(w_i O_i - w_i F_i)^2}$$; since $$w_i$$ can be factored out, this simplifies trivially to $$\sum_{i}{w_i^2 \cdot \text{error}_i^2}$$. However, in scores $$w_i^2$$ IS the weight, and should in theory average to unit norm. So it doesn't matter too much that the obs are effectively weighted by $$\sqrt{\text{weight}}$$ (where "weight" is the array provided to the function): the unit-norm condition still holds. EDIT: it may not hold.

This normalisation is not as trivial for the obs variance: $$\sum_{j}\left[w_j O_j - \sum_{i}{\tfrac{w_i O_i}{N}}\right]^2$$. Here $$w_j$$ can no longer be trivially factored out without making some sort of approximation (noting $w_i = \sqrt{\text{weight}}$, and assuming "weight" doesn't use some weird negative weighting scheme). If we were to be consistent with MSE, we would also have to take the square root of the weights when computing the weighted variance, should we choose to use weighting.
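To make the contrast explicit:

$$ (w_i O_i - w_i F_i)^2 = w_i^2 (O_i - F_i)^2 \qquad \text{vs.} \qquad \left( w_j O_j - \frac{1}{N}\sum_i w_i O_i \right)^2 $$

In the first case a single $$w_i$$ multiplies both terms and factors out of the square; in the second, the bracket mixes all of the weights, so no single weight factors out unless the weights are all equal.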

I probably need someone with more subject matter expertise to chip in on it. Maybe @durgals? Should we:

  • A) avoid weighting for this score.
  • B) use weighting but not weight the variance calculation.
  • C) use weighting but sqrt it for variance, and pull it inside the square logic, so that the weights have consistent impact relative to the MSE computation.

@tennlee @nicholasloveday if you are aware of anyone else who can comment on this that would be good. I probably need to know what the broader implication of "weight" is. Is it just a fuzzy weight for each error metric, or are we explicitly weighting data directly?

@rob-taggart
Collaborator

rob-taggart commented Jan 30, 2025

(Caveat: I have never used NSE in practice (I am not a hydrologist) but from the perspective of mathematical structure, here is my take on what a weighted NSE would look like. I can't comment on whether it would be useful.)

The way that weights have been applied in the scores metrics that I am aware of works as follows:

  • each forecast case is scored against an observation using a scoring function (e.g. squared loss)
  • you can aggregate the scores across all forecast cases by computing the mean score
  • if you want to weight results for some forecast cases higher than others, you compute a weighted mean of the scores

This works for MSE (where squared loss is the underlying scoring function), MAE (where absolute loss is the underlying scoring function), etc.

NSE is different in that it is not a mean of scores, but essentially a skill score. Skill scores compute the mean score and compare it to the mean score of a reference forecast. In this case, the reference 'forecast' is the mean observation of the verification dataset. In principle you could compute the skill score using a weighted mean score rather than a mean score.

In this paradigm, the NSE is a special case of a skill score using squared loss as the underlying scoring function. If we were to apply weights to NSE in the same paradigm that you would for a weighted MSE skill score, the formula would change from

$$NSE = 1 - \sum_i (y_i - x_i)^2 / \sum_i (y_i - \bar{y})^2$$

(where $x_i$ is the forecast for the $i$-th timestep, $y_i$ the corresponding observation, and $\bar{y}$ the mean observation), to

$$NSE = 1 - \sum_i w_i(y_i - x_i)^2 / \sum_i w_i(y_i - \bar{y})^2 .$$

Then you would need to generalise this to multidimensional data (e.g. computing a weighted NSE for multiple river gauges indexed by $j$) to get

$$NSE_j = 1 - \sum_i w_{ij}(y_{ij} - x_{ij})^2 / \sum_i w_{ij}(y_{ij} - \bar{y}_j)^2 .$$

Again, not sure whether this would be used in practice. For example, if the timeseries data were not at regular intervals (such as hourly), then you could down-weight timesteps where they are more densely sampled by using lower values of $w_i$.
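A short numpy sketch of that last formula, just to pin down the indexing (illustrative names only; time is axis 0, gauge $j$ is axis 1, and $\bar{y}_j$ is the unweighted mean as written above):

```python
import numpy as np


def weighted_nse(fcst: np.ndarray, obs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """NSE_j = 1 - sum_i w_ij (y_ij - x_ij)^2 / sum_i w_ij (y_ij - ybar_j)^2,
    with i = timestep (axis 0) and j = gauge (axis 1). All inputs share shape (n_time, n_gauge)."""
    obs_mean = obs.mean(axis=0, keepdims=True)             # ybar_j, unweighted as in the formula
    num = np.sum(weights * (obs - fcst) ** 2, axis=0)      # weighted squared error
    den = np.sum(weights * (obs - obs_mean) ** 2, axis=0)  # weighted spread about the mean obs
    return 1.0 - num / den
```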

@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

Thanks @rob-taggart, that was my first thought as well, just looking at it conceptually (i.e. essentially weighted squared error against the forecast over weighted squared error against the mean) - but I wasn't sure whether it was directly applicable to the variance as well, in case that was just a convenient extraction and the weights are actually meant for each datum. If this is the paradigm that scores uses, then it seems consistent enough... and a reasonable approximation at worst.

Hopefully we can get additional clarification from a hydrologist before I decide what to do with the weights. For now I can just default to the behaviour you suggest and address it in the PR when it's getting reviewed.

Noting that, as you say, we're not doing a weighting over a mean of scores - and while this sort of weighting can conceptually make sense for the NSE equation, it may or may not be interpretable depending on the context.


An alternate flow of reasoning to the above suggestion:

Essentially a weighted score (if I'm interpreting it correctly) is a transformation like this:

$$ \mu_{\text{score}} = \displaystyle\sum_{i} \frac{\text{score}_i}{N} \quad|\text{ special case of $w_i = \displaystyle\frac{1}{N}$ (see below)} $$

into

$$ \mu_{\text{weighted}} = \sum_{i} w_i \cdot \text{score}_i \quad|\text{ generally, $\lVert\vec{w}\rVert = \sum_i w_i = 1$ is the assumption} $$

where $$\text{score}_i$$, for example, is the squared loss for the $$i$$-th (obs, fcst) pair in the set.

We can then represent weighted-NSE in a similar format (note I skipped some steps for brevity, so may need double checking from @rob-taggart):

$$ \begin{align*} \text{NSE} &= 1 - \text{MSE} / \sigma_o^2 \\ &= \frac{1}{N} \sum_i \left( 1 - \text{error}_i^2 / \sigma_o^2 \right) \end{align*} $$

Now the pattern more closely resembles the averaging of a score, and it can be transformed into a weighted form:

$$ \text{NSE}_{\text{weighted}} = \frac{\displaystyle\sum_i w_i \cdot \left[ 1 - \text{error}_i^2 / \sigma_o^2 \right]}{\lVert \vec{w} \rVert} $$

where,

  • $$\sigma_o^2$$ = obs variance (population/biased)
  • $$\text{error}_i^2$$ = squared error between obs and sim
  • $$w_i$$ = weighting on score

Assuming $$\lVert \vec{w} \rVert = \sum_{i} w_i = 1$$ (so it can be ignored), and noting that the obs variance is actually not dependent on the weighting in this representation, we can re-write the expression above as a weighted score - i.e. $$\sum_{i} w_i \cdot \text{score}_i$$:

$$\begin{align*} &\sum_{i=0}^{N-1} w_i \cdot \left[ \frac{\sigma_o^2 - \text{error}_i^2}{\sigma_o^2} \right] \\ \implies &\text{score}_i = (\sigma_o^2 - \text{error}_i^2) / \sigma_o^2 \\ &\text{OR} \\ &\text{score}_i = 1 - \text{error}_i^2 / \sigma_o^2 \end{align*}$$

which can be interpreted as the weighted ratio of (the obs variance minus the squared loss of a given element) to the obs variance itself.

  • A positive difference implies that the error for a particular simulation is small compared to the obs variance
  • 0 implies the prediction loss is no better than the mean,
  • and anything negative means that particular prediction is worse than the mean.

This seems to be consistent with the aggregated score. In practice the computation is equivalent to (pseudocode) `1 - scores.mse(obs, fcst, keepdims=..., weights=..., ...) / np.var(obs, keepdims=..., ...)`, but for readability it may be worth constructing the code to explicitly match the format above, i.e. $$\text{score}_i$$, instead of using scores.mse.
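A sketch of that per-element form, assuming xarray inputs and that the weights are meant to be normalised over the reduced dims (illustrative only, not the proposed API):

```python
import xarray as xr


def nse_weighted_scoreform(fcst: xr.DataArray, obs: xr.DataArray,
                           weights: xr.DataArray, reduce_dims) -> xr.DataArray:
    """Weighted mean of score_i = 1 - error_i^2 / sigma_o^2.
    Note the obs variance is unweighted in this representation."""
    var_obs = obs.var(dim=reduce_dims, ddof=0)         # biased (population) obs variance
    score = 1 - (obs - fcst) ** 2 / var_obs            # per-element score_i
    w_norm = weights / weights.sum(dim=reduce_dims)    # enforce sum_i w_i = 1
    return (w_norm * score).sum(dim=reduce_dims)
```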

Note: this is, I suspect, slightly different to @rob-taggart's one above, because the weighting isn't applied to the variance computation in my version - but I've tried to keep close to the paradigm of a "weighted score", assuming of course that my interpretation of weighted scoring is accurate, which may not be the case.

@nicholasloveday
Collaborator

> angular difference
>
> MSE uses an "angular" difference option for certain computations where the data are in a circular (radial) coordinate system.
>
> For the computation of the variance, which requires the mean observation, we would probably need to apply the same or a similar correction, so that the mean lands in the range [0, 360). I'm not confident that it is necessary (or whether any hydrological metrics will require it), but at the very least it protects against data sources that use inconsistent ranges, e.g. (-180, 180] or (-360, 0].

Jumping in on this since @rob-taggart addressed the weighting question.

I am comfortable with NSE not supporting angular data unless @durgals says that NSE is used to evaluate directional forecasts in hydrology.

Also happy if you support it regardless.

@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

I guess discharge or flow has a concept of direction if it is a “velocity” measure.

However, the wiki defines it as a quantity of fluid, e.g. the volume of water that on average passes through a cross-sectional area per unit time.

“Cross-sectional” area could imply that the flow is averaged in relation to a plane, and therefore any directionality is, I assume, fixed by the sensor that enforces this - for example a tap. Hence, while the velocity may be directional, it's actually only the component perpendicular to the cross-section that is recorded - though it may still be positive or negative, as opposed to radial. Given this, we probably don't need an angular difference.

I'm just testing my interpretation and reasoning - happy to be corrected, or if there are actually sensors that take direction into account and are used with NSE.

@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

Having said all that, I think NSE in the end is quite a generic score - while it may be rooted in (or popular for) hydrological analysis specifically, there's no hard rule that it cannot be used more widely, as long as a UserWarning notes that it isn't a typical use case for the metric.

I guess the question then is whether this is even useful. I feel that (1 - MSE / var) is fairly generic:

  • 0 => error matches obs variance => it's no better than using the mean
  • 1 (upper bound) => no error
  • < 0 is a rejection criterion
  • 0 to 1 is probably application specific

It is also the “reciprocal” of signal to noise ratio used in signal processing.

Given that I think we should make a decision. Should we support:

  • angular difference (Y/N)
  • weighting (Y/N)

If a hydrologist says those options are not useful (N) but a different subject matter expert finds them useful to have (Y), I think we can still include them with a user warning that it's not a typical use case - but @tennlee will have to give the okay as well from a software perspective. While I am a subject matter expert in signal processing, I will abstain from this particular decision since I'm probably biased.

For now I'll implement it assuming we don't need those features, unless someone tells me otherwise.

@rob-taggart
Collaborator

rob-taggart commented Jan 30, 2025

Yes, NSE is quite generic. It is simply the MSE skill score where the reference forecast* is the sample mean of the observations.

I don't see it being specific to hydrology or stream flow at all, except for its popular usage there.

(*) technically this is not a forecast, because one cannot know the sample mean of the observations ahead of time.

@rob-taggart
Collaborator

rob-taggart commented Jan 30, 2025

Also I wonder whether the implementation should be within the general framework of skill scores:

skill_score = 1 - mean_score_fcst / mean_score_ref

Implement this first, and then the NSE implementation would be something like:

mean_score_fcst = mse(fcst, obs)
mean_score_ref = mse(obs.mean(), obs)
nse = skill_score(mean_score_fcst, mean_score_ref)

Handling of weights would be through the weight implementation already available within mse. Similarly with "angular" difference, if we wanted to make that an option.

@durgals
Contributor

durgals commented Jan 30, 2025

In hydrology, we often assign weights to emphasize certain parts of the hydrograph. For instance, if we want to focus on high-flow periods, one approach is to use a weight function proportional to the flow. The formula provided by @rob-taggart is commonly used in hydrology (Hundecha and Bárdossy, 2004).
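For instance, a minimal sketch of flow-proportional weights (normalised over the time dimension; names and the choice of normalisation are illustrative):

```python
import xarray as xr


def flow_proportional_weights(obs: xr.DataArray, time_dim: str = "time") -> xr.DataArray:
    """Weights proportional to observed flow, emphasising high-flow periods."""
    return obs / obs.sum(dim=time_dim)
```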

Angular difference is irrelevant in discharge measurement because river flow always moves downhill, following the terrain. Unlike wind or ocean currents, which change direction, river discharge is guided by gravity and the channel shape, making direction obvious.

Ref:
Hundecha, Y., & Bárdossy, A. (2004). Modeling of the effect of land use changes on the runoff generation of a river basin through parameter regionalization of a watershed model. Journal of Hydrology, 292(1-4), 281-295. doi:10.1016/j.jhydrol.2004.01.002

@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

#814 (comment) - this is also a valid argument: that it is a skill score formed as a ratio of weighted scores, which incidentally makes the implementation much more trivial. And it seems @durgals has verified it above while I was typing this, so that settles that - I'll look into the paper and verify.

I am happy to leave the angular difference in (as opposed to forcing it to angular=False - leaving False as the default), given the general-purpose nature of the score, even if it isn't necessarily useful for hydro-metrics in particular.

@rob-taggart
Collaborator

@nikeethr , the general approach outlined in #814 (comment) would need to be augmented by specifying a time dimension. As I understand it, NSE is computed using time series forecasts and observations for each location. Can you confirm whether this is correct @durgals ?

If that is correct, an argument to NSE would be time_dim and the implementation (via skill scores) would better be represented as:

mean_score_fcst = mse(fcst, obs, reduce_dims=time_dim)
mean_score_ref = mse(obs.mean(time_dim), obs)
nse = skill_score(mean_score_fcst, mean_score_ref)
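A hedged end-to-end sketch of that shape (assuming `scores.continuous.mse` takes `reduce_dims` and `weights`; `skill_score` and `nse` are illustrative names, not a final API - note the reference score is also reduced over `time_dim` so both terms end up per-location):

```python
import xarray as xr
from scores.continuous import mse  # existing MSE implementation in scores (assumed signature)


def skill_score(mean_score_fcst: xr.DataArray, mean_score_ref: xr.DataArray) -> xr.DataArray:
    """Generic skill score: 1 - mean score of forecast / mean score of reference."""
    return 1 - mean_score_fcst / mean_score_ref


def nse(fcst: xr.DataArray, obs: xr.DataArray, *, time_dim: str, weights=None) -> xr.DataArray:
    """NSE per remaining dimension (e.g. per gauge), reducing over time_dim."""
    mean_score_fcst = mse(fcst, obs, reduce_dims=time_dim, weights=weights)
    mean_score_ref = mse(obs.mean(time_dim), obs, reduce_dims=time_dim, weights=weights)
    return skill_score(mean_score_fcst, mean_score_ref)
```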

@nikeethr
Collaborator Author

nikeethr commented Jan 30, 2025

@rob-taggart you're most likely right. The aggregation is done over time (with an option to reduce/preserve other dimensions, if that's even a thing...?).

@durgals
Contributor

durgals commented Jan 30, 2025

> @nikeethr , the general approach outlined in #814 (comment) would need to be augmented by specifying a time dimension. As I understand it, NSE is computed using time series forecasts and observations for each location. Can you confirm whether this is correct @durgals ?

Yes @rob-taggart. That's why the length of the weights should be equal to the length of the time series.

@durgals
Contributor

durgals commented Jan 30, 2025

I like the angular difference implementation with default=False.
