NSE - revisited #814
angular difference
MSE uses "angular" difference for certain computations which are in a radial coordinate system. For computation of variance, which requires the mean observation, we probably need to apply the same or a similar correction, so that the mean stays in the appropriate angular range.
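As a minimal sketch of the kind of correction this refers to (my own illustration, not from the scores codebase): a circular mean keeps the aggregated observation on the circle, where a naive arithmetic mean does not. `theta_obs` is a hypothetical array of observed angles in radians.

```python
import numpy as np

# Hypothetical observed angles in radians, clustered around +/- pi
theta_obs = np.array([3.0, -3.0, 3.1])

# Naive arithmetic mean ignores wrap-around and can land far from the data
naive_mean = theta_obs.mean()

# Circular mean: average the unit vectors, then take the angle of the result.
# np.angle returns values in (-pi, pi], keeping the mean on the circle.
circular_mean = np.angle(np.mean(np.exp(1j * theta_obs)))
```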
weighting
Weighting in scores will follow the MSE example, since NSE is an adaptation of it.
Alternatively, we can disallow weighting for this metric altogether if it's not well defined (OR compute it anyway and throw a warning that it is "experimental" or "risky"). Note that if we are doing weighting on the variance, the question then is how the weights should enter it. For the weighted difference in MSE the normalisation is well defined; it is not quite as trivial for the obs variance: I probably need someone with more subject matter expertise to chip in on it. Maybe @durgals? Should we:
@tennlee @nicholasloveday if you are aware of anyone else who can comment on this, that would be good. I probably need to know what the broader implication of "weight" is. Is it just a fuzzy weight for each error metric, or are we explicitly weighting data directly?
(Caveat: I have never used NSE in practice (I am not a hydrologist), but from the perspective of mathematical structure, here is my take on what a weighted NSE would look like. I can't comment on whether it would be useful.) The way that weights have been applied in the scores package so far is as a weighted mean of the per-datum scores.
This works with MSE (where squared loss is the underlying scoring function), MAE (where absolute loss is the underlying scoring function), etc. NSE is different in that it is not a mean of scores, but essentially a skill score. Skill scores compute the mean score and compare it to the mean score of a reference forecast. In this case, the reference 'forecast' is the mean observation of the verification dataset. In principle you could compute the skill score using a weighted mean score rather than a mean score. In this paradigm, NSE is a special case of a skill score using squared loss as the underlying scoring function. If we were to apply weights to NSE in the same paradigm that you would for a weighted MSE skill score, the weights would enter both the forecast error term and the reference (obs variance) term. You would then need to generalise this to multidimensional data (e.g. computing a weighted NSE for multiple river gauges indexed by an additional dimension). Again, not sure whether this would be used in practice. For example, if the timeseries data were not at regular intervals (such as hourly) then you could down-weight timesteps where they are more dense by using lower values for the corresponding weights.
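For concreteness, a hedged sketch of what such a weighted formula could look like (my reading of the suggestion, not necessarily the exact expression originally posted): with forecasts $$f_t$$, observations $$o_t$$ and weights $$w_t$$ over the time dimension,

$$\mathrm{NSE} = 1 - \frac{\sum_t (f_t - o_t)^2}{\sum_t (o_t - \bar{o})^2} \quad\longrightarrow\quad \mathrm{NSE}_w = 1 - \frac{\sum_t w_t (f_t - o_t)^2}{\sum_t w_t (o_t - \bar{o}_w)^2}$$

where $$\bar{o}_w = \sum_t w_t o_t / \sum_t w_t$$ is the weighted mean observation (whether the weighted or the unweighted mean should be used here is one of the open questions in this thread).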
Thanks @rob-taggart, that was my first thought as well, just looking at it conceptually (i.e. essentially weighted square error against the forecast over weighted square error against the mean) - but I wasn't sure if it was directly applicable to the variance as well, in case it was a convenient extraction and the weights are actually for each datum. If, as you say, this is the paradigm that scores uses, then it seems consistent enough... and a reasonable approximation at worst. Hopefully we can get additional clarification from a hydrologist before I decide what to do with the weights. For now I can just default to the behaviour you suggest and address it in the PR when it's getting reviewed. Noting that, as you say, we're not doing a weighting over a mean of scores - and while this sort of weighting can conceptually make sense for the NSE equation, it may or may not be interpretable depending on the context.
An alternate flow of reasoning to the above suggestion: essentially a weighted score (if I'm correctly interpreting) is a transformation of a plain mean of per-element scores into a weighted mean of those scores. We can then represent weighted-NSE in a similar format (note I skipped some steps for brevity, so may need double checking from @rob-taggart): the NSE sum can be re-arranged so that the pattern more resembles an averaging of a per-element score, and it can then be transformed into a weighted form, assuming the same weights are applied throughout, which can be interpreted as the weighted ratio of the difference between the squared loss of any given element from the obs variance, and the obs variance itself.
This seems to be consistent with the aggregated score. In practice the computation is equivalent to a short piece of pseudocode (sketched below).
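As a sketch of what that pseudocode could look like (my reconstruction in plain numpy; `fcst`, `obs` and `weights` are hypothetical 1-D arrays over the time dimension, not the scores API):

```python
import numpy as np

def weighted_nse(fcst, obs, weights=None):
    """Hedged sketch of a weighted NSE; not the scores implementation."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    weights = np.ones_like(obs) if weights is None else np.asarray(weights, dtype=float)

    # Weighted mean observation (using the unweighted mean here is the
    # alternative discussed above)
    obs_mean = np.sum(weights * obs) / np.sum(weights)

    # Weighted squared error against the forecast, and against the mean obs
    weighted_mse = np.sum(weights * (fcst - obs) ** 2) / np.sum(weights)
    weighted_var = np.sum(weights * (obs - obs_mean) ** 2) / np.sum(weights)

    return 1.0 - weighted_mse / weighted_var
```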
Jumping in on this since @rob-taggart addressed the weighting question. I am comfortable with NSE not supporting angular data unless @durgals says that NSE is used to evaluate directional forecasts in hydrology. Also happy if you support it regardless.
I guess discharge or flow has a concept of direction if it is a "velocity" measure. However, the wiki defines it as a quantity of fluid, e.g. the volume of water that on average goes through a cross-sectional area in unit time. "Cross-sectional" area could imply that the flow is averaged in relation to a plane, and therefore any directionality is, I assume, fixed by the sensor that enforces this - for example a tap. Hence, while the velocity may be directional, it's actually only the component perpendicular to the cross section that is recorded - though it may still be positive or negative, as opposed to radial. Given this, we probably don't need an angular difference. I'm just testing my interpretation and reasoning - happy to be corrected, or if there are actually sensors that take direction into account and are used with NSE.
Having said all that, I think NSE in the end is a score that is quite generic - while it may be rooted in, or popular for, hydrological analysis specifically, there's no hard rule that it cannot be used more widely, as long as a UserWarning notes that it isn't a typical use case for the metric. I guess the question then is whether this is even useful; I feel that (1 - MSE / var) is fairly generic:
- 0 => error matches obs variance => it's no better than using the mean
- 1 (upper bound) => no error
- < 0 is a rejection criterion
- 0 to 1 is application specific, probably
It is also the "reciprocal" of the signal-to-noise ratio used in signal processing. Given that, I think we should make a decision. Should we support:
If a hydrologist says those options are not useful (N) but a different subject matter expert finds them useful to have (Y), I think we can still include them with a user warning that it's not a typical use case - but @tennlee will have to give an okay as well from a software perspective. While I am a subject matter expert in signal processing, I will abstain from this particular decision since I'm probably biased. For now I'll implement it assuming we don't need those features, unless someone tells me otherwise.
Yes, NSE is quite generic. It is simply the MSE skill score where the reference forecast* is the sample mean of the observations. I don't see it being specific to hydrology or stream flow at all, except for its popular usage there. (*) technically this is not a forecast, because one cannot know the sample mean of the observations ahead of time.
Also, I wonder whether the implementation should sit within the general framework of skill scores: `skill_score = 1 - mean_score_fcst / mean_score_ref`. Implement this first, and then the NSE implementation would be something like `mean_score_fcst = mse(fcst, obs)`. Handling of weights would be through the weight implementation already available within mse. Similarly with "angular" difference, if we wanted to make that an option.
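A minimal sketch of that framework (illustrative only; `skill_score`, `mse_like` and `nse` below are hypothetical names, not the actual scores API):

```python
import numpy as np

def skill_score(mean_score_fcst, mean_score_ref):
    """Generic skill score: 1 - mean_score_fcst / mean_score_ref."""
    return 1.0 - mean_score_fcst / mean_score_ref

def mse_like(pred, obs, weights=None):
    """Stand-in for an MSE with optional weights (illustrative only)."""
    weights = np.ones_like(obs) if weights is None else weights
    return np.sum(weights * (pred - obs) ** 2) / np.sum(weights)

def nse(fcst, obs, weights=None):
    """NSE expressed as the MSE skill score against the mean observation."""
    obs = np.asarray(obs, dtype=float)
    obs_mean = np.full_like(obs, obs.mean())
    mean_score_fcst = mse_like(fcst, obs, weights)      # error against the forecast
    mean_score_ref = mse_like(obs_mean, obs, weights)   # error against the mean obs
    return skill_score(mean_score_fcst, mean_score_ref)
```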
In hydrology, we often assign weights to emphasize certain parts of the hydrograph. For instance, if we want to focus on high-flow periods, one approach is to use a weight function proportional to the flow. The formula provided by @rob-taggart is commonly used in hydrology (Hundecha and Bárdossy, 2004). Angular difference is irrelevant in discharge measurement because river flow always moves downhill, following the terrain. Unlike wind or ocean currents, which change direction, river discharge is guided by gravity and the channel shape, making direction obvious. Ref:
#814 (comment) - this is also a valid argument, that it is a skill score formed from a ratio of weighted scores - which incidentally makes the implementation much more trivial. It seems @durgals verified it above while I was typing this, so that settles it - I'll look into the paper and verify. I am happy to leave the angular difference in (as opposed to forcing angular=False, while leaving False as the default), given the general-purpose nature of the score, even if it isn't necessarily useful for hydro-metrics in particular.
@nikeethr, the general approach outlined in #814 (comment) would need to be augmented by specifying a time dimension. As I understand it, NSE is computed using time series forecasts and observations for each location. Can you confirm whether this is correct @durgals? If that is correct, an argument to NSE would be the time dimension to aggregate over.
@rob-taggart you're most likely right. The aggregation is done over time. (and having an option to reduce/preserve other dimensions if this is even a thing...?). |
Yes @rob-taggart. That's why the length of the weight should be equal to the length of the time series. |
I like the angular difference implementation with default=False. |
#217 is stale so I'm opening a new issue to document this metric. I will utilise the effort in #217 where possible to test and document the metric.
Metric description (theory/refresher)
NSE: Nash–Sutcliffe model efficiency coefficient.
Used for predictive skill assessment in hydrological models.
The basic metric is quite straightforward, see: NSE - wiki definition
The formula can be re-arranged to be written in terms of MSE:
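For reference, one way to write that re-arrangement, consistent with the (1 - MSE / var) form discussed elsewhere in this thread (treat this as a sketch rather than the exact original expression):

$$\mathrm{NSE} = 1 - \frac{\mathrm{MSE}(f, o)}{\sigma_o^2}$$

where $$\sigma_o^2$$ is the (biased) variance of the observations.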
For consistency we would like to inherit traits that MSE implementation has, such as the ability to gather and apply the metric (or reduce) on specific dimensions, and any other features that may be compatible such as weighting (need to check).
As such, when we apply dimensionality to the problem the result will not be a scalar like the above equation. Instead we are computing $$NSE_i$$, where $$i$$ can be thought of as an N-dimensional indexer representing extents of the remaining dimensions.
Furthermore, each index (of the remaining dimensions) is independently computed. Therefore $$MSE_i$$ and $${\sigma_o}_i^2$$ can also be computed separately and then broadcast into the final NSE nd-array. So, we can use the existing implementation of MSE, only needing to compute the biased obs variance.
We can use `np.var` with `ddof=0` (the default) set explicitly. The caveat being: if some aggregation options (e.g. weighting) are non-separable (need to check), then we may need to compute everything explicitly.
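A minimal sketch of the separable computation described above, assuming plain numpy arrays of shape (stations, timesteps) with the time axis last (illustrative only; the actual implementation would build on the existing MSE and xarray handling in scores):

```python
import numpy as np

# Hypothetical data: 3 stations, 100 timesteps
rng = np.random.default_rng(0)
fcst = rng.random((3, 100))
obs = rng.random((3, 100))

# MSE_i and biased obs variance, each reduced independently over the time axis
mse_i = np.mean((fcst - obs) ** 2, axis=-1)
var_obs_i = np.var(obs, axis=-1, ddof=0)  # ddof=0 -> biased variance (the default)

# Broadcast into the final NSE nd-array (one value per station)
nse_i = 1.0 - mse_i / var_obs_i
```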
Additional Notes
There are some variants of this metric that may be beneficial to add:
1 / (1 + MSE / sigma_obs)
(note: did it in my head in a few seconds - may not be accurate)
🚧 TODO
/cc @rob-taggart @durgals @tennlee