As a developer, I want to identify and implement a solution for event-based evaluations #130
Original Redmine Comment Going high with my estimated time. Will wait to see how James adjusts it, Hank |
Original Redmine Comment This is another ticket where it depends hugely on the starting assumptions. If I assume that the method for event detection (defined as some function that consumes a time-series and returns a set of non-overlapping datetime intervals that correspond to "events") is one of:
Then my estimate for a first cut is probably closer to "64", but let's say "128". Conversely, if we're starting from scratch, reviewing and perhaps even inventing, then my estimate is much higher, probably 2^10=1024. |
Original Redmine Comment Oh, and I would like to add "no forecasts" to the starting assumption above. I'm not sure how event detection is done for forecasts, which will quite often not capture an event in its entirety. I haven't looked at either of the above two methods in detail, but I assume neither admits forecasts. |
Original Redmine Comment I'll assume the simple solution, for now, and lower the estimate. I would think we would want to build off of either the ISED or NWM algorithm instead of starting from scratch. Jay emphasized that the schedule can always be adjusted after the reauthorization if it's determined that an estimate was off. Hank |
Marking this as nominally started, but will probably begin with a conversation in the original user support ticket over on VLab. |
Kicked off a conversation in the relevant ticket. Will wait to see if anyone responds. |
No responses so far. I do wonder whether anyone really wants this feature. For that reason, I lean towards something quite minimal. I will follow up with Jason R. and ask about the method they use in hydrotools for event detection. |
Sent a follow-up to Jason R. |
Going to meet with Jason R to discuss further. |
Met with Jason and Hank yesterday to discuss further. There are probably two separate tracks here:
There could be a third category in future:
|
Regarding the event detection method, it is likely to be a bit hand-wavey in the sense that it must be an automated scheme because the WRES is not an interactive program, and some parameters are likely to be needed (with defaults). Regardless, the presence/choice of parameters introduces a hand-wavey element and, depending on the chosen technique, there may be some fundamental limitations in some circumstances.

Some discussion was had about the extent to which discrete frequencies could be identified and modeled, like a low-frequency trend or periodic signal (e.g., seasonal), a high-frequency periodic signal (e.g., diurnal), a high-frequency non-periodic signal and a high-frequency non-structural or noise signal. The following paper adopts such an approach, modeling the overall signal as a sum of parts:

Regina, J.A., F.L. Ogden, 2021. Automated Correction of Systematic Errors in High-Frequency Depth Measurements above V-Notch Weirs using Time Series Decomposition. Hydrological Processes. With code:

Naturally, there will be some overlap between these parts and some modeling challenges in some circumstances. For example, in a hydrologic context, the extent to which rainfall-driven runoff (high-frequency non-periodic) could be discriminated from periodic diurnal variations (high-frequency periodic) may be complicated when rainfall itself is partly diurnal (e.g., evening convection). Nevertheless, this and most other techniques can only provide an approximate discrimination of event start and end dates, and they should not be regarded as precise. In some cases, "what is an event?" doesn't have a clear answer. For example, in snow basins, the melt period may be hard to distinguish as a discrete event. The initial focus would be firmly on events within the high-frequency non-periodic signal, again roughly corresponding to rainfall-driven runoff in hydrologic terms.
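To make the "sum of parts" idea concrete, here is a minimal sketch of an additive decomposition into a low-frequency trend, a periodic component and a residual. The function names, window sizes and phase-mean approach are illustrative assumptions, not the method used by the paper above or by WRES.

```python
# Hypothetical sketch: y(t) = trend (low-frequency) + periodic (e.g., diurnal)
# + residual, per the additive signal model discussed above.

def moving_average(series, window):
    """Centered moving average; edges use a shrunken window."""
    n = len(series)
    half = window // 2
    return [
        sum(series[max(0, i - half):min(n, i + half + 1)])
        / (min(n, i + half + 1) - max(0, i - half))
        for i in range(n)
    ]

def decompose_additive(series, period, trend_window):
    """Split a series into trend, periodic and residual parts."""
    trend = moving_average(series, trend_window)
    detrended = [y - t for y, t in zip(series, trend)]
    # The mean of each phase of the cycle estimates the periodic part.
    phase_means = [
        sum(detrended[p::period]) / len(detrended[p::period])
        for p in range(period)
    ]
    periodic = [phase_means[i % period] for i in range(len(series))]
    residual = [d - s for d, s in zip(detrended, periodic)]
    return trend, periodic, residual
```

By construction, the three parts sum back to the original series, which is the defining property of the additive model.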
Furthermore, it is likely that some post-processing parameters may be needed to filter out events that are "too short" or "not big enough" (e.g., not above a threshold). Again, hand-wavey parameters, but they are likely to be unavoidable. To the extent that suitable methods exist that address different types of event discrimination, it should be possible to support a method designation/declaration with up to N methods of event discrimination supported, and a default imposed in the absence of a method selection. |
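The post-processing described above might look something like the following sketch, which drops events that are "too short" or whose peak is "not big enough". The parameter names (`min_duration`, `min_peak`) and the `(start, end)` index representation are hypothetical, not WRES declaration syntax.

```python
# Illustrative post-processing of detected events, per the comment above.

def filter_events(events, series, min_duration=3, min_peak=None):
    """Keep events that are long enough and (optionally) big enough.

    events: list of (start, end) index pairs into series
    (inclusive start, exclusive end). Parameter defaults are assumptions.
    """
    kept = []
    for start, end in events:
        if end - start < min_duration:
            continue  # too short
        if min_peak is not None and max(series[start:end]) < min_peak:
            continue  # not big enough (peak below threshold)
        kept.append((start, end))
    return kept
```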
Some further discussion was had about how to model/separate baseflows from the runoff signal and, for example, it was suggested that a digital recursive filter may be adequate in many circumstances, along the lines of:

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507–515. https://doi.org/10.1002/hyp.5675

In general, such filters assume a linear reservoir model and are, therefore, likely to work well in circumstances where the underlying flow generating mechanisms are approximately linear. In short, regardless of the approach taken to discriminating "events" - these being high-frequency non-periodic variations roughly corresponding to rainfall-driven runoff in hydrologic terms - the actual event periods will be substantially approximate, and the techniques and parameters applicable in particular circumstances will require some experimentation and guidance for users, caveat emptor etc. |
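For reference, the Eckhardt (2005) filter cited above can be sketched in a few lines. The recursion is the published two-parameter form; the default parameter values and the initialization of the first baseflow value are illustrative assumptions (Eckhardt suggests a maximum baseflow index of around 0.80 for perennial streams with porous aquifers).

```python
def eckhardt_baseflow(streamflow, alpha=0.98, bfi_max=0.80):
    """Two-parameter recursive digital baseflow filter of Eckhardt (2005).

    alpha: recession constant; bfi_max: maximum baseflow index.
    Defaults and initialization here are illustrative assumptions.
    """
    baseflow = []
    b_prev = streamflow[0]  # simple initialization assumption
    for y in streamflow:
        b = ((1.0 - bfi_max) * alpha * b_prev
             + (1.0 - alpha) * bfi_max * y) / (1.0 - alpha * bfi_max)
        b = min(b, y)  # baseflow cannot exceed total streamflow
        baseflow.append(b)
        b_prev = b
    quickflow = [y - b for y, b in zip(streamflow, baseflow)]
    return baseflow, quickflow
```

The `quickflow` residual is the high-frequency, non-periodic part in which "events" would be sought.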
@james-d-brown These comments seem like a fair representation of the intricacies and pitfalls of event-based analysis. I'd add some additional notes for consideration.
It's true that event detection and baseflow separation techniques tend not to work well (or at all) on short time series. This is because you need a signal long enough to characterize the trend you're trying to remove. From this perspective, "events" are really a characteristic of the time series as a whole ("bumps" are relative). With regard to forecasts, I've wondered if event detection might work on lead time pools / time-lagged ensembles. You could generate a long single trace by taking the mean (or other statistic) across the same lead time pools of several forecasts. This could exacerbate the "time series component" identifiability problem if stringing together lead time pools results in more "bumps" (as it would if the forecast system uses anything like The Nudge™). Nonetheless, it is a way to get a forecast trace long enough to model a trend.
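The lead-time-pool idea above could be sketched as follows: for a fixed lead time, take a statistic (here, the ensemble mean) across the forecasts valid at each issued datetime, producing one long trace per lead time. The data layout (a mapping from issued time to member traces indexed by lead time) is an assumption for illustration, not the WRES data model.

```python
def lead_time_trace(forecasts, lead_index):
    """One value per issued datetime: the ensemble mean at a fixed lead time.

    forecasts: {issued_time: [member_trace, ...]} where each member_trace
    is a list of values indexed by lead time (an assumed layout).
    """
    trace = []
    for issued in sorted(forecasts):
        values = [member[lead_index] for member in forecasts[issued]]
        trace.append(sum(values) / len(values))
    return trace
```

Repeating this for every lead index yields one long series per lead time pool, which could then be fed to the same detection machinery as an observed series.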
We've still yet to conduct a sensitivity analysis of event detection. I've heard anecdotally from users that some parameters don't seem to really change the identified events. It might be interesting to try and quantify how hand-wavey we can get.
I'd note here that we chose the additive model because this model of time series decomposition is implicitly assumed by baseflow separation techniques. That is, many conceptualizations of hydrology implicitly assume that discrete components of streamflow originate from discrete physical processes in a catchment, each of which additively contributes a discrete volume to the total streamflow according to some conservation equation. From a purely signals and systems perspective, we could have used a multiplicative or exponential model of composition, but I don't think I've seen anything other than an additive model used in hydrology. I suppose I'm just re-emphasizing your point that the major source of uncertainty here is the appropriate number of time series "parts" and how these parts vary in time.
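As a side note on the additive-versus-multiplicative point: a multiplicative composition can be handled with the same additive machinery by working in log space, since log(T·S·R) = log T + log S + log R. This is a generic signal-processing identity, shown below as a sketch, not something either hydrotools or WRES does.

```python
import math

def to_additive(series):
    """Log-transform a strictly positive series so a multiplicative
    decomposition becomes an additive one."""
    return [math.log(y) for y in series]

def from_additive(series):
    """Invert the transform back to the original scale."""
    return [math.exp(y) for y in series]
```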
You may find these papers interesting. The Eckhardt filter has two parameters: recession constant and maximum baseflow index. These papers discuss automated techniques for setting these parameters.
Eckhardt, K. (2008). A comparison of baseflow indices, which were calculated with seven different baseflow separation methods. Journal of Hydrology, 352(1-2), 168-173.
Collischonn, W., & Fan, F. M. (2013). Defining parameters for Eckhardt's digital baseflow filter. Hydrological Processes, 27(18), 2614-2622.
Eckhardt, K. (2012). Analytical sensitivity analysis of a two parameter recursive digital baseflow separation filter. Hydrology and Earth System Sciences, 16(2), 451-455. |
Much appreciated, thanks, Jason! On forecasts, I agree that we should consider forecasts, probably as an extension of the initial work. For long-range forecasts, it should be mechanically possible and, even for short- or medium-range forecasts, it may be possible to look across multiple issued datetimes to build a picture of events as seen by (multiple) forecasts. Also, as you say, it should be possible to look across lead time pools, building a long time-series for a fixed lead time, repeated for all lead times. Anyway, that sounds like interesting follow-up work. |
Yes, for sure, that is part of the subjectivity of it all, when does an event start and end? There cannot really be a universally "right" answer to that because it partly depends on the question being posed. I think the best we can do is to allow a user to tweak some hand-wavey post-processing parameter, like a minimum gap between events before aggregation (into one event). That said, the way I see this ticket to begin with is to provide our users with a starting point. If it generates some interest and usage, more will be done. If it doesn't, well, more will not be done :) |
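The "minimum gap between events before aggregation" parameter mentioned above could be sketched like this. The `(start, end)` index representation and the `min_gap` name and default are hypothetical, for illustration only.

```python
# Illustrative aggregation of adjacent events separated by less than a
# minimum gap, per the hand-wavey post-processing parameter discussed above.

def merge_events(events, min_gap=2):
    """Merge events whose separation is smaller than min_gap."""
    merged = []
    for start, end in sorted(events):
        if merged and start - merged[-1][1] < min_gap:
            # Too close to the previous event: aggregate into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```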
A related point in this context is the architecture. We originally discussed an event detection service. I still think that makes more sense, ultimately, especially if a user needs to play with parameters before they get the events they want, visually/subjectively speaking. That process should be independent of the actual use of the events in an evaluation, and it doesn't make much sense to run an evaluation tool as an event detection tool (without an evaluation bolted on). Separation of concerns and all that. |
Back to this. |
An early consideration is what this looks like for a user, how they would declare it. We should re-use as much of the existing declaration as possible. If a user wants to perform event detection with one of the evaluation datasets, they shouldn't need to redeclare these datasets and all associated parameters (e.g., variable names, time scale etc.) in a special event detection context. Rather, they should be able to identify the context or orientation for the datasets to be used in event detection, such as The first question that arises is: what to do when a user wants to declare a dataset (or datasets) for event detection that is not one of the main evaluation datasets, i.e. However, since there is already a default filter when no explicit |
One nice thing about the |
The other obvious benefit of re-using the existing declaration for datasets is that the dataset declaration drives an existing data reading and ingest pipeline, both of which are pre-requisites for event detection. In other words, we don't simply re-use declaration, but a large chunk of our existing infrastructure for getting time-series data into the software. |
However, once we move beyond ingest, things become more complicated and will require a significant amount of new development. Recall that the phase of our pipeline after ingest is entirely "pool-shaped". That is to say, data retrieval, upscaling, pairing etc. all take place in the context of a pre-defined pool, repeated in parallel for each pool in the evaluation. A pool is described by a time window or temporal "event" boundaries, among other things. In other words, a pool is effectively event-shaped and we will simply re-use this concept for event-shaped evaluations based on event detection, rather than explicit declaration.

This all sounds good/fine, but the obvious issue here is that we cannot start the part of our existing pipeline that is event-shaped without the events to drive it. Since we are introducing pools ("events") that are entirely data-driven, the chicken and the egg are in conflict here, i.e., we need to detect the pools before we can use them to drive our post-ingest pipeline. In short, we will need new software infrastructure after the ingest phase and before the evaluation phase that is concerned with generating the pool boundaries or time windows for event-based evaluations, which will go on to drive the existing post-ingest pipeline.

We can still re-use lots of things, like time-series retrievers and upscalers and so on, because our software is properly abstracted. However, we cannot incorporate event detection into the pool-shaped pipeline that already does all of these activities, because event detection is concerned with identifying the pools in the first place. Also, for the avoidance of doubt, this isn't a problem with our software design - it makes perfect sense for the post-ingest work to be pool-shaped. But we do need new abstractions to retrieve and upscale etc. the time-series data for event detection and then to perform the event detection itself. |
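The bridge described above, turning detected events into the time windows that drive the existing pool-shaped pipeline, might look roughly like this. The window representation, key names and optional padding parameter are illustrative assumptions, not the WRES `TimeWindow` abstraction.

```python
from datetime import datetime, timedelta

# Sketch of the new post-ingest step: detected events become the time
# windows ("pools") that drive the existing pool-shaped pipeline.

def events_to_time_windows(events, padding=timedelta(hours=0)):
    """Turn detected (start, end) datetime pairs into pool boundaries,
    optionally padded on each side (padding is a hypothetical option)."""
    return [
        {"earliest_valid_time": start - padding,
         "latest_valid_time": end + padding}
        for start, end in events
    ]
```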
All of that to say, even the first cut at this is a big ticket in terms of integration alone, putting aside any time to develop a sensible event detection algorithm. I have copied across the estimate of 128 bananas from the Redmine ticket, which was dropped from 256 bananas, but the reality is probably somewhere in between these two estimates and the provisional deadline of mid Jan '25 may need to slip given the integration work. However, we can clarify that as things progress. |
Back to the declaration, I expect the simplest possible declaration could look something like this:
This would be a shorthand for:
In other words, conduct an evaluation using all default settings and event detection with the |
To use a different dataset for event detection, it could look something like this:
Again, the |
I think the more complex declarations flow naturally from that and there probably isn't a need to elaborate further at this stage. In other words, we can assume that |
It is perhaps worth clarifying that both the
In other words, use the covariate for both filtering and event detection and perform event detection using both the |
With the last commit, this branch should be ready to merge into main so that UAT can begin (or, rather, pre-deployment testing). |
Merge in progress, should then be ready for UAT. |
Although ready for UAT, I will continue to do some testing of this myself. There are a lot of moving parts and interactions with existing features. |
Again, the wiki is here: https://github.com/NOAA-OWP/wres/wiki/Event-detection |
Pending UAT. |
Hmm, looks like some system tests failed related to NetCDF reading. I think SQ caught a couple of issues with respect to some code there, which was touched by a class rename, and I accepted the edit without rerunning the system tests, so that is likely the culprit. Will reverse that minor change now... |
Reverse a minor change to NetCDF file reading, #130.
Done. |
Reviewing the wiki... I have a question about the example,

```yaml
observed: observed.csv
predicted: predicted.csv
covariates:
  - sources: precipitation.csv
    purpose: detect
  - sources: temperature.csv
event_detection: covariates
```

and the following paragraph:
The wiki says, "Specifically, if there are no filtering parameters (i.e., no minimum or maximum), then the purpose is assumed to be detect,..." No filtering parameters are declared for the temperature covariate, so its purpose should be detect. I guess I'm not understanding the paragraph. Please let me know where I've gone wrong. Thanks, Hank |
Need to step away for a bit, but, after returning, I hope to complete my review of the wiki before I leave a doctor's appointment. Back in 30 minutes or so, Hank |
Yes, good catch, there is meant to be a filter on temperature, just forgot to add one, will fix. |
Fixed the declaration and clarified the explanation. |
Looks good. Thanks! Continuing my review, Hank |
I'm almost hesitant to write this in case my math is screwed up, but here it goes... The defaults for If neither is defined, then, If one of the two are defined, then, when A ratio of 20 when one is defined vs. 10 when neither is defined. I'm not sure why there would be that inconsistency, and I doubt anyone would notice unless they really look carefully at that table in that section... which I did. Regardless, I'm not proposing a change, unless you think there is an error somewhere in either the code or the wiki. Thanks, Hank |
I may have overlooked it and need to go soon so I don't have time to double check... Are the events reported in the evaluation CSV? Are they reported in the logging? It may be good to let the user know how to find those events. The images appear to only provide the earliest valid time (start time of the event?), based on the x-axis label. Off to the doctor, Hank |
Yeah, we can make the ratio consistent. The defaults are a bit hand-wavey, but we can at least ensure internal consistency. So: W = 10H for H declared. Will adjust. |
The "events" are just time windows, reported in the same way that time windows are reported more generally, which includes the logging and the various numeric output formats, including pairs. The graphics are just a visual representation of (part of) the information and it's the same for the "event" time windows as any other ("non-event" or, rather, declared) time windows. One thing that's noted in the limitations is that it would make sense to separate out event detection from evaluation. That way, the events are more transparent to a user iterating them. I could add a sentence about the output formats recording them....? |
…ernal consistency in the relationship between the window size and half-life when one or neither is declared, #130.
Done and adjusted the wiki. Will merge in momentarily. |
Also off to a medical appointment now, will be back in an hour or two... |
Yeah, perhaps. No more than a sentence. Thanks! Hank |
The wiki looks good otherwise. I'll start my testing on Tuesday, and, as mentioned in the meeting, I'll use a separate ticket to track that testing since there will be a lot of it. Thanks, Hank |
|
Author Name: Hank (Hank)
Original Redmine Issue: 119823, https://vlab.noaa.gov/redmine/issues/119823
Original Date: 2023-08-22
See #79741 for the requirement being addressed.
This ticket can be resolved once we have designed a solution and implemented it.
Leaving as normal and in the backlog pending prioritization of the NWM team requirements for use case #115608.
Hank