
As a developer, I want to identify and implement a solution for event-based evaluations #130

Closed · epag opened this issue Aug 20, 2024 · 104 comments · Fixed by #379

Labels: enhancement (New feature or improvement) · Milestone: v6.28

@epag (Collaborator) commented Aug 20, 2024


Author Name: Hank (Hank)
Original Redmine Issue: 119823, https://vlab.noaa.gov/redmine/issues/119823
Original Date: 2023-08-22


See #79741 for the requirement being addressed.

This ticket can be resolved once we have designed a solution and implemented it.

Leaving as normal and in the backlog pending prioritization of the NWM team requirements for use case #115608.

Hank

@epag (Collaborator) commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2023-08-22T14:10:58Z


Going high with my estimated time. Will wait to see how James adjusts it,

Hank

@epag (Collaborator) commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2023-08-22T14:30:24Z


This is another ticket where it depends hugely on the starting assumptions.

If I assume that the method for event detection (defined as some function that consumes a time-series and returns a set of non-overlapping datetime intervals that correspond to "events") is one of:

  • The method used by the NWM team (Yuqiong et al.); or
  • The method used by ISED in their hydrotools (Jason).

Then my estimate for a first cut is probably closer to "64", but let's say "128".

Conversely, if we're starting from scratch, reviewing and perhaps even inventing, then my estimate is much higher, probably 2^10=1024.

@epag (Collaborator) commented Aug 20, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2023-08-22T14:32:30Z


Oh, and I would like to add "no forecasts" to the starting assumption above. I'm not sure how event detection is done for forecasts, which will quite often not capture an event in its entirety. I haven't looked at either of the above two methods in detail, but I assume neither admits forecasts.

@epag (Collaborator) commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2023-08-22T15:06:54Z


I'll assume the simple solution, for now, and lower the estimate. I would think we would want to build off of either the ISED or NWM algorithm instead of starting from scratch.

Jay emphasized that the schedule can always be adjusted after the reauthorization if it's determined that an estimate was off.

Hank

@james-d-brown james-d-brown added the enhancement New feature or improvement label Nov 4, 2024
@james-d-brown james-d-brown self-assigned this Nov 4, 2024
@james-d-brown james-d-brown added this to the v6.28 milestone Nov 4, 2024
@james-d-brown (Collaborator)

Marking this as nominally started, but will probably begin with a conversation in the original user support ticket over on VLab.

@james-d-brown (Collaborator)

Kicked off a conversation in the relevant ticket. Will wait to see if anyone responds.

@james-d-brown (Collaborator)

No responses so far. I do wonder whether anyone really wants this feature. For that reason, I lean towards something quite minimal. I will follow up with Jason R. and ask about the method they use in hydrotools for event detection.

@james-d-brown (Collaborator)

Sent a follow-up to Jason R.

@james-d-brown (Collaborator)

Going to meet with Jason R to discuss further.

@james-d-brown (Collaborator) commented Nov 14, 2024

Met with Jason and Hank yesterday to discuss further.

There are probably two separate tracks here:

  1. Use of "events" as a filter on the verification pairs. Use of arbitrary time-series data sources for event discrimination. An event in this context is a pair of datetimes, an event start and end. Statistical evaluation then proceeds as normal, either for the event periods (default) or the non-event periods (by election). In terms of interactions with other pools, event periods are defined on the valid datetime line, so valid time pools would not make sense in combination with event detection, but other time-based pooling options could be layered on top of the event pools, including lead durations and forecast issued datetimes. In principle, when multiple time-series are declared for event detection, the evaluation could proceed with either the intersection of (overlapping) events across all time-series or the union of them (see the sketch after this list). For example, when declaring two time-series, one corresponding to "observations" and another to "model simulations", the intersection would represent the events that appear (overlap) in both the observations and model predictions. In summary, event detection would be akin to season filtering. Obviously, there is plenty of scope for users to declare nonsensical things, but that is very context dependent and applies to evaluation in general. When using events as filters, they can be applied to any type of predictions, including forecasts.

  2. Comparisons of event-based measures across observations and predictions (and/or a baseline) in a marginal distributional sense. In other words, event-based measures are calculated separately for the observations and predictions and, optionally, the baseline time-series. By definition, these measures are not exactly verification measures because they apply to the different datasets separately, not the paired datasets (see 3. below). In other words, there is no such thing as a "timing error" unless events are paired. But we could include measures like time-to-peak where the duration is measured from the start of the event horizon to the peak value within it. In summary, there would be a comparison of hydrologic signatures in a distributional sense (e.g., overlapping histograms). When evaluating event-based measures, we can only accommodate long time-series of predictions like simulations because event detection with forecasts is unlikely to work, in general (forecasts will frequently span partial events, certainly short-range forecasts).
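For illustration, here is a minimal sketch of the interval intersection described in 1., assuming events are represented as time-sorted, non-overlapping (start, end) datetime pairs (the function and representation are hypothetical, not WRES code):

def intersect_events(a, b):
    # a and b are time-sorted lists of non-overlapping (start, end)
    # tuples, one list per time-series; the result contains the periods
    # that overlap in both, i.e., the intersection of the events
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever event ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

The union is analogous, merging overlapping intervals rather than clipping to their overlap.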

There could be a third category in future:

  3. Paired event evaluation whereby the joint time-series of observations and predictions (and, optionally, observations and baseline) is examined for coincident events (or non-events) using a bivariate technique that operates in the time-domain or frequency domain, like curve registration or the cross-wavelet transform. However, this comes with several challenges, such as how to penalize "false positives" and "false negatives" in the event sense, i.e., situations where the event occurs in one or other but not both. In that case, certain measures do not exist, like a "timing error", and it is unclear how they should be penalized (or, rather, whether they should be calculated for event "true positives" only and then used alongside other measures for the event false positives/negatives).

@james-d-brown (Collaborator) commented Nov 14, 2024

Regarding the event detection method, it is likely to be a bit hand-wavey in the sense that it must be an automated scheme because the WRES is not an interactive program, and some parameters are likely to be needed (with defaults). Regardless, the presence/choice of parameters introduces a hand-wavey element and, depending on the chosen technique, there may be some fundamental limitations in some circumstances.

Some discussion was had about the extent to which discrete frequencies could be identified and modeled, like a low-frequency trend or periodic signal (e.g., seasonal), a high-frequency periodic signal (e.g., diurnal), a high-frequency non-periodic signal and a high-frequency non-structural or noise signal. The following paper adopts such an approach, modeling the overall signal as a sum of parts:

Regina, J.A., F.L. Ogden, 2021. Automated Correction of Systematic Errors in High-Frequency Depth Measurements above V-Notch Weirs using Time Series Decomposition. Hydrol. Proc.

With code:

https://github.com/NOAA-OWP/hydrotools/blob/main/python/events/src/hydrotools/events/event_detection/decomposition.py

Naturally, there will be some overlap between these parts and some modeling challenges in some circumstances. For example, in a hydrologic context, the extent to which rainfall-driven runoff (high-frequency non-periodic) could be discriminated from periodic diurnal variations (high-frequency periodic) may be complicated when rainfall itself is partly diurnal (e.g., evening convection). Nevertheless, this and most other techniques can only provide an approximate discrimination of event start and end dates, and they should not be regarded as precise. In some cases, "what is an event?" doesn't have a clear answer. For example, in snow basins, the melt period may be hard to distinguish as a discrete event, and the initial focus would be firmly on events within the high-frequency non-periodic signal, again roughly corresponding to rainfall-driven runoff in hydrologic terms.

Furthermore, it is likely that some post-processing parameters may be needed to filter out events that are "too short" or "not big enough" (e.g., not above a threshold). Again, hand-wavey parameters, but they are likely to be unavoidable.

To the extent that suitable methods exist that address different types of event discrimination, it should be possible to support a method designation/declaration with up to N methods of event discrimination supported, and a default imposed in the absence of a method selection.
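As a very rough illustration of the kind of automated, parameterised scheme being described (a sketch using pandas with invented parameter values, not the hydrotools implementation):

import pandas as pd

def detect_events(series, halflife="6h", window="48h", min_duration="3h"):
    # Smooth with an exponentially weighted mean, estimate a slow trend
    # with a rolling median, then flag periods where the residual is
    # unusually high; all parameter values here are illustrative only.
    smoothed = series.ewm(halflife=pd.Timedelta(halflife), times=series.index).mean()
    trend = smoothed.rolling(window, min_periods=1).median()
    residual = smoothed - trend
    active = residual > residual.std()  # crude post-processing threshold
    # extract contiguous flagged runs as (start, end) events, dropping
    # any that are "too short"
    events, start = [], None
    for time, flag in active.items():
        if flag and start is None:
            start = time
        elif not flag and start is not None:
            if time - start >= pd.Timedelta(min_duration):
                events.append((start, time))
            start = None
    if start is not None:
        events.append((start, series.index[-1]))
    return events

Every choice above (the smoother, the trend model, the threshold, the minimum duration) is a hand-wavey parameter in exactly the sense discussed.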

@james-d-brown (Collaborator) commented Nov 14, 2024

Some further discussion was had about how to model/separate baseflows from the runoff signal and, for example, it was suggested that a digital recursive filter may be adequate in many circumstances, along the lines of:

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507–515. https://doi.org/10.1002/hyp.5675

In general, such filters assume a linear reservoir model and are, therefore, likely to work well in circumstances where the underlying flow generating mechanisms are approximately linear.

In short, regardless of the approach taken to discriminating "events" - these being high-frequency non-periodic variations roughly corresponding to rainfall-driven runoff in hydrologic terms - the actual event periods will be substantially approximate, the techniques and parameters will only be applicable in particular circumstances, and some experimentation and guidance for users will be needed; caveat emptor, etc.
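For reference, the Eckhardt filter itself is only a few lines; a sketch follows, where the default parameter values are those suggested in the paper for perennial streams with porous aquifers (the surrounding code is illustrative, not WRES code):

def eckhardt_baseflow(q, a=0.98, bfi_max=0.80):
    # Eckhardt (2005) two-parameter recursive digital filter: q is a
    # sequence of streamflow values, a is the recession constant and
    # bfi_max is the maximum baseflow index
    b = [q[0]]  # initialise baseflow at the first flow value
    for t in range(1, len(q)):
        bt = ((1 - bfi_max) * a * b[-1] + (1 - a) * bfi_max * q[t]) / (1 - a * bfi_max)
        b.append(min(bt, q[t]))  # baseflow cannot exceed total flow
    return b

The quick-flow (runoff) signal used for event discrimination is then q[t] - b[t].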

@jarq6c commented Nov 14, 2024

@james-d-brown These comments seem like a fair representation of the intricacies and pitfalls of event-based analysis. I'd add some additional notes for consideration.

we can only accommodate long time-series of predictions like simulations because event detection with forecasts is unlikely to work

It's true that event detection and baseflow separation techniques tend not to work well (or at all) on short time series. This is because you need a signal long enough to characterize the trend you're trying to remove. From this perspective, "events" are really a characteristic of the time series as a whole ("bumps" are relative).

With regard to forecasts, I've wondered if event detection might work on lead time pools / time-lagged ensembles. You could generate a long single trace by taking the mean (or other statistic) across the same lead time pools of several forecasts. This could exacerbate the "time series component" identifiability problem if stringing together lead time pools results in more "bumps" (as it would if the forecast system uses anything like The Nudge™). Nonetheless, it is a way to get a forecast trace long enough to model a trend.
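A sketch of that idea, assuming a hypothetical in-memory representation of the forecasts (not WRES or hydrotools code):

import pandas as pd

def fixed_lead_trace(forecasts, lead):
    # forecasts is a dict of {issued datetime: pd.Series indexed by
    # valid datetime}; returns one long series, indexed by valid
    # datetime, formed from the value of each forecast at a fixed lead
    # duration. Repeating this per lead time pool (or taking a mean or
    # other statistic across a pool) yields traces long enough to
    # model a trend.
    values = {}
    for issued, series in forecasts.items():
        valid = issued + lead
        if valid in series.index:
            values[valid] = series.loc[valid]
    return pd.Series(values).sort_index()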

Regarding the event detection method, it is likely to be a bit hand-wavey in the sense that it must be an automated scheme because the WRES is not an interactive program, and some parameters are likely to be needed (with defaults).

We've still yet to conduct a sensitivity analysis of event detection. I've heard anecdotally from users that some parameters don't seem to really change the identified events. It might be interesting to try and quantify how hand-wavey we can get.

The following paper adopts such an approach, modeling the overall signal as a sum of parts:

I'd note here that we chose the additive model because this model of time series decomposition is implicitly assumed by baseflow separation techniques. That is, many conceptualizations of hydrology implicitly assume that discrete components of streamflow originate from discrete physical processes in a catchment, each of which additively contributes a discrete volume to the total streamflow according to some conservation equation.

From a purely signal and systems perspective, we could have used a multiplicative or exponential model of composition, but I don't think I've seen anything other than an additive model used in hydrology. I suppose I'm just re-emphasizing your point that the major source of uncertainty here is the appropriate number of time series "parts" and how these parts vary in time.
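For readers unfamiliar with the additive model, a quick illustration on synthetic data (statsmodels is used here purely for illustration; it is not the hydrotools code):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# synthetic hourly flow with a diurnal cycle plus noise, decomposed
# additively: y(t) = trend(t) + seasonal(t) + residual(t)
index = pd.date_range("2023-01-01", periods=24 * 30, freq="h")
diurnal = np.sin(np.arange(len(index)) * 2 * np.pi / 24)
flow = pd.Series(10 + diurnal + np.random.normal(0, 0.2, len(index)), index=index)

parts = seasonal_decompose(flow, model="additive", period=24)
trend, seasonal, residual = parts.trend, parts.seasonal, parts.resid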

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507–515. https://doi.org/10.1002/hyp.5675

You may find these papers interesting. The Eckhardt filter has two parameters: recession constant and maximum baseflow index. These papers discuss automated techniques for setting these parameters.

  1. This paper presents a method for using recession analysis to automatically determine the recession constant:

Eckhardt, K. (2008). A comparison of baseflow indices, which were calculated with seven different baseflow separation methods. Journal of Hydrology, 352(1-2), 168-173.

  2. This paper uses a reverse filtering technique to automatically determine the maximum baseflow index:

Collischonn, W., & Fan, F. M. (2013). Defining parameters for Eckhardt's digital baseflow filter. Hydrological Processes, 27(18), 2614-2622.

  3. This paper discusses sensitivity and parametric uncertainty of the filter:

Eckhardt, K. (2012). Analytical sensitivity analysis of a two parameter recursive digital baseflow separation filter. Hydrology and Earth System Sciences, 16(2), 451-455.

@james-d-brown (Collaborator)

Much appreciated, thanks, Jason!

On forecasts, I agree that we should consider forecasts, probably as an extension of the initial work. For long-range forecasts, it should be mechanically possible and, even for short- or medium-range forecasts, it may be possible to look across multiple issued datetimes to build a picture of events as seen by (multiple) forecasts. Also, as you say, it should be possible to look across lead time pools, building a long time-series for a fixed lead time, repeated for all lead times. Anyway, that sounds like interesting follow-up work.

@jarq6c commented Nov 20, 2024

Another consideration I forgot about is compound events. Depending on the hydrologist, the attached image might contain 1 multi-day event or 8ish individual event peaks. I imagine most definitions would land on 1, 3, or 5 events, but that's not much clearer.

[Attached image (screenshot, 2024-11-20): hydrograph spanning several days with roughly eight overlapping event peaks]

@james-d-brown (Collaborator)

Yes, for sure, that is part of the subjectivity of it all: when does an event start and end? There cannot really be a universally "right" answer to that because it partly depends on the question being posed. I think the best we can do is to allow a user to tweak some hand-wavey post-processing parameter, like a minimum gap between events before aggregation (into one event). That said, the way I see this ticket to begin with is to provide our users with a starting point. If it generates some interest and usage, more will be done. If it doesn't, well, more will not be done :)

@james-d-brown (Collaborator)

A related point in this context is the architecture. We originally discussed an event detection service. I still think that makes more sense, ultimately, especially if a user needs to play with parameters before they get the events they want, visually/subjectively speaking. That process should be independent of the actual use of the events in an evaluation, and it doesn't make much sense to run an evaluation tool as an event detection tool (without an evaluation bolted on). Separation of concerns and all that.

@james-d-brown (Collaborator)

Back to this.

@james-d-brown (Collaborator) commented Nov 27, 2024

An early consideration is what this looks like for a user, how they would declare it.

We should re-use as much of the existing declaration as possible. If a user wants to perform event detection with one of the evaluation datasets, they shouldn't need to redeclare these datasets and all associated parameters (e.g., variable names, time scale etc.) in a special event detection context. Rather, they should be able to identify the context or orientation for the datasets to be used in event detection, such as observed or predicted or observed and predicted.

The first question that arises is: what to do when a user wants to declare a dataset (or datasets) for event detection that is not one of the main evaluation datasets, i.e., observed, predicted or baseline? I suppose the obvious answer to this is that they declare the datasets with a covariate context. After all, any dataset that is not an evaluation dataset but is used to support the evaluation in some way is a covariate. Until now, this has meant filtering the pairs, but we can broaden the scope.

However, since there is already a default filter when no explicit minimum or maximum covariate value is declared, namely that the pairs will only be formed when the covariate contains an event value, we will need a modifier for event detection. This is a bit annoying, but not a big deal. Now that there are N purposes (N=2) rather than one purpose for a covariate, it follows that a user will need to clarify the intended purpose or purposes of the covariate. Naturally, to avoid a breaking change, the default purpose will be filtering, not event detection. Otherwise, we could allow a list of purposes, beginning with filtering and detection.

@james-d-brown (Collaborator)

One nice thing about the covariate context is that a user can potentially declare entirely different variables for event detection than the variables used for evaluation. Whether this makes sense is entirely context dependent, but it is flexible and desirable.

@james-d-brown (Collaborator)

The other obvious benefit of re-using the existing declaration for datasets is that the dataset declaration drives an existing data reading and ingest pipeline, both of which are pre-requisites for event detection. In other words, we don't simply re-use declaration, but a large chunk of our existing infrastructure for getting time-series data into the software.

@james-d-brown (Collaborator)

However, once we move beyond ingest, things become more complicated and will require a significant amount of new development.

Recall that the phase of our pipeline after ingest is entirely "pool-shaped". That is to say, data retrieval, upscaling, pairing etc. all take place in the context of a pre-defined pool, repeated in parallel for each pool in the evaluation. A pool is described by a time window or temporal "event" boundaries, among other things. In other words, a pool is effectively event-shaped and we will simply re-use this concept for event-shaped evaluations based on event detection, rather than explicit declaration.

This all sounds good/fine, but the obvious issue here is that we cannot start that part of our existing pipeline that is event-shaped without the events to drive them. Since we are introducing pools ("events") that are entirely data-driven, the chicken and the egg are in conflict here, i.e., we need to detect the pools before we can use them to drive our post-ingest pipeline.

In short, we will need a new software infrastructure after the ingest phase and before the evaluation phase that is concerned with generating the pool boundaries or time windows for event-based evaluations, which will go on to drive the existing post-ingest pipeline.

We can still re-use lots of things, like time-series retrievers and upscalers and so on, because our software is properly abstracted. However, we cannot incorporate event detection into the pool-shaped pipeline that already does all of these activities, because event detection is concerned with identifying the pools in the first place. Also, for the avoidance of doubt, this isn't a problem with our software design - it makes perfect sense for the post-ingest work to be pool-shaped. But we do need new abstractions to retrieve and upscale etc. the time-series data for event detection and then to perform the event detection itself.
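To sketch the ordering being described (the names below are hypothetical placeholders, not actual WRES classes, and the WRES itself is Java rather than Python):

def ingest(declaration):
    # existing phase: read and persist the declared time-series
    return {"observed": [], "predicted": []}

def detect_events(time_series):
    # new phase: consume a time-series, return (start, end) windows
    return [("2023-01-05T00:00Z", "2023-01-07T12:00Z")]

def evaluate_pool(data, window):
    # existing pool-shaped pipeline: retrieve, upscale, pair, score
    print("evaluating pool", window)

def evaluate(declaration):
    data = ingest(declaration)
    # event detection must sit between ingest and pooling because the
    # detected events define the pool boundaries (time windows) that
    # drive everything downstream
    windows = detect_events(data["observed"])
    for window in windows:
        evaluate_pool(data, window)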

@james-d-brown (Collaborator)

All of that to say, even the first cut at this is a big ticket in terms of integration alone, putting aside any time to develop a sensible event detection algorithm. I have copied across the estimate of 128 bananas from the Redmine ticket, which was dropped from 256 bananas, but the reality is probably somewhere in between these two estimates and the provisional deadline of mid Jan '25 may need to slip given the integration work. However, we can clarify that as things progress.

@james-d-brown (Collaborator) commented Nov 27, 2024

Back to the declaration, I expect the simplest possible declaration could look something like this:

observed: observations.csv
predicted: predictions.csv
event_detection: observed

This would be a shorthand for:

observed: observations.csv
predicted: predictions.csv
event_detection: 
  dataset: observed

In other words, conduct an evaluation using all default settings and event detection with the observed dataset (and using all default settings for the event detection).

@james-d-brown (Collaborator) commented Nov 27, 2024

To use a different dataset for event detection, it could look something like this:

observed: observations.csv
predicted: predictions.csv
covariates:
  - sources: covariate.csv
    purpose: detect
event_detection: covariates

Again, the purpose qualifier would be to distinguish the use of this covariate for event detection and not for default filtering of the pairs. It is slightly more cumbersome than ideal, as the covariates could otherwise be declared in the same way as the observed or predicted, but this maintains backwards compatibility and, ultimately, it is more likely that covariates will be used for filtering than event detection.

@james-d-brown (Collaborator)

I think the more complex declarations flow naturally from that and there probably isn't a need to elaborate further at this stage. In other words, we can assume that event_detection will have a bunch of optional parameters, but only the dataset as a required parameter, declared in long or short form, but always as an orientation or reference to one of the formally declared datasets.

@james-d-brown (Collaborator) commented Nov 27, 2024

It is perhaps worth clarifying that both the dataset and the purpose could be a list. In other words:

observed: observations.csv
predicted: predictions.csv
covariates:
  - sources: covariate.csv
    purpose: 
      - detect
      - filter
event_detection: 
  dataset:
     - observed
     - covariates

In other words, use the covariate for both filtering and event detection and perform event detection using both the observed and covariates datasets. Perhaps improbable, but perfectly possible.

@james-d-brown (Collaborator)

With the last commit, this branch should be ready to merge into main so that UAT can begin (or, rather, pre-deployment testing).

james-d-brown mentioned this issue Jan 16, 2025
@james-d-brown (Collaborator)

Merge in progress, should then be ready for UAT.

@james-d-brown (Collaborator)

Although ready for UAT, I will continue to do some testing of this myself. There are a lot of moving parts and interactions with existing features.

@james-d-brown (Collaborator)

Pending UAT.

@james-d-brown (Collaborator)

Hmm, looks like some system tests failed related to NetCDF reading. I think SQ caught a couple of issues with respect to some code there, which was touched by a class rename, and I accepted the edit without rerunning the system tests, so that is likely the culprit. Will reverse that minor change now...

james-d-brown added a commit that referenced this issue Jan 16, 2025
Reverse a minor change to NetCDF file reading, #130.
@james-d-brown (Collaborator)

Done.

@HankHerr-NOAA (Contributor)

Reviewing the wiki...

I have a question about the example,

observed: observed.csv
predicted: predicted.csv
covariates:
  - sources: precipitation.csv
    purpose: detect
  - sources: temperature.csv
event_detection: covariates

and following paragraph:

In this case, only the precipitation.csv will be used for event detection. When event_detection is declared and the purpose of a covariate is undeclared, then the context of that covariate is inspected to determine whether it should be used for event detection or filtering. Specifically, if there are no filtering parameters (i.e., no minimum or maximum), then the purpose is assumed to be detect, otherwise filter.

So the purpose of the temperature CSV is undeclared. Per your paragraph, this applies:

"Specifically, if there are no filtering parameters (i.e., no minimum or maximum), then the purpose is assumed to be detect,..."

No filtering parameters are declared, so the purpose of the temperature covariate should be detect. Yet, it will not be used for event detection according to the first sentence (only precip is). Why is precipitation the only one used for event detection?

I guess I'm not understanding the paragraph. Please let me know where I've gone wrong. Thanks,

Hank

@HankHerr-NOAA (Contributor)

Need to step away for a bit, but, after returning, I hope to complete my review of the wiki before I leave for a doctor's appointment. Back in 30 minutes or so,

Hank

@james-d-brown (Collaborator)

Yes, good catch, there is meant to be a filter on temperature, just forgot to add one, will fix.

@james-d-brown (Collaborator)

Fixed the declaration and clarified the explanation.

@HankHerr-NOAA (Contributor)

Looks good. Thanks!

Continuing my review,

Hank

@HankHerr-NOAA (Contributor)

I'm almost hesitant to write this in case my math is screwed up, but here it goes...

The defaults for window_size and half_life are inconsistent with the rules for computing each when the other is defined. See this section.

If neither is defined, then, window_size = (avg time step) * 200 and half_life = (avg time step) * 20, so window_size / half_life = 10.

If one of the two is defined, then, when window_size is defined, half_life = window_size/20, and when half_life is defined, window_size = 20 * half_life. So window_size / half_life = 20.

A ratio of 20 when one is defined vs. 10 when neither is defined. I'm not sure why there would be that inconsistency, and I doubt anyone would notice unless they really look carefully at that table in that section... which I did.

Regardless, I'm not proposing a change, unless you think there is an error somewhere in either the code or the wiki. Thanks,

Hank

@HankHerr-NOAA (Contributor)

I may have overlooked it and need to go soon so I don't have time to double check...

Are the events reported in the evaluation CSV? Are they reported in the logging? It may be good to let the user know how to find those events. The images appear to only provide the earliest valid time (start time of the event?), based on the x-axis label.

Off to the doctor,

Hank

@james-d-brown (Collaborator)

I'm almost hesitant to write this in case my math is screwed up, but here it goes...

The defaults for window_size and half_life are inconsistent with the rules for computing each when the other is defined. See this section.

If neither is defined, then, window_size = (avg time step) * 200 and half_life = (avg time step) * 20, so window_size / half_life = 10.

If one of the two is defined, then, when window_size is defined, half_life = window_size/20, and when half_life is defined, window_size = 20 * half_life. So window_size / half_life = 20.

A ratio of 20 when one is defined vs. 10 when neither is defined. I'm not sure why there would be that inconsistency, and I doubt anyone would notice unless they really look carefully at that table in that section... which I did.

Regardless, I'm not proposing a change, unless you think there is an error somewhere in either the code or the wiki. Thanks,

Hank

Yeah, we can make the ratio consistent. The defaults are a bit hand-wavey, but we can at least ensure internal consistency.

So:

  • H declared: W = 10H
  • W declared: H = W/10 (i.e., W = 10H)
  • Neither declared: H = 20TS and W = 200TS (i.e., W = 10H, where TS is the average time step)

Will adjust.
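A sketch of the adjusted rules, with H = half_life, W = window_size and TS = the average time step (hypothetical helper, not the actual WRES code):

from datetime import timedelta

def resolve(half_life=None, window_size=None, avg_time_step=None):
    # enforce W = 10H in every case; all quantities are timedeltas
    if half_life is not None and window_size is None:
        window_size = 10 * half_life       # W = 10H
    elif window_size is not None and half_life is None:
        half_life = window_size / 10       # H = W/10
    elif half_life is None and window_size is None:
        half_life = 20 * avg_time_step     # H = 20TS
        window_size = 10 * half_life       # W = 200TS = 10H
    return half_life, window_size

# e.g., resolve(avg_time_step=timedelta(hours=1)) -> (20h, 200h)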

@james-d-brown (Collaborator)

I may have overlooked it and need to go soon so I don't have time to double check...

Are the events reported in the evaluation CSV? Are they reported in the logging? It may be good to let the user know how to find those events. The images appear to only provide the earliest valid time (start time of the event?), based on the x-axis label.

Off to the doctor,

Hank

The "events" are just time windows, reported in the same way that time windows are reported more generally, which includes the logging and the various numeric output formats, including pairs. The graphics are just a visual representation of (part of) the information and it's the same for the "event" time windows as any other ("non-event" or, rather, declared) time windows. One thing that's noted in the limitations is that it would make sense to separate out event detection from evaluation. That way, the events are more transparent to a user iterating them.

I could add a sentence about the output formats recording them....?

james-d-brown added a commit that referenced this issue Jan 17, 2025
…ernal consistency in the relationship between the window size and half-life when one or neither is declared, #130.
@james-d-brown (Collaborator)

I'm almost hesitant to write this in case my math is screwed up, but here it goes...
The defaults for window_size and half_life are inconsistent with the rules for computing each when the other is defined. See this section.
If neither is defined, then, window_size = (avg time step) * 200 and half_life = (avg time step) * 20, so window_size / half_life = 10.
If one of the two is defined, then, when window_size is defined, half_life = window_size/20, and when half_life is defined, window_size = 20 * half_life. So window_size / half_life = 20.
A ratio of 20 when one is defined vs. 10 when neither is defined. I'm not sure why there would be that inconsistency, and I doubt anyone would notice unless they really look carefully at that table in that section... which I did.
Regardless, I'm not proposing a change, unless you think there is an error somewhere in either the code or the wiki. Thanks,
Hank

Yeah, we can make the ratio consistent. The defaults are a bit hand-wavey, but we can at least ensure internal consistency.

So:

  • H declared: W = 10H
  • W declared: H = W/10 (i.e., W = 10H)
  • Neither declared: H = 20TS and W = 200TS (i.e., W = 10H, where TS is the average time step)

Will adjust.

Done and adjusted the wiki. Will merge in momentarily.

@james-d-brown (Collaborator)

Off to the doctor,

Also off to a medical appointment now, will be back in an hour or two...

@HankHerr-NOAA (Contributor)

I could add a sentence about the output formats recording them....?

Yeah, perhaps. No more than a sentence. Thanks!

Hank

@HankHerr-NOAA (Contributor)

The wiki looks good otherwise. I'll start my testing on Tuesday, and, as mentioned in the meeting, I'll use a separate ticket to track that testing since there will be a lot of it.

Thanks,

Hank

@james-d-brown (Collaborator) commented Jan 17, 2025

I could add a sentence about the output formats recording them....?

Yeah, perhaps. No more than a sentence. Thanks!

Hank

https://github.com/NOAA-OWP/wres/wiki/Event-detection#where-can-i-find-the-exact-period-spanned-by-each-event
