ensure rejected/missing data in peak flows are flagged as such in output #19
A new metadata "field" has been added so that excluded years and rejected periods can be accessed, e.g.:
The list of excluded years is the same as the list in the “.AM” file. It takes the rejected periods into account, so you don’t have to look at both if you just want the list of water-years. This data is used to provide the red shading on the AMAX and POT plots on the NRFA website (peak flow tab of the gauging station pages, e.g. https://nrfa.ceh.ac.uk/data/station/peakflow/43010). It may be possible to integrate this information with the time series data in future, e.g. as data quality flags, but we don't have a consistent pattern for that yet.
“Missing” data periods are something else entirely. These are shaded yellow(ish?) on the POT plot, and listed in the missing data tab on the website. They sort of appear in the “.PT” file, but in a processed form that excludes the shorter missing periods. There is no API output for these at the moment. There are plenty of issues with the reliability of the POT data, even if you make use of the missing data periods we have.
Thanks Matt - the new field is quite useful.
All of the above can also be done for the type = "pot-flow" option, with a good amount of code which needs to be added if we go for option 3 (but maybe as Matt says we should really make it easy for people to get to the POT datasets since they are somewhat unreliable).
FYI - in the meantime I have created some functions in winfapReader to obtain annual maxima, peaks over threshold and catchment descriptors from the NRFA API. These functions output objects which should be more similar to what one could derive from the winfap files. I have also added some ideas in the vignette on how the winfapReader package could be combined with rnrfa. I hope I have done justice to the NRFA view on the peak flow holdings and to the rnrfa package for its functionality. Comments and feedback would be very welcome!
Has there been further development of this? I think an option that explicitly showed that a year was rejected would be best. Some users want to use it anyway (or reverse the series, looking at only the rejected years, e.g. after a reservoir has been built). We could include rejected flags within the time series formats from the NRFA API, but I don't know exactly how RNRFA makes use of these. For POT data it's also important to have the missing data periods, and these wouldn't fit with the time series data, so they would need to be metadata or a separate endpoint (which seems excessive). Are these needed explicitly as well as the rejected periods?
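To make the "explicit rejected flag" idea concrete, here is a minimal sketch in Python rather than R. It is not the NRFA API or the rnrfa interface: the `AnnualMax` record, the `flag_rejected` helper and the sample flows are all hypothetical, and the water year is assumed to be labelled by the calendar year in which it starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnualMax:
    water_year: int          # assumed: calendar year in which the water year starts
    flow: float              # annual maximum flow (illustrative values only)
    rejected: bool = False   # explicit quality flag carried with the value


def flag_rejected(amax, rejected_years):
    """Attach a rejected flag to each (water_year, flow) pair.

    Nothing is dropped: the caller decides whether to keep, exclude
    or isolate the rejected years.
    """
    rejected = set(rejected_years)
    return [AnnualMax(year, flow, year in rejected) for year, flow in amax]


# Hypothetical series with one rejected water year.
series = flag_rejected([(1990, 12.3), (1991, 45.6), (1992, 8.9)],
                       rejected_years=[1991])

accepted = [r for r in series if not r.rejected]   # the usual use
only_rejected = [r for r in series if r.rejected]  # the "reversed" use
```

Keeping the flag alongside the value, instead of filtering on the server side, supports both uses mentioned above: excluding rejected years, or looking only at them.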
And incidentally, it seems httr has built-in functions to check and manage caching: https://github.com/r-lib/httr/blob/master/R/cache.R
Hi Matt - changing the output of the NRFA API could make the information easier to collate, but the functions I built in winfapReader (happy to change the name by the way) do give information on the rejected year. I think we need @cvitolo to weigh in on this since it would mean that for some data types the
And indeed it is too complicated to add the missing/rejected year information to the POT records. I am inclined to have separate functions for these records, either via winfapReader or with functions within RNRFA. Again, I think @cvitolo gets the final word on this.
Re: caching. I was worried about sending too high a volume of requests to the API. I'll have a look into it, although I know nothing about it. Thanks for the pointer.
Hi @ilapros and @mattfry-ceh!
Oh sorry, I should have thought of that. A reprex for annual maxima is below:
The issue is that rnrfa at present doesn't report the missing information. This is even more complicated for the POT data, where a missing year might be mistaken for a year with no peaks above the threshold (see for example https://nrfa.ceh.ac.uk/data/station/peakflow/39003 for a station whose POT record has periods of missing data and rejected years).
The root of the problem is that peak flow data have mostly been used differently from the daily data, I think.
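The ambiguity between "no peaks" and "no data" can be illustrated with a short sketch, written in Python rather than R. Everything here is an assumption for illustration: the `peaks_per_water_year` helper, the sample dates, a water year starting on 1 October and labelled by its starting year, and the simplification that missing data come as whole water years.

```python
from collections import Counter
from datetime import date


def water_year(d, start_month=10):
    """Label a date by the year its water year starts in (assumed 1 October)."""
    return d.year if d.month >= start_month else d.year - 1


def peaks_per_water_year(peak_dates, first_year, last_year, missing_years):
    """Count POT events per water year.

    A year with data but no exceedances gets 0; a year known to be
    missing gets None, so the two cases cannot be confused.
    """
    counts = Counter(water_year(d) for d in peak_dates)
    missing = set(missing_years)
    return {
        year: None if year in missing else counts.get(year, 0)
        for year in range(first_year, last_year + 1)
    }


# Hypothetical record: no peaks are listed for 2001 or 2003,
# but only 2001 is actually missing.
peaks = [date(2000, 11, 3), date(2001, 1, 15), date(2002, 12, 20)]
summary = peaks_per_water_year(peaks, 2000, 2003, missing_years=[2001])
```

Without the missing-year information, both 2001 and 2003 would look like years with zero exceedances, which would bias any estimate of the peak rate.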
Just to clarify, I think the model is that there are rejected periods and missing periods, both of which have a start and end date that don't have to conform to a year start / end. See https://nrfa.ceh.ac.uk/data/station/peakflow/39003 and select "Missing Data" from the dropdown to see an example.
These combine to define "Rejected" Ann Max values, i.e. a significant part of the year was missing or unrepresentative. The significance is manually defined, i.e. it's not a question of automatically rejecting a year because any of it was missing. So the NRFA basically stores a set of rejected years for AM values.
POT data are also flagged as rejected, but this is solely because they fall in an unrepresentative period. There is no concept of rejected years of POT data by default.
To properly use this I think the NRFA API should be able to provide:
But any user of the POT data will need to know about the full set of missing data periods in order to properly use the data.
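The period model described above can be sketched as follows, in Python rather than R. This is only an illustration under stated assumptions: the `Period` type, the helper names and the sample dates are hypothetical, period end dates are treated as inclusive, and the water year is assumed to run from 1 October to 30 September. The sketch only reports how many days of each water year are affected; it does not decide whether a year is rejected, since that judgement is made manually.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Period:
    start: date
    end: date   # inclusive (assumption)
    kind: str   # "rejected" or "missing"


def water_year_bounds(year, start_month=10):
    """First and last day of the water year starting in `year` (assumed 1 October)."""
    start = date(year, start_month, 1)
    end = date(year + 1, start_month, 1) - timedelta(days=1)
    return start, end


def days_affected(period, year):
    """Number of days of the given water year covered by the period."""
    wy_start, wy_end = water_year_bounds(year)
    lo = max(period.start, wy_start)
    hi = min(period.end, wy_end)
    return max((hi - lo).days + 1, 0)


# Hypothetical periods; the first one straddles two water years.
periods = [
    Period(date(1995, 8, 1), date(1995, 11, 15), "missing"),
    Period(date(1997, 1, 1), date(1997, 1, 10), "rejected"),
]

overlap = {
    year: sum(days_affected(p, year) for p in periods)
    for year in (1994, 1995, 1996)
}
```

Note that the sum would double count days if a rejected and a missing period overlapped each other; merging the intervals first would avoid that.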
Major changes:
1. Fixed issue #19 whereby rejected/missing data in peak flows are flagged as such in output. Added full_info to input parameters to retrieve data quality flags.
2. timeseries are now classed as zoo object, not xts.
3. startseason and endseason in seasonal_averages() are now deprecated, seasons are labelled by the calendar quarter in which the season ends.
Minor changes:
- Added more tests
I just committed my first attempt to fix this issue. Basically I added an extra input parameter, called Here is how it works with amax, for which flags are derived from peak-flow-rejected-amax-years:
With pot, the quality flags are derived from peak-flow-rejected-periods. In this case, the result is not directly comparable with winfapReader. Here is an example:
@mattfry-ceh @ilapros please let me know what you think.
Thanks @ilapros! Just merged your PR
For the way I use the data (and to my knowledge of the data) there is an issue when retrieving the amax-flow and pot-flow: the NRFA flags some years as Rejected (for amax) and as Missing (for the pot), and this information is not available in the time-series format, but only in the feh-data format. In my winfapReader package I have functions to process the feh-data formats, and I am happy to rework those to be usable in the rnrfa package (I prototyped something this morning). I have two main issues with these:
Once this issue is resolved I think we should discourage users from using the pot-flow/stage and amax-flow/stage types, as they give data which could include some unreliable records.