Add option to load selected or all years available in an experiment #1120
This issue is not about a general wildcard mechanism, though. @sloosvel, perhaps you can elaborate on the use case a little bit? Why is it impossible or very difficult to use the normal start_year, end_year mechanism in this case?
Thank you @zklaus. As you mention, it's not so much about the readability of recipes with wildcards. It's just that the current way of specifying start and end years, which clips the dates from January 1st to December 31st, is not very convenient for certain datasets. Below you can see what the files for a DCPP experiment can look like. These are files from the same experiment (
So what would happen when someone tries to work with the full dataset with ESMValTool? First they would have to define 58 entries in the recipe to call all the sub-experiments:
... 56 lines later we are done loading one dataset (in a tool that is meant to compare multiple datasets). I think it would be much more user-friendly to have an interface like this:
And what happens if we only want to take into account, let's say, a subset of years from each sub_experiment? (I don't know if this is a real-life application.)
Wouldn't it be more user-friendly to do it like this?
But the problem here is another one. The
So now it's not only a problem of user-friendliness: a preprocessing function is returning wrong values. In summary, what I was trying to address here is a compact way of loading these types of datasets, as well as fixing the clipping of the time ranges. Whether this should be done with wildcards, by compiling recipes, or by not accepting this kind of general recipe in the repository because of readability, I'm happy to do whatever is most convenient for everyone. But that is not the main issue.
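One of the options mentioned is compiling recipes, i.e. generating the long list of per-sub-experiment entries instead of writing them by hand. A minimal Python sketch, assuming a recipe dataset entry is a plain dict and that a list-valued `sub_experiment` should fan out into one entry per item. The helper `expand_sub_experiments` and the dataset name are hypothetical, for illustration only:

```python
from copy import deepcopy


def expand_sub_experiments(entry: dict) -> list[dict]:
    """Hypothetical helper: turn one dataset entry whose 'sub_experiment'
    is a list into one entry per sub-experiment. Entries without a list
    are returned unchanged (as a single copy)."""
    subs = entry.get("sub_experiment")
    if not isinstance(subs, list):
        return [deepcopy(entry)]
    expanded = []
    for sub in subs:
        new_entry = deepcopy(entry)  # avoid sharing nested values between entries
        new_entry["sub_experiment"] = sub
        expanded.append(new_entry)
    return expanded


# Illustrative entry: "MODEL" is a placeholder dataset name
entry = {
    "dataset": "MODEL",
    "exp": "dcppA-hindcast",
    "ensemble": "r1i1p1f1",
    "sub_experiment": [f"s{year}" for year in range(1960, 2018)],
}
entries = expand_sub_experiments(entry)
```

Here `s1960` to `s2017` gives 58 entries from a single one. Note that this only shortens the recipe; it does nothing about the clipping of the time ranges.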
My bad! I was not explaining things very well.
I'm not sure if it's a good idea to move preprocessor functionality to the dataset section. Wouldn't it make more sense to create a preprocessor that selects the first or last years of data from all datasets, if that's something that's required?
Would it be possible to set the range to the correct values when expanding the
The
@zklaus did not let me do that in #771, because we should allow people to work with a subset of years for each sub-experiment. That means the clipping needs to be generalised for these types of cases.
My point was that there is nothing special about the sub-experiment. I think of every sub-experiment as its own experiment. So if we want leaving out the start year to mean "take everything from the beginning of the experiment", and leaving out the end year to mean "take everything to the end of the experiment", that could work, but there is no reason to tie this to the sub-experiment. Indeed, this functionality would also be useful to compare, for example, spin-ups of varying lengths. So I like the feature very much. I am a bit concerned about using just the absence of the tag as a marker, since that seems to invite accidentally huge analyses. Hence my suggestion to have a different syntax in the recipe. tl;dr
Maybe renaming the tag to
I know; I don't think it's a very good design, and it's not something we should encourage.
Could we use a wildcard
So all recipes would need to be modified?
I think @ledm asked years ago for a feature to allow users to easily load all available years, but also the first/last X years and such. I would prefer a syntax that covers all those cases at once. Regarding the preprocessor function: it will usually run after the checker, so users can be affected by issues in files that are not actually required. Also, bear in mind that one use case for this will be inspecting the last years (30, 50 or so) of spin-up experiments that can run for 200 years: that may mean loading many files we do not actually need. My suggestion:
Some examples (no need to support all of them from the start):

```yaml
# Our current case
timespan: 1980/2020
# More granular options
timespan: 198012/202011
# Start and duration
timespan: 1980/P3Y
timespan: 198005/P3M
# The next ones are really important for us in the decadal / seasonal applications
# The full period
timespan: *
# Period at the start / end of data availability
timespan: P10Y  # or P24M, P300D later on if we reach seasonal timescales
timespan: P-10y  # The standard has no way to count from the end, so I used the Python-style minus
# Relative periods
timespan: P10Y/P3M
timespan: P0y/-P5M
```
We may also support replacing the standard `/` with a space for readability.
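To make the proposed semantics concrete, here is a minimal Python sketch that resolves a year-precision `timespan` against the first and last years for which data are available, turning it into explicit inclusive start/end years. It assumes only the forms `START/END`, `START/PnY`, `*`, `PnY` (first n years) and `P-nY` (last n years), and reads `1980/P3Y` as the three years 1980 to 1982; month/day precision and the relative periods are left out. `resolve_timespan` is a hypothetical name, not existing ESMValTool code:

```python
import re

# Year-only ISO 8601 duration, optionally negated Python-style: P10Y, P-10y
YEAR_DURATION = re.compile(r"^P(-?)(\d+)Y$", re.IGNORECASE)


def resolve_timespan(spec: str, first_year: int, last_year: int) -> tuple[int, int]:
    """Hypothetical: resolve a year-precision timespan to inclusive
    (start, end) years, given the first/last years of available data."""
    spec = spec.strip()
    if spec == "*":  # the full period of data availability
        return first_year, last_year

    match = YEAR_DURATION.match(spec)
    if match:
        negative, years = match.group(1) == "-", int(match.group(2))
        if negative:  # the last `years` years of the data
            return last_year - years + 1, last_year
        return first_year, first_year + years - 1  # the first `years` years

    start, sep, end = spec.partition("/")
    if not sep:
        raise ValueError(f"Unsupported timespan: {spec!r}")
    start_year = int(start)
    match = YEAR_DURATION.match(end)
    if match and match.group(1) != "-":  # START/PnY: n years from START
        return start_year, start_year + int(match.group(2)) - 1
    return start_year, int(end)  # START/END, both inclusive
```

For example, `resolve_timespan("P-10y", 1850, 2049)` gives `(2040, 2049)`. Resolving to explicit years like this is also what would let a duration or wildcard be written back as concrete start/end values. Note that in YAML a bare `*` starts an alias, so a recipe would likely need to quote it as `'*'`.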
So which option would you go for? The timespan approach would fit with what is already half-started in #1133.
We would need to make sure that the proposed solution works with #345 (comment), though of course there is no need to exactly match the specification from the CMIP filenames with what we write in a recipe.
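On the concern that loading everything and only then trimming in a preprocessor means reading many unneeded files: CMIP-style filenames end with a time range, so files could be filtered before loading. A rough Python sketch, assuming a `_<start>-<end>.nc` suffix whose first four digits on each side are the year; the helper `files_overlapping` and the example filenames are illustrative only:

```python
import re
from pathlib import Path

# CMIP-style time range at the end of a filename, e.g. "_196011-197012.nc";
# only the leading four digits (the year) of each side are used.
TIME_RANGE = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")


def files_overlapping(filenames: list[str], start_year: int, end_year: int) -> list[str]:
    """Hypothetical: keep files whose filename time range overlaps
    the inclusive period [start_year, end_year]."""
    selected = []
    for name in filenames:
        match = TIME_RANGE.search(Path(name).name)
        if match is None:
            # No time range in the name (e.g. fixed fields): keep it
            selected.append(name)
            continue
        file_start, file_end = (int(year) for year in match.groups())
        if file_start <= end_year and file_end >= start_year:
            selected.append(name)
    return selected
```

Files without a time range in their name are kept rather than dropped, so the filter can only skip files that clearly fall outside the requested period.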
I really like this idea and I am all for it! That said, I hate it when we change the recipe interface and it breaks everything, usually without notice. Please keep the previous input functional, or at least add an error message that provides a command to switch from the old standard to the new one. If not, we'd be needlessly frustrating our users, again.
Thanks for a fruitful discussion, everyone! @sloosvel, could you please add a short summary to this issue, so we don't forget? If only a duration or wildcard is specified, it would be best if this could be expanded to a start_time/end_time and written to the resulting recipe, as proposed in #1138. That way, users outside your own institute can see exactly what data needs to be obtained to run the recipe.
A new
Furthermore:
And finally, these changes will also affect the
Regarding the domain that is selected, we would like to keep the current behaviour, e.g. if in the recipe
so the end time is inclusive: 1 is added to whichever unit (year/month/day/hour/minute/second) matches the precision given, and then all data strictly before that point is selected.
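The inclusive end-time rule can be sketched in Python as computing the first instant after the requested end, at whatever precision was given, and then selecting times strictly before it. This assumes compact ISO 8601 strings (`YYYY`, `YYYYMM`, `YYYYMMDD`, optionally followed by `T` and `hh`, `hhmm` or `hhmmss`); `exclusive_end` is a hypothetical helper, not an existing ESMValTool function:

```python
from datetime import datetime, timedelta

# Length of the time suffix -> (strptime format, one unit at that precision)
TIME_PRECISION = {
    0: ("", timedelta(days=1)),
    2: ("%H", timedelta(hours=1)),
    4: ("%H%M", timedelta(minutes=1)),
    6: ("%H%M%S", timedelta(seconds=1)),
}


def exclusive_end(end: str) -> datetime:
    """Hypothetical: return the first instant after an inclusive end time.

    Adds one unit at the precision of ``end`` (year, month, day, hour,
    minute or second), so selecting ``time < exclusive_end(end)``
    includes the whole last unit.
    """
    date_part, _, time_part = end.partition("T")
    if time_part and len(date_part) != 8:
        raise ValueError(f"Time given without a full date: {end!r}")
    if len(date_part) == 4:  # year precision: start of the next year
        return datetime(int(date_part) + 1, 1, 1)
    if len(date_part) == 6:  # month precision: start of the next month
        year, month = int(date_part[:4]), int(date_part[4:])
        # December rolls over to January of the following year
        return datetime(year + month // 12, month % 12 + 1, 1)
    if len(date_part) != 8 or len(time_part) not in TIME_PRECISION:
        raise ValueError(f"Unsupported end time: {end!r}")
    fmt, unit = TIME_PRECISION[len(time_part)]
    start_of_unit = datetime.strptime(date_part + time_part, "%Y%m%d" + fmt)
    return start_of_unit + unit
```

With `1980/2020`, data at times `t` satisfying `datetime(1980, 1, 1) <= t < exclusive_end("2020")` would be selected, i.e. up to and including 31 December 2020.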
Is your feature request related to a problem? Please describe.
In #771 we tried to add the functionality to load all DCPP data without having to specify the `start_year` and the `end_year`, as the current time range handling is not ideal (#345). To do so, a new tag to load all the available data was introduced. @zklaus pointed out that it could work for all experiments.
Would you be able to help out?
Yes