Find files for CMIP6 DCPP startdates #771
Nice work! I just added a couple of tests for the replace_tags function
…to dev_dcpp_startdates
@remi-kazeroni, can you please test it at DKRZ to see if we need some extra work to support it? If neither you nor any other volunteer has time to test it, we can merge it so that we can at least read DCPP data on Jasmin.
Yes sure, I'll have a look at this PR this week.
Note that what you call startdate here is actually part of the CMIP6 DRS as Perhaps it would be prudent to adopt the same terminology. That would also open the functionality up to other MIPs that might have sub-experiments in the future. The first linked document also tells us what the directory structure should be (p.17, directory structure template), but of course we can't force every data center to conform.
Nice work! I have tried on various combinations of datasets with and without sub_experiments and all of them can be read as expected. I just have a few general comments, feel free to ignore if not relevant:
- Do you think it would be useful to document these changes? The sub_experiment key could be added here
- The start_year and end_year are not needed when a sub_experiment is provided in the recipe. Instead, the decadal time bounds are used (e.g. start_year: 2000 and end_year: 2010 for sub_experiment: s2000-r1i1p1f1). One can still consider a shorter time range (e.g. 2005-2008) but in this case filenames contain the whole decadal time range instead of the shorter range (e.g. plots/*_dcppA-hindcast_s2000-r1i1p1f1_tas_2000-2010.png and not plots/*_dcppA-hindcast_s2000-r1i1p1f1_tas_2005-2008.png). Is that expected? Perhaps this example is not relevant in practice...
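To make the behaviour described in the comment above concrete, here is a minimal sketch of how decadal bounds could be derived from a sub_experiment tag. The function name `decadal_bounds` and the fixed 10-year length are assumptions for illustration; this is not the code in the PR.

```python
import re


def decadal_bounds(sub_experiment, length=10):
    """Return (start_year, end_year) implied by a tag such as 's2000-r1i1p1f1'.

    Hypothetical helper: assumes the tag starts with 's' followed by a
    four-digit initialization year and that every run spans `length` years.
    """
    match = re.match(r"s(\d{4})", sub_experiment)
    if match is None:
        raise ValueError(f"Not a sub-experiment tag: {sub_experiment!r}")
    start_year = int(match.group(1))
    return start_year, start_year + length


bounds = decadal_bounds("s2000-r1i1p1f1")
```

With these assumptions, the tag from the example above yields the 2000-2010 range that shows up in the plot filename.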
Without changing the way we manage the time range, handling the start and end year will be a nightmare for the full hindcasts, as we will have to specify it manually for all years from 1979 to 2010 or so. The ability to use We have a pending issue to deal with the time range #345 and I remember another suggestion asking for a way to specify relative ranges (first 20 years, last 50, all available...). I think we should do this rework in a separate pull request.
The sub-experiment should really be completely independent from the startrange and endrange. It only tells you something about the initialization of the experiment. Every run is 10 years long, but there is a new run started for every year. This means that there are many different sub-experiments covering the same years, and comparing these is one of the main interests of the DCPP MIP. Consequently, we should expect it to be common to find dataset blocks like
datasets:
- {sub-experiment: s1960, start_year: 1965, end_year: 1970}
- {sub-experiment: s1962, start_year: 1965, end_year: 1970}
- {sub-experiment: s1963, start_year: 1965, end_year: 1970}
- {sub-experiment: s1964, start_year: 1965, end_year: 1970}
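The overlap argument can be sketched in a few lines. This assumes, as stated in the comment, that each run is 10 years long and that a run labelled sYYYY covers YYYY through YYYY + 10; the function name and the `sub_experiment` key spelling are illustrative choices, not part of the PR.

```python
def covering_sub_experiments(start_year, end_year, init_years, length=10):
    """List the sub-experiment tags whose run contains [start_year, end_year].

    Hypothetical helper: a run initialized in year `init` is assumed to
    cover the years init .. init + length.
    """
    return [
        f"s{init}"
        for init in init_years
        if init <= start_year and end_year <= init + length
    ]


# All runs initialized between 1955 and 1970 that contain 1965-1970.
tags = covering_sub_experiments(1965, 1970, range(1955, 1971))

# One recipe-style entry per sub-experiment, all sharing the same time range.
datasets = [
    {"sub_experiment": tag, "start_year": 1965, "end_year": 1970}
    for tag in tags
]
```

Under these assumptions six runs (s1960 to s1965) cover the same 1965-1970 window, which is why the time range cannot be inferred from the sub-experiment alone.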
I think it already is. This finds and loads all the files available for a certain sub-experiment value. And later it adds the overall
datasets:
I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.
Why?
Because everything belongs to the same dataset-experiment pair, which is what every new line in a recipe represents. It also makes the recipe easier to read. And the end goal will be to be able to deal with all those sub-experiments as one in a single cube with a new dimension. So it's like a single dataset.
I think in the context of the DCPP MIP, every sub-experiment should be seen as its own experiment. You don't always (or even generally) want to look at all sub-experiments at the same time.
But I think this can already be done in one line with this PR using:
Sure, there are many ways to specify these. The point is that we should neither manipulate start and end because of sub-experiment, nor sub-experiment because of start and end.
So which changes in this pull request do you not approve of?
In the review above I tried to mark all the places that this touches. All of those remarks really are about the same issue.
Good to know! I'll try to see what can be done about them.
If the new changes are to your liking, I will add the tests and everything else that's missing.
There is one comment from the previous review left to address. Now, this PR seems to introduce new functionality in terms of
The DCPP functionality that we need at the department requires it. Without it, this pull request does not help our use case much. Would it be that bad to include it as a first step, and then refine it for other experiments in another pull request?
I personally think that the changes will be available quicker if split up into two PRs because it makes the review easier and the changes more self-contained. If you really prefer to have it together we can continue here; in that case please add documentation of the new functionality in a place where it can be found. I think the functionality would already work for other experiments, no?
Ok to merge for me. Nice work!
Great! @ESMValGroup/esmvaltool-coreteam anyone with the time to merge it?
good stuff @sloosvel 🍺
* First attempte
* Do not require start and end years, add them later
* Correct condition
* Avoid key error in fx variables
* Consider two possible paths
* Fix function name
* Fix variable name
* Avoid duplicates in filename
* Add test for startdate expansion
* Add test for the replace tags method
* Rename tag
* Add documentation
* Allow to load subexps per timerange or as a whole
* Fix condition
* Remove 'all_years' functionality
* Fix conditions
* Fix flake
* Remove whitespace

Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
A 'startdate' tag, which is expanded in the same way as ensembles, can be added when calling CMIP6 DCPP experiments whose files follow the scheme:
{short_name}_{mip}_{dataset}_{exp}_{startdate}-{ensemble}_{grid}*
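As an illustration of what "expanded in the same way as ensembles" could mean, here is a rough sketch that expands a range written as `(first:last)` inside a tag, in the style of ensemble entries such as `r(1:3)i1p1f1`. The function name `expand_tag` and the `s(2000:2002)` spelling for startdates are assumptions made for this example, not necessarily what the PR implements.

```python
import re


def expand_tag(tag):
    """Expand every '(first:last)' range in a tag into explicit values.

    Hypothetical helper: 's(2000:2002)' becomes ['s2000', 's2001', 's2002'];
    a tag without a range is returned unchanged in a one-element list.
    """
    match = re.search(r"\((\d+):(\d+)\)", tag)
    if match is None:
        return [tag]
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        raise ValueError(f"Empty range in tag: {tag!r}")
    prefix, suffix = tag[:match.start()], tag[match.end():]
    expanded = []
    for value in range(first, last + 1):
        # Recurse so that any further ranges in the suffix are expanded too.
        expanded.extend(expand_tag(f"{prefix}{value}{suffix}"))
    return expanded


startdates = expand_tag("s(2000:2002)")
ensembles = expand_tag("r(1:2)i1p1f1")

# The '{startdate}-{ensemble}' part of the file name, one entry per combination.
members = [f"{start}-{ens}" for start in startdates for ens in ensembles]
```

Each entry in `members` would then fill the `{startdate}-{ensemble}` slot of the scheme, so a single recipe line can stand for several files.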
The paths sometimes include the startdate, but in other cases they do not, so both cases are considered:
@jvegasbsc is there any other that should be considered?
Closes #632