Find files for CMIP6 DCPP startdates #771

sloosvel · 2020-09-08T09:13:42Z

Before you start, please read our contribution guidelines.

Tasks

Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
This pull request has a descriptive title that can be used in a changelog
Add unit tests
Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
If writing a new/modified preprocessor function, please update the documentation
Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
Please use yamllint to check that your YAML files do not contain mistakes
If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.

A 'startdate' tag, that gets expanded in the same way as ensembles, can be added when calling some CMIP6 DCPP experiments that follow the scheme:

{short_name}{mip}{dataset}{exp}{startdate}-{ensemble}_{grid}*

The paths sometimes include the startdate, but in other cases they do not, so both cases are considered:

../{exp}/{startdate}-{ensemble}/... (as in Jasmin)
../{exp}/{ensemble}/...

@jvegasbsc is there any other that should be considered?

Closes #632

…to dev_dcpp_startdates

jvegreg

Nice work! I just added a couple of tests for the replace_tags function

…to dev_dcpp_startdates

jvegreg · 2021-04-27T10:40:53Z

@remi-kazeroni, can you please test it in DKRZ to see if we need some extra work to support it. If you or any other volunter don't have time to test it, we can merge it so at least we can read DCPP data in Jasmin

remi-kazeroni · 2021-04-27T13:08:02Z

@remi-kazeroni, can you please test it in DKRZ to see if we need some extra work to support it. If you or any other volunter don't have time to test it, we can merge it so at least we can read DCPP data in Jasmin

Yes sure, I'll have a look at this PR this week

zklaus · 2021-04-28T12:56:00Z

Note that what you call startdate here is actually part of the CMIP6 DRS as sub_experiment. See CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s, particularly p. 13, DRS components and p. 14, filename template. This also makes reference to the applicable CV in CMIP6_sub_experiment.json.

Perhaps it would be prudent to adopt the same terminology. That would also open the functionality up to other MIPs that might have sub-experiments in the future.

The first linked document also tells us what the directory structure should be (p.17, directory structure template), but of course we can't force every data center to conform.

remi-kazeroni

Nice work! I have tried on various combinations of datasets with and without sub_experiments and all of them can be read as expected. I just have a few general comments, feel free to ignore if not relevant:

Do you think it would be useful to document these changes? The sub_experiment key could be added here
The start_year and end_year are not needed when a sub_experiment is provided in the recipe. Instead, the decadal time bounds are used (e.g. start_year: 2000 and end_year: 2010 for sub_experiment: s2000-r1i1p1f1). One can still consider a shorter time range (e.g. 2005-2008) but in this case filenames contain the whole decadal time range instead of the shorter range (e.g. plots/*_dcppA-hindcast_s2000-r1i1p1f1_tas_2000-2010.png and not plots/*_dcppA-hindcast_s2000-r1i1p1f1_tas_2005-2008.png). Is that expected? Perhaps this example is not relevant in practice...

jvegreg · 2021-05-03T09:46:38Z

The start_year and end_year are not needed when a sub_experiment is provided in the recipe. Instead, the decadal time bounds are used (e.g. start_year: 2000 and end_year: 2010 for sub_experiment: s2000-r1i1p1f1). One can still consider a shorter time range (e.g. 2005-2008) but in this case filenames contain the whole decadal time range instead of the shorter range (e.g. plots/_dcppA-hindcast_s2000-r1i1p1f1_tas_2000-2010.png and not plots/_dcppA-hindcast_s2000-r1i1p1f1_tas_2005-2008.png). Is that expected? Perhaps this example is not relevant in practice...

Without changing the way we manage the time range, handling the start and end year will be a nightmare for the full hindcasts, as we will have to specify it manually for all years from 1979 to 2010 or so. The ability to use s(1979:2010) will be useless.

We have a pending issue to deal with the time range #345 and I remember another suggestion asking for a way yo specify relative ranges (first 20 years, last 50, all available...) I think we should do this rework in a separated pull request.

zklaus · 2021-05-03T10:04:17Z

The sub-experiment should really be completely independent from the startrange and endrange. It only tells you something about the initialization of the experiment. Every run is 10 years long, but there is a new run started for every year. This means that there are many different sub-experiments covering the same years and comparing these is one of the main interests of the DCPP mip.

Consequently, we should expect it to be common to find dataset blocks like

datasets:
  - {sub-experiment: s1960, start_year: 1965, end_year: 1970}
  - {sub-experiment: s1962, start_year: 1965, end_year: 1970}
  - {sub-experiment: s1963, start_year: 1965, end_year: 1970}
  - {sub-experiment: s1964, start_year: 1965, end_year: 1970}

sloosvel · 2021-05-03T11:42:37Z

The sub-experiment should really be completely independent from the startrange and endrange.

I think it already is. This finds and loads all the files available for a certain sub-experiment value. And later it adds the overall start_year and end_year. It's true that you cannot subset it as @remi-kazeroni says, but I think people normally load all years available anyway.

Consequently, we should expect it to be common to find dataset blocks like

datasets:

{sub-experiment: s1960, start_year: 1965, end_year: 1970}
{sub-experiment: s1962, start_year: 1965, end_year: 1970}
{sub-experiment: s1963, start_year: 1965, end_year: 1970}
{sub-experiment: s1964, start_year: 1965, end_year: 1970}

I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.

zklaus · 2021-05-03T11:44:03Z

I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.

Why?

sloosvel · 2021-05-03T12:28:18Z

Because everything belongs to the same dataset-experiment pair, which is what every new line in a recipe represents. It also makes the recipe easier to read.

And the end goal will be to be able to deal with all those sub-experiments as one in a single cube with a new dimension. So it's like a single dataset.

zklaus · 2021-05-03T12:32:04Z

I think in the context of the DCPP mip, every sub-experiment should be seen as its own experiment. You don't always (or even generally) want to look at all sub-experiments at the same time.

remi-kazeroni · 2021-05-03T14:49:32Z

datasets:
* {sub-experiment: s1960, start_year: 1965, end_year: 1970}

* {sub-experiment: s1962, start_year: 1965, end_year: 1970}

* {sub-experiment: s1963, start_year: 1965, end_year: 1970}

* {sub-experiment: s1964, start_year: 1965, end_year: 1970}
I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.

But I think this can already be done in one line with this PR using: {sub-experiment: s(1960:1964), start_year: 1965, end_year: 1970} (assuming 1961 is used as well).
I tested that with the examples/recipe_python.yml and only the data between start_year and end_year are used and plotted. But the filenames in the plots dir are in the form *dcppA-hindcast_s1964-r1i1p1f1_tas_1964-1974.png while one would expect *dcppA-hindcast_s1964-r1i1p1f1_tas_1965-1970.png

zklaus · 2021-05-03T17:28:39Z

Sure, there are many ways to specify these. The point is neither should we manipulate start and end because of sub-experiment, nor sub-experiment because of start and end.

sloosvel · 2021-05-10T09:45:54Z

So which changes in this pull request do you not approve of?

doc/recipe/overview.rst

esmvalcore/_data_finder.py

esmvalcore/_recipe.py

esmvalcore/_recipe_checks.py

zklaus · 2021-05-10T12:36:21Z

So which changes in this pull request do you not approve of?

In the review above I tried to mark all the places that this touches. All of those remarks really are about the same issue.

sloosvel · 2021-05-10T13:05:18Z

Good to know! I'll try to see what can be done about them.

sloosvel · 2021-05-10T15:38:22Z

If the new changes are of your liking, I will add the tests and everything else that's missing.

zklaus · 2021-05-11T07:24:54Z

There is one comment from the previous review left to address.

Now, this PR seems to introduce new functionality in terms of all_years. That appears to be completely unrelated to DCPP, so I suggest putting it into a different PR. That way we will be able to wrap this one up more quickly and the functionality (that will be appreciated by others for other experiments!) will get better visibility.

sloosvel · 2021-05-11T07:41:21Z

That appears to be completely unrelated to DCPP,

The DCPP functionality that we need at the department requires it. This pull request without it does not help much our use case.

Would it be that bad to include it as a first step, and then refine it for other experiments in another pull request?

zklaus · 2021-05-11T07:46:20Z

I personally think that the changes will be available quicker if split up into two PRs because it makes the review easier and the changes more self-contained. If you really prefer to have it together we can continue here; in that case please add documentation of the new functionality in a place where it can be found. I think the functionality would already work for other experiments, no?

esmvalcore/_data_finder.py

doc/recipe/overview.rst

zklaus · 2021-05-12T08:07:02Z

Ok to merge for me. Nice work!

sloosvel · 2021-05-13T10:25:08Z

Great! @ESMValGroup/esmvaltool-coreteam anyone with the time to merge it?

valeriupredoi · 2021-05-13T11:28:29Z

good stuff @sloosvel 🍺

* First attempte * Do not require start and end years, add them later * Correct condition * Avoid key error in fx variables * Consider two possible paths * Fix function name * Fix variable name * Avoid duplicates in filename * Add test for startdate expansion * Add test for the replace tags method * Rename tag * Add documentation * Allow to load subexps per timerange or as a whole * Fix condition * Remove 'all_years' functionality * Fix conditions * Fix flake * Remove whitespace Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>

@zklaus

* Add basic support for variable mappings * Add first era5 mapping * Find files for CMIP6 DCPP startdates (#771) * First attempte * Do not require start and end years, add them later * Correct condition * Avoid key error in fx variables * Consider two possible paths * Fix function name * Fix variable name * Avoid duplicates in filename * Add test for startdate expansion * Add test for the replace tags method * Rename tag * Add documentation * Allow to load subexps per timerange or as a whole * Fix condition * Remove 'all_years' functionality * Fix conditions * Fix flake * Remove whitespace Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es> * Skip regridding if the target grid is almost identical to the source grid (#507) Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl> Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl> * Fixes for sos and siconc of BCC models (#1090) * sos and siconc fixed * tests added * test fixed * fix flake8 * fix flake8 * fix codacy issue * Pin cf-units and fix tests (cf-units>=2.1.5) (#1140) * pin cf-units * pin cf-units * fix test * fix test * Handle IPSL-CM6 (the feature won't actually work without #1124) * class Huss inherits from cass Tas. Also : Fix codacy diags. * Replace os.system() by subprocess.run() * Fix flake8 diags * var_mapping -> extra_facets * Rename _config/variable_details to _config/extra_facets * Fix doc re. lack of 'output_file as a dict', and choice of native6 * Fix codacy diags in ipsl_cm6.py * Use project IPSLCM to handle IPSL-CM6 * Implement changes according to Bouwe's review, 2021/06/07 (except unit tests) * Add unit tests for _fixes/ipslcm/ipsl_cm6.py * delete esmvalcore/cmor/_fixes/native6/ipsl_cm6.py * Delete old file esmvalcore/_config/extra_facets/native6-ipsl-cm6-mappings.yml * Restore main versions for _recipe.py and cmor_fixes/fix.py * Restore main version for _recipe.py * Delete extraneous era5-mappings.yml * Avoid using mapping_key when calling fix.get_cube_from_list() * Empty change in fix.py for forcing codacy to re-scan it * Polish doc * Polish doc again * Again... * and again ... * Fix typo in comment * Fixes according to @zklaus review * Reduce formatting changes * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/develop/fixing_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/quickstart/find_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Update doc/quickstart/find_data.rst Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> * Minor formatting improvements * Organize mapping file in each realm in two sections (CMIP6 and IPSL) Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se> Co-authored-by: sloosvel <45196700+sloosvel@users.noreply.github.com> Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es> Co-authored-by: Benjamin Müller <b.mueller@iggf.geo.uni-muenchen.de> Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl> Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl> Co-authored-by: Rémi Kazeroni <70641264+remi-kazeroni@users.noreply.github.com> Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>

sloosvel added 8 commits May 11, 2020 16:37

First attempte

70f3b87

Do not require start and end years, add them later

f5d4e05

Correct condition

c73eb4e

Avoid key error in fx variables

0d30f7a

Solve conflicts

04be270

Consider two possible paths

1771c6b

Fix function name

6ad31e5

Fix variable name

811026f

sloosvel requested a review from jvegreg September 8, 2020 09:13

sloosvel and others added 3 commits November 13, 2020 14:32

Avoid duplicates in filename

96b236f

Add test for startdate expansion

ec10ee7

Merge branch 'master' of https://github.com/ESMValGroup/ESMValCore in…

1b50d62

…to dev_dcpp_startdates

sloosvel marked this pull request as ready for review November 19, 2020 10:06

jvegreg added cmor Related to the CMOR standard ISENES labels Apr 27, 2021

Add test for the replace tags method

401abdc

jvegreg approved these changes Apr 27, 2021

View reviewed changes

jvegreg requested a review from remi-kazeroni April 27, 2021 09:50

Merge branch 'master' of https://github.com/ESMValGroup/ESMValCore in…

599ff5a

…to dev_dcpp_startdates

Rename tag

dbb4b07

remi-kazeroni approved these changes May 3, 2021

View reviewed changes

Add documentation

4296a74

zklaus requested changes May 10, 2021

View reviewed changes

Allow to load subexps per timerange or as a whole

af7c285

Fix condition

68ff783

Remove 'all_years' functionality

a303e85

sloosvel mentioned this pull request May 11, 2021

Add option to load selected or all years available in an experiment #1120

Closed

sloosvel added 4 commits May 11, 2021 11:26

Fix conditions

1737bcb

Fix flake

e0f40cb

Merge remote-tracking branch 'origin/master' into dev_dcpp_startdates

c9abd1b

Merge remote-tracking branch 'origin/master' into dev_dcpp_startdates

69317c4

sloosvel mentioned this pull request May 11, 2021

Add tag all_years: True #1122

Closed

10 tasks

zklaus approved these changes May 11, 2021

View reviewed changes

esmvalcore/_data_finder.py Outdated Show resolved Hide resolved

doc/recipe/overview.rst Outdated Show resolved Hide resolved

Remove whitespace

aff7769

valeriupredoi merged commit ae5c30d into master May 13, 2021

valeriupredoi deleted the dev_dcpp_startdates branch May 13, 2021 11:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Find files for CMIP6 DCPP startdates #771

Find files for CMIP6 DCPP startdates #771

sloosvel commented Sep 8, 2020 •

edited by senesis

Loading

jvegreg left a comment

jvegreg commented Apr 27, 2021

remi-kazeroni commented Apr 27, 2021

zklaus commented Apr 28, 2021

remi-kazeroni left a comment

jvegreg commented May 3, 2021

zklaus commented May 3, 2021

sloosvel commented May 3, 2021

zklaus commented May 3, 2021

sloosvel commented May 3, 2021

zklaus commented May 3, 2021

remi-kazeroni commented May 3, 2021

zklaus commented May 3, 2021

sloosvel commented May 10, 2021

zklaus commented May 10, 2021

sloosvel commented May 10, 2021

sloosvel commented May 10, 2021

zklaus commented May 11, 2021

sloosvel commented May 11, 2021

zklaus commented May 11, 2021

zklaus commented May 12, 2021

sloosvel commented May 13, 2021

valeriupredoi commented May 13, 2021

Find files for CMIP6 DCPP startdates #771

Find files for CMIP6 DCPP startdates #771

Conversation

sloosvel commented Sep 8, 2020 • edited by senesis Loading

jvegreg left a comment

Choose a reason for hiding this comment

jvegreg commented Apr 27, 2021

remi-kazeroni commented Apr 27, 2021

zklaus commented Apr 28, 2021

remi-kazeroni left a comment

Choose a reason for hiding this comment

jvegreg commented May 3, 2021

zklaus commented May 3, 2021

sloosvel commented May 3, 2021

zklaus commented May 3, 2021

sloosvel commented May 3, 2021

zklaus commented May 3, 2021

remi-kazeroni commented May 3, 2021

zklaus commented May 3, 2021

sloosvel commented May 10, 2021

zklaus commented May 10, 2021

sloosvel commented May 10, 2021

sloosvel commented May 10, 2021

zklaus commented May 11, 2021

sloosvel commented May 11, 2021

zklaus commented May 11, 2021

zklaus commented May 12, 2021

sloosvel commented May 13, 2021

valeriupredoi commented May 13, 2021

sloosvel commented Sep 8, 2020 •

edited by senesis

Loading