Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find files for CMIP6 DCPP startdates #771

Merged
merged 23 commits into from
May 13, 2021
Merged

Conversation

sloosvel
Copy link
Contributor

@sloosvel sloosvel commented Sep 8, 2020

Before you start, please read our contribution guidelines.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


A 'startdate' tag, that gets expanded in the same way as ensembles, can be added when calling some CMIP6 DCPP experiments that follow the scheme:

{short_name}{mip}{dataset}{exp}{startdate}-{ensemble}_{grid}*

The paths sometimes include the startdate, but in other cases they do not, so both cases are considered:

  • ../{exp}/{startdate}-{ensemble}/... (as in Jasmin)
  • ../{exp}/{ensemble}/...

@jvegasbsc is there any other that should be considered?

Closes #632

@sloosvel sloosvel requested a review from jvegreg September 8, 2020 09:13
@sloosvel sloosvel marked this pull request as ready for review November 19, 2020 10:06
@jvegreg jvegreg added cmor Related to the CMOR standard ISENES labels Apr 27, 2021
Copy link
Contributor

@jvegreg jvegreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I just added a couple of tests for the replace_tags function

@jvegreg jvegreg requested a review from remi-kazeroni April 27, 2021 09:50
@jvegreg
Copy link
Contributor

jvegreg commented Apr 27, 2021

@remi-kazeroni, can you please test it in DKRZ to see if we need some extra work to support it. If you or any other volunter don't have time to test it, we can merge it so at least we can read DCPP data in Jasmin

@remi-kazeroni
Copy link
Contributor

@remi-kazeroni, can you please test it in DKRZ to see if we need some extra work to support it. If you or any other volunter don't have time to test it, we can merge it so at least we can read DCPP data in Jasmin

Yes sure, I'll have a look at this PR this week

@zklaus
Copy link

zklaus commented Apr 28, 2021

Note that what you call startdate here is actually part of the CMIP6 DRS as sub_experiment. See CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s, particularly p. 13, DRS components and p. 14, filename template. This also makes reference to the applicable CV in CMIP6_sub_experiment.json.

Perhaps it would be prudent to adopt the same terminology. That would also open the functionality up to other MIPs that might have sub-experiments in the future.

The first linked document also tells us what the directory structure should be (p.17, directory structure template), but of course we can't force every data center to conform.

Copy link
Contributor

@remi-kazeroni remi-kazeroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I have tried on various combinations of datasets with and without sub_experiments and all of them can be read as expected. I just have a few general comments, feel free to ignore if not relevant:

  1. Do you think it would be useful to document these changes? The sub_experiment key could be added here
  2. The start_year and end_year are not needed when a sub_experiment is provided in the recipe. Instead, the decadal time bounds are used (e.g. start_year: 2000 and end_year: 2010 for sub_experiment: s2000-r1i1p1f1). One can still consider a shorter time range (e.g. 2005-2008) but in this case filenames contain the whole decadal time range instead of the shorter range (e.g. plots/*_dcppA-hindcast_s2000-r1i1p1f1_tas_2000-2010.png and not plots/*_dcppA-hindcast_s2000-r1i1p1f1_tas_2005-2008.png). Is that expected? Perhaps this example is not relevant in practice...

@jvegreg
Copy link
Contributor

jvegreg commented May 3, 2021

The start_year and end_year are not needed when a sub_experiment is provided in the recipe. Instead, the decadal time bounds are used (e.g. start_year: 2000 and end_year: 2010 for sub_experiment: s2000-r1i1p1f1). One can still consider a shorter time range (e.g. 2005-2008) but in this case filenames contain the whole decadal time range instead of the shorter range (e.g. plots/_dcppA-hindcast_s2000-r1i1p1f1_tas_2000-2010.png and not plots/_dcppA-hindcast_s2000-r1i1p1f1_tas_2005-2008.png). Is that expected? Perhaps this example is not relevant in practice...

Without changing the way we manage the time range, handling the start and end year will be a nightmare for the full hindcasts, as we will have to specify it manually for all years from 1979 to 2010 or so. The ability to use s(1979:2010) will be useless.

We have a pending issue to deal with the time range #345 and I remember another suggestion asking for a way yo specify relative ranges (first 20 years, last 50, all available...) I think we should do this rework in a separated pull request.

@zklaus
Copy link

zklaus commented May 3, 2021

The sub-experiment should really be completely independent from the startrange and endrange. It only tells you something about the initialization of the experiment. Every run is 10 years long, but there is a new run started for every year. This means that there are many different sub-experiments covering the same years and comparing these is one of the main interests of the DCPP mip.

Consequently, we should expect it to be common to find dataset blocks like

datasets:
  - {sub-experiment: s1960, start_year: 1965, end_year: 1970}
  - {sub-experiment: s1962, start_year: 1965, end_year: 1970}
  - {sub-experiment: s1963, start_year: 1965, end_year: 1970}
  - {sub-experiment: s1964, start_year: 1965, end_year: 1970}

@sloosvel
Copy link
Contributor Author

sloosvel commented May 3, 2021

The sub-experiment should really be completely independent from the startrange and endrange.

I think it already is. This finds and loads all the files available for a certain sub-experiment value. And later it adds the overall start_year and end_year. It's true that you cannot subset it as @remi-kazeroni says, but I think people normally load all years available anyway.

Consequently, we should expect it to be common to find dataset blocks like

datasets:

  • {sub-experiment: s1960, start_year: 1965, end_year: 1970}
  • {sub-experiment: s1962, start_year: 1965, end_year: 1970}
  • {sub-experiment: s1963, start_year: 1965, end_year: 1970}
  • {sub-experiment: s1964, start_year: 1965, end_year: 1970}

I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.

@zklaus
Copy link

zklaus commented May 3, 2021

I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.

Why?

@sloosvel
Copy link
Contributor Author

sloosvel commented May 3, 2021

Because everything belongs to the same dataset-experiment pair, which is what every new line in a recipe represents. It also makes the recipe easier to read.

And the end goal will be to be able to deal with all those sub-experiments as one in a single cube with a new dimension. So it's like a single dataset.

@zklaus
Copy link

zklaus commented May 3, 2021

I think in the context of the DCPP mip, every sub-experiment should be seen as its own experiment. You don't always (or even generally) want to look at all sub-experiments at the same time.

@remi-kazeroni
Copy link
Contributor

datasets:

* {sub-experiment: s1960, start_year: 1965, end_year: 1970}

* {sub-experiment: s1962, start_year: 1965, end_year: 1970}

* {sub-experiment: s1963, start_year: 1965, end_year: 1970}

* {sub-experiment: s1964, start_year: 1965, end_year: 1970}

I am a bit lost because that's what I was trying to avoid: having 50 lines in a recipe to call a single experiment with multiple subexperiments.

But I think this can already be done in one line with this PR using: {sub-experiment: s(1960:1964), start_year: 1965, end_year: 1970} (assuming 1961 is used as well).
I tested that with the examples/recipe_python.yml and only the data between start_year and end_year are used and plotted. But the filenames in the plots dir are in the form *dcppA-hindcast_s1964-r1i1p1f1_tas_1964-1974.png while one would expect *dcppA-hindcast_s1964-r1i1p1f1_tas_1965-1970.png

@zklaus
Copy link

zklaus commented May 3, 2021

Sure, there are many ways to specify these. The point is neither should we manipulate start and end because of sub-experiment, nor sub-experiment because of start and end.

@sloosvel
Copy link
Contributor Author

So which changes in this pull request do you not approve of?

doc/recipe/overview.rst Outdated Show resolved Hide resolved
esmvalcore/_data_finder.py Outdated Show resolved Hide resolved
esmvalcore/_data_finder.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe_checks.py Outdated Show resolved Hide resolved
@zklaus
Copy link

zklaus commented May 10, 2021

So which changes in this pull request do you not approve of?

In the review above I tried to mark all the places that this touches. All of those remarks really are about the same issue.

@sloosvel
Copy link
Contributor Author

Good to know! I'll try to see what can be done about them.

@sloosvel
Copy link
Contributor Author

If the new changes are of your liking, I will add the tests and everything else that's missing.

@zklaus
Copy link

zklaus commented May 11, 2021

There is one comment from the previous review left to address.

Now, this PR seems to introduce new functionality in terms of all_years. That appears to be completely unrelated to DCPP, so I suggest putting it into a different PR. That way we will be able to wrap this one up more quickly and the functionality (that will be appreciated by others for other experiments!) will get better visibility.

@sloosvel
Copy link
Contributor Author

That appears to be completely unrelated to DCPP,

The DCPP functionality that we need at the department requires it. This pull request without it does not help much our use case.

Would it be that bad to include it as a first step, and then refine it for other experiments in another pull request?

@zklaus
Copy link

zklaus commented May 11, 2021

I personally think that the changes will be available quicker if split up into two PRs because it makes the review easier and the changes more self-contained. If you really prefer to have it together we can continue here; in that case please add documentation of the new functionality in a place where it can be found. I think the functionality would already work for other experiments, no?

@sloosvel sloosvel mentioned this pull request May 11, 2021
10 tasks
esmvalcore/_data_finder.py Outdated Show resolved Hide resolved
doc/recipe/overview.rst Outdated Show resolved Hide resolved
@zklaus
Copy link

zklaus commented May 12, 2021

Ok to merge for me. Nice work!

@sloosvel
Copy link
Contributor Author

Great! @ESMValGroup/esmvaltool-coreteam anyone with the time to merge it?

@valeriupredoi valeriupredoi merged commit ae5c30d into master May 13, 2021
@valeriupredoi valeriupredoi deleted the dev_dcpp_startdates branch May 13, 2021 11:27
@valeriupredoi
Copy link
Contributor

good stuff @sloosvel 🍺

senesis pushed a commit that referenced this pull request Jun 7, 2021
* First attempte

* Do not require start and end years, add them later

* Correct condition

* Avoid key error in fx variables

* Consider two possible paths

* Fix function name

* Fix variable name

* Avoid duplicates in filename

* Add test for startdate expansion

* Add test for the replace tags method

* Rename tag

* Add documentation

* Allow to load subexps per timerange or as a whole

* Fix condition

* Remove 'all_years' functionality

* Fix conditions

* Fix flake

* Remove whitespace

Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
zklaus pushed a commit that referenced this pull request Jun 11, 2021
* Add basic support for variable mappings

* Add first era5 mapping

* Find files for CMIP6 DCPP startdates (#771)

* First attempte

* Do not require start and end years, add them later

* Correct condition

* Avoid key error in fx variables

* Consider two possible paths

* Fix function name

* Fix variable name

* Avoid duplicates in filename

* Add test for startdate expansion

* Add test for the replace tags method

* Rename tag

* Add documentation

* Allow to load subexps per timerange or as a whole

* Fix condition

* Remove 'all_years' functionality

* Fix conditions

* Fix flake

* Remove whitespace

Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>

* Skip regridding if the target grid is almost identical to the source grid (#507)

Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>

* Fixes for sos and siconc of BCC models (#1090)

* sos and siconc fixed

* tests added

* test fixed

* fix flake8

* fix flake8

* fix codacy issue

* Pin cf-units and fix tests (cf-units>=2.1.5) (#1140)

* pin cf-units

* pin cf-units

* fix test

* fix test

* Handle IPSL-CM6 (the feature won't actually work without #1124)

* class Huss inherits from cass Tas. Also : Fix codacy diags.

* Replace os.system() by subprocess.run()

* Fix flake8 diags

* var_mapping -> extra_facets

* Rename _config/variable_details to _config/extra_facets

* Fix doc re. lack of 'output_file as a dict', and choice of native6

* Fix codacy diags in ipsl_cm6.py

* Use project IPSLCM to handle IPSL-CM6

* Implement changes according to Bouwe's review, 2021/06/07 (except unit tests)

* Add unit tests for _fixes/ipslcm/ipsl_cm6.py

* delete esmvalcore/cmor/_fixes/native6/ipsl_cm6.py

* Delete old file esmvalcore/_config/extra_facets/native6-ipsl-cm6-mappings.yml

* Restore main versions for _recipe.py and cmor_fixes/fix.py

* Restore main version for _recipe.py

* Delete extraneous era5-mappings.yml

* Avoid using mapping_key when calling fix.get_cube_from_list()

* Empty change in fix.py for forcing codacy to re-scan it

* Polish doc

* Polish doc again

* Again...

* and again ...

* Fix typo in comment

* Fixes according to @zklaus review

* Reduce formatting changes

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/quickstart/find_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/quickstart/find_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Minor formatting improvements

* Organize mapping file in each realm in two sections (CMIP6 and IPSL)

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
Co-authored-by: sloosvel <45196700+sloosvel@users.noreply.github.com>
Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
Co-authored-by: Benjamin Müller <b.mueller@iggf.geo.uni-muenchen.de>
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>
Co-authored-by: Rémi Kazeroni <70641264+remi-kazeroni@users.noreply.github.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmor Related to the CMOR standard ISENES
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dealing with startdates in filenames
5 participants