Publish Hazard Datasets calculated by ZAMG as Open Data #9

Closed
p-a-s-c-a-l opened this issue Nov 23, 2018 · 19 comments

@p-a-s-c-a-l commented Nov 23, 2018

According to the status presentation, ZAMG calculates datasets for

  • 25 Indices
  • 16 GCM/RCM climate model combinations (daily data) from EURO-CORDEX
  • 4 time periods (1971-2000, 2011-2040, 2041-2070, 2071-2100)
  • 3 RCP scenarios (2.6, 4.5, 8.5)

= 3325 unique datasets. By the way, why 3325 datasets and not 4800 (25 × 16 × 4 × 3)?
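For reference, the expected full cross-product is pure arithmetic; the shortfall is explained later in this thread:

```python
# Full cross-product of the dimensions listed above.
indices, models, periods, rcps = 25, 16, 4, 3
full = indices * models * periods * rcps
print(full)         # 4800
print(full - 3325)  # 1475 -- combinations missing from the full grid
```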

An example Heatwave Duration Hazard NetCDF file can be found in this issue.

Note: This data has to be "rasterised" to a 500 m GeoTIFF grid (example for the same dataset here), and then the local effects are taken into account to generate derived datasets. The complete process chain will eventually be documented here. So in the end, we would possibly calculate 3 × 3325 datasets that have to be published as open data according to the H2020 Open Access Guidelines. However, it is up to the @clarity-h2020/data-processing-team and @clarity-h2020/mathematical-models-implementation-team to discuss and decide whether we really need that many derived datasets. But this is better addressed in this issue, along with other HC, HC-LE and EE related questions I'm going to ask soon.
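For illustration only, a minimal sketch of such a rasterisation step using rioxarray; the file name, variable name, dimension names and target CRS are all assumptions, not the project's actual tool chain:

```python
import xarray as xr
import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects
from rasterio.enums import Resampling

ds = xr.open_dataset("heatwave_duration_rcp85_2041-2070.nc")  # hypothetical file
da = ds["heatwave_duration"].isel(time=0)                     # hypothetical variable; one period for illustration

da = (da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")       # dimension names are assumptions
        .rio.write_crs("EPSG:4326"))                          # assuming a regular lat/lon input grid

raster = da.rio.reproject(
    "EPSG:3035",                 # an equal-area European CRS, chosen here for illustration
    resolution=500,              # 500 m target cell size
    resampling=Resampling.bilinear,
)
raster.rio.to_raster("heatwave_duration_rcp85_2041-2070_500m.tif")
```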

@p-a-s-c-a-l commented Nov 23, 2018

For the Data Management Plan it is currently only relevant to consider how the datasets can be made publicly available for re-use by other interested parties (this is also a dissemination issue). Here we concentrate first on releasing the original datasets produced by ZAMG as open data and address derived datasets (those taking the local effects into account) in a separate issue.

Since we are talking about 3325 datasets, the publication process (Example) must be automated:

  1. deposit the dataset and associated meta-data in a research data repository, e.g. Zenodo, unless ZAMG wants to release it on data.ccca.ac.at
  2. register the dataset meta-data (including a link to the actual data resource stored in Zenodo) in our CKAN instance (the 'living' Data Management Plan).

Both Zenodo and CKAN offer APIs, so we can develop some simple scripts that automate this process; a sketch follows below. Theoretically, it would also be possible to configure CKAN to automatically harvest the meta-data from Zenodo.
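A minimal sketch of what such a script could look like, using the documented Zenodo REST API (https://developers.zenodo.org/) and the CKAN Action API; the CKAN URL, tokens and metadata below are placeholders, and error handling is omitted:

```python
from pathlib import Path
import requests

ZENODO_API = "https://zenodo.org/api"
CKAN_API = "https://ckan.example.org/api/3/action"  # placeholder for the CLARITY CKAN instance
ZENODO_TOKEN = "..."  # read from the environment in practice
CKAN_API_KEY = "..."

def publish_dataset(path: str, title: str, description: str) -> None:
    # 1. Deposit the dataset and its meta-data on Zenodo.
    dep = requests.post(f"{ZENODO_API}/deposit/depositions",
                        params={"access_token": ZENODO_TOKEN}, json={}).json()
    with open(path, "rb") as fp:  # upload the file into the deposition's bucket
        requests.put(f"{dep['links']['bucket']}/{Path(path).name}",
                     data=fp, params={"access_token": ZENODO_TOKEN})
    metadata = {"metadata": {"title": title,
                             "upload_type": "dataset",
                             "description": description,
                             "creators": [{"name": "ZAMG"}]}}
    requests.put(f"{ZENODO_API}/deposit/depositions/{dep['id']}",
                 params={"access_token": ZENODO_TOKEN}, json=metadata)
    record = requests.post(f"{ZENODO_API}/deposit/depositions/{dep['id']}/actions/publish",
                           params={"access_token": ZENODO_TOKEN}).json()

    # 2. Register the meta-data, with a link to the Zenodo record, in CKAN.
    requests.post(f"{CKAN_API}/package_create",
                  headers={"Authorization": CKAN_API_KEY},
                  json={"name": title.lower().replace(" ", "-"),
                        "title": title,
                        "notes": description,
                        "resources": [{"url": record["links"]["record_html"],
                                       "format": "NetCDF"}]})
```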

Questions for @clarity-h2020/science-support-team:

  1. Why 3325 datasets and not 4800 (25 × 16 × 4 × 3)?
  2. Why 16 different GCM/RCM combinations? Do we really need to consider all of them in the impact calculation, as discussed in this issue, or do we select one mean/ensemble scenario?

@claudiahahn

The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.
Question 1: Robert listed 3325 rather than 4800 data sets because the RCP 2.6 scenario was not available for all GCM/RCM combinations.
Question 2: There is no need to consider all GCM/RCM combinations. We will provide the ensemble mean (and the max/min or some percentiles to assess uncertainty). Thus, for each index we have one ensemble mean value for each time period (4) and each RCP scenario (3). That makes 12 ensemble mean values per index, plus e.g. the respective min/max (see the sketch below).
All CLARITY partners can work with that data, but before data based on the EURO-CORDEX data can be made publicly available, the licenses have to be checked. That means the institutions that provide the EURO-CORDEX data need to be contacted.
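For illustration, a minimal sketch of how the ensemble statistics could be derived with xarray, assuming the per-model index files share a common grid (file pattern, variable name and percentile choices are hypothetical):

```python
import glob
import xarray as xr

# One index / RCP / period at a time; the per-model files are stacked
# along a new "model" dimension.
files = sorted(glob.glob("heatwave_duration_rcp45_2041-2070_*.nc"))
ens = xr.concat([xr.open_dataset(f)["heatwave_duration"] for f in files],
                dim="model")

stats = xr.Dataset({
    "ensemble_mean": ens.mean(dim="model"),
    "ensemble_min": ens.min(dim="model"),
    "ensemble_max": ens.max(dim="model"),
    # 10th/90th percentiles as one way to express the ensemble spread
    "p10": ens.quantile(0.10, dim="model").drop_vars("quantile"),
    "p90": ens.quantile(0.90, dim="model").drop_vars("quantile"),
})
stats.to_netcdf("heatwave_duration_rcp45_2041-2070_ensemble.nc")
```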

@DenoBeno

In today's telco, a decision was made that Louis (Meteogrid) will draft a letter to the data owners requesting permission to use their data. This is needed mainly for the EURO-CORDEX data, as far as I understand.

@p-a-s-c-a-l commented Nov 27, 2018

The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.

O.K. In practical terms that means that we

  • don't need to upload these datasets to Zenodo, as they can be downloaded from data.ccca.ac.at. This also makes Data Management Example: Heatwave Duration Hazard obsolete.
  • don't need to register these datasets in CLARITY's CKAN, since the meta-data can be viewed in data.ccca.ac.at's CKAN

In the Data Management Plan we can then refer directly to data.ccca.ac.at. Perfect.
@claudiahahn Assuming that we are allowed to publish the data (see #9 (comment)), when will it be made available on data.ccca.ac.at? D7.9 Data Management Plan v2 is due by the end of January 2019.

Where and how to publish derived hazard datasets (+ local effects), in terms of Data Management rather than CSIS WMS/WCS publication, is another story and has to be discussed with the @clarity-h2020/data-processing-team.

@p-a-s-c-a-l

As soon as the bias correction is finished, we can calculate the indices using the bias corrected EURO-CORDEX data and make it available.

OK, so the implications are

  • In the D7.9 Data Management Plan v2 we will just announce that indices based on bias-corrected EURO-CORDEX data will be made available as open data. In D7.9 Data Management Plan v3 (end of 2019) we can then provide the links to the actual data at data.ccca.ac.at. Fine.
  • @clarity-h2020/data-processing-team must be aware that the data that is now made available internally (uploaded to sFTP) contains initial/draft hazard indices and has to be re-processed once the bias-corrected indices have been calculated.

@p-a-s-c-a-l

Any progress to be reported here?

@claudiahahn

The data sets are not yet published on CCCA.

Regarding the license issue: according to the following list, to which Lena has directed us,
http://is-enes-data.github.io/CORDEX_RCMs_info.html,
the use of the EURO-CORDEX data from which we calculate the climate indices is not restricted. Therefore, the climate indices can be made publicly available without restrictions.

@p-a-s-c-a-l

This isn't valid any more, right?

The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.

All datasets will be made available on Zenodo instead?

@RobAndGo commented Feb 7, 2020

Yes, this is correct. When that statement was initially made, I was not aware of Zenodo. When I later compared uploading the data to both, I found it much easier to upload via Zenodo than via CCCA.

@p-a-s-c-a-l

All datasets are now available on Zenodo, right? If so, we can close this issue.

@RobAndGo commented May 4, 2020

Yes

@p-a-s-c-a-l

Thanks, Robert!
