Allow for a variable aliasing scheme (for use in model development) #1083
Comments
@senesis cheers for the issue! I would encourage you to provide us with a concrete use case, since this issue has several facets: a variable could be mapped/aliased to a CMOR variable, or not, or maybe, e.g., a variable that is produced by some OBS model is the exact equivalent of
OK. The use case, for now, is: model IPSL-CM6 has two native output formats, both NetCDF-based. Format 'TS' is composed of single-variable files, named e.g.:
which includes a NetCDF variable 't2m', which is actually the exact equivalent of the 'tas' variable. So, it matches your first case:
There may be some issues with CMOR conformance, but I intend to address those either in a model-specific fix (e.g. here, adding a height2m scalar coordinate) or through built-in CMOR fixes (e.g. renaming the variable). I will certainly also have some variable derivation issues to address (and I am not sure how best to reconstruct a CMOR standard variable by combining non-standard variables), but that is another story.
If you work with the Or you can even create a new project in the config-developer.yml (which is what we do to work with our model data for monitoring purposes).
Thanks. I was able to create a new project quite successfully for 'my' model output. But my goal goes beyond finding data by indicating in the recipe a project-specific variable attribute such as 'era5_name'; I want to be able to apply all existing recipes (which together form the actual treasure of ESMValTool) to a mix of CMIP data and data whose filenames are formed using non-standard variable names. Said otherwise: for any recipe requesting 'tas', the _data_finder should, for a dataset of my newly defined project, translate 'tas' to 't2m' in order to find the file named "CM61-LR-hist-03.1950_18500101_18591231_1M_t2m.nc". I see no other way than the code change described above.
And _find_input_files would just have to be slightly changed:
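To make the requested behaviour concrete, here is a minimal, standalone sketch of the translation step. It is not the code change referred to in this thread and does not use ESMValCore internals: the `ALIASES` table, the `INPUT_FILE` template, the `file_pattern` helper and the project name `IPSLCM` are all hypothetical names chosen for illustration.

```python
from fnmatch import fnmatch

# Hypothetical per-project alias table: CMOR short_name -> native name.
ALIASES = {"IPSLCM": {"tas": "t2m"}}

# Hypothetical file-name template for the new project.
INPUT_FILE = "*_{short_name}.nc"


def file_pattern(project, short_name):
    """Build the glob pattern, using the native name when one is declared."""
    native = ALIASES.get(project, {}).get(short_name, short_name)
    return INPUT_FILE.format(short_name=native)


# A recipe asks for 'tas'; the pattern is built with 't2m' instead.
pattern = file_pattern("IPSLCM", "tas")
filename = "CM61-LR-hist-03.1950_18500101_18591231_1M_t2m.nc"
print(pattern, fnmatch(filename, pattern))
```

Projects without an alias entry fall back to the recipe's short_name, so existing CMIP lookups would be unaffected under this assumption.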
The idea we had on how to achieve this is described very briefly in #309, i.e. have a YAML file (path configurable per project in config-developer.yml) containing a mapping from CMIP6 variables to extra key-value pairs. Those extra key-value pairs should then be added to the These extra keys could then be used to find the data using the directory structure defined in config-developer.yml, without any modifications needed to the functions for finding input data.
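A rough sketch of how such a mapping could feed the file template is given below. It only illustrates the idea and is not the design from #309: the key names `native_name` and `freq_label`, the `INPUT_FILE` template and the `add_extra_facets` helper are hypothetical, and the mapping is written as a Python dict where the proposal would load it from a YAML file. Letting recipe values take precedence over mapped values is also an assumption.

```python
# Hypothetical content of the per-project mapping file, shown as a dict
# (the proposal would load this from YAML).
EXTRA_FACETS = {
    "tas": {"native_name": "t2m", "freq_label": "1M"},
}

# Hypothetical file template that refers to the extra keys.
INPUT_FILE = "{dataset}_*_{freq_label}_{native_name}.nc"


def add_extra_facets(variable, extra_facets):
    """Return a copy of the recipe variable enriched with the extra keys."""
    enriched = dict(variable)
    for key, value in extra_facets.get(variable["short_name"], {}).items():
        # Assumption: a value given explicitly in the recipe wins.
        enriched.setdefault(key, value)
    return enriched


variable = {"short_name": "tas", "dataset": "CM61-LR-hist-03.1950"}
enriched = add_extra_facets(variable, EXTRA_FACETS)
print(INPUT_FILE.format(**enriched))
```

Because the template is filled from the enriched dictionary, the file-finding function itself does not need to know about aliases.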
On a related note: having a separate 'project' per supported model would probably be OK as these are not so many, but we also had the idea of making the DRS definition in config-developer.yml a bit more flexible (#970 (comment)), because we would not like to have a separate project for every supported observational/reanalysis dataset, as that would just be too many.
That sounds great. Can we safely assume that such keys can be either project-specific keys (such as 'label_for_variable_in_filename') or standard keys (such as 'dataset', which would drive the choice of a fix module)? Also: there is no description there of the specific 'recipe' mechanics that would allow applying Python code for deriving variables. And I do not see how such code would be provided with the necessary input variables, while the 'standard' derived variable scheme allows nicely for that.
The #970 (comment) introduces the concept of 'center', which is new for ESMValTool. And I think it is worth thinking twice about what it would mean or drive. It should so drive the choice of
Sorry for the confusion, I am not used to the ESMValTool vocabulary and it seems I chose the word poorly. What I was calling "center" (for "datacenter") is not a new concept, as far as I know. I'm not sure what they are usually called, but it appears here under the name "key machines": https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/configure.html#developer-configuration-file I will edit the comment to avoid further confusion.
Hi, @jvegasbsc , @rswamina , @mattiarighi , @bouweandela , @bsolino
I am working with IPSL, in the context of IS-ENES3, on testing the feasibility of using ESMValTool in model development. I was impressed both by the clear design of the code and by the extensive documentation. Congratulations!
For the goal above, one needs more flexibility in the data_finder, so that {short_name} can be replaced by a variable name that is left to the user's or configurer's choice. This is useful when, e.g., the model outputs are quite consistent with the table set of some CMIP project, and the departures can be addressed by the fix_metadata and fix_data features.
This case is quite different from the 'variable_alt_names' scheme described in this other issue, which, if I understood well, is devoted to the case where the same physical variable has different names in different 'standard' projects (e.g. 'sic' in CMIP5 == 'siconc' in CMIP6). The difference is that here we would like to avoid creating a table set specific to the model, and rather use an existing table set.
Digging in the code, I found that, in function _find_input_files, variable['short_name'] is changed before calling _find_input_dirs and _get_filenames, in order to use an alternate variable name in file naming. So I tested setting it using a short function, which queries the config for a new project entry named 'aliases' (see code below)
It works, and allows me to further explore the overall goal.
Could I go forward that way toward a PR?
And, by the way, where should I include commits that only deal with improving esmvalcore code docstrings and logged message text?
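The snippet referred to above is not reproduced on this page. As a stand-in, here is a minimal sketch of what a lookup of that kind could look like, assuming the developer configuration is available as a plain nested dict with an optional 'aliases' entry per project. The function name `alias_for`, the project name `IPSLCM` and the dict layout are hypothetical, not ESMValCore API.

```python
def alias_for(config_developer, variable):
    """Return the name to use in file naming, defaulting to short_name."""
    project_cfg = config_developer.get(variable["project"], {})
    aliases = project_cfg.get("aliases", {})
    return aliases.get(variable["short_name"], variable["short_name"])


# Hypothetical developer configuration, already parsed into a dict.
config_developer = {
    "IPSLCM": {"aliases": {"tas": "t2m"}},
    "CMIP6": {},
}

variable = {"project": "IPSLCM", "short_name": "tas"}
print(alias_for(config_developer, variable))
```

With this fallback, a project that declares no 'aliases' entry keeps its current behaviour.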