Consistent variable naming #30
There is one potential issue with the current implementation: If there exists in a dataset a variable with the "correct" variable (for example Also, when I create a combined DataArray (so we can analyse/plot multiple variables at once which have for example
@annalea-albright and @JuleRadtke it would be great to have your thoughts on this too, particularly: https://github.com/eurec4a/eurec4a-environment/pull/30/files#diff-ba464340b171c9b9665c96a45aa313caR10 Does this make sense to you? |
This looks good. It seems to implement the initial steps of the calculate function I was working on but with better error messages/handling and the nice feature of concatenating multiple variables with the same standard name. I would get this merged in soon to get into the habit of using this for new functions. There's a couple of things I would change before the merge:
After merge I would still like to get the |
eurec4a_environment/nomenclature.py
Outdated
TEMPERATURE: "air_temperature",
ALTITUDE: "geopotential_height",
RELATIVE_HUMIDITY: "relative_humidity",
POTENTIAL_TEMPERATURE: "potential_temperature",
The CF standard is actually "air_potential_temperature". I think this is an inconsistency with the JOANNE dataset
Thanks for the correction. I'll update it for the next JOANNE version.
Thanks for catching this @LSaffin. I'll check this and make sure we use the next JOANNE version as soon as it's ready. Maybe you could give me a ping on mattermost @Geet-George and then I'll update the intake catalog?
Yes, I'll update you on it. 👍
Wow, this is wonderful!
Great implementation, @leifdenby :)
import xarray as xr


TEMPERATURE = "T"
- TEMPERATURE = "T"
+ TEMPERATURE = "ta"
This is just to be consistent with the latest variable names in JOANNE - most variable names are finalised, but some are still being discussed.. :/
No problem :)
TEMPERATURE = "T"
ALTITUDE = "height"
- ALTITUDE = "height"
+ ALTITUDE = "alt"
Thanks, I'll update these. Should I update the JOANNE dataset in the intake catalog already?
Did these get updated?
eurec4a_environment/nomenclature.py
Outdated
var_dataarrays = list(matching_dataarrays.values())

# TODO: should we check units here too?
da_combined = xr.concat(var_dataarrays, dim="var_name")
This is well thought out. Good implementation! However, all functions would have to be written to deal with this, i.e. multiple variables. For example, estimating theta would require an array each of pressure and temperature. If get_field returns two temperature arrays and one pressure array, the function should know what to do... Ideally, return two theta arrays. Did I understand that correctly?
Thanks 😄 Yes, the idea would be that what is returned is always temperature for example, but it might be multiple temperature profiles. I've elaborated a bit on this idea in my comment below. Hope that makes sense...
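To make the lookup being discussed a bit more concrete, here is a minimal sketch of matching variables by their CF `standard_name` attribute. It is not the code from this pull request: the dataset is modelled as a plain dict of attribute dicts so that the sketch runs without xarray, and `CF_STANDARD_NAMES`, `find_matching_variables` and the variable names in `example` are hypothetical. It also assumes that an exact name match takes precedence over a `standard_name` match.

```python
# Assumed mapping from short field names to CF standard names (illustrative only).
CF_STANDARD_NAMES = {
    "ta": "air_temperature",
    "alt": "geopotential_height",
}


def find_matching_variables(dataset_attrs, field_name):
    """Return the names of all variables that can serve as `field_name`.

    `dataset_attrs` maps variable name -> attribute dict and stands in for
    the data variables and coordinates of an xarray.Dataset.
    """
    # Assumption: a variable with exactly the requested name wins outright.
    if field_name in dataset_attrs:
        return [field_name]

    # Otherwise collect every variable carrying the matching standard_name.
    standard_name = CF_STANDARD_NAMES[field_name]
    matches = [
        name
        for name, attrs in dataset_attrs.items()
        if attrs.get("standard_name") == standard_name
    ]
    if not matches:
        raise KeyError(
            f"no variable named {field_name!r} or with standard_name {standard_name!r}"
        )
    return matches


# Two temperature variables and one pressure variable (made-up names).
example = {
    "T_sonde": {"standard_name": "air_temperature"},
    "T_drone": {"standard_name": "air_temperature"},
    "p": {"standard_name": "air_pressure"},
}
print(find_matching_variables(example, "ta"))
```

With real data the matched arrays would then be combined along a new dimension, as in the `xr.concat(var_dataarrays, dim="var_name")` line under review.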
Thanks for the feedback 😄
Good idea @LSaffin . I'll do that
The way I am thinking about it is that all the operations work on vertical columns, i.e. they're a reduction bit like The extra dimensions could be for example |
This sounds great @LSaffin! Yes, this very much does half the work. I was thinking we'd have a dictionary somewhere (not sure where yet, do you have an idea?) with a mapping like so:

DERIVED_FIELDS = dict(
    z_LCL="variables.boundary_layer.lcl.find_LCL_Bolton",
    ...
)

Together with the code you've written this would provide the lookup for the module and function name that is needed to calculate a specific variable. We can then update this dictionary over time to provide more variables that could be used in, for example, the plotting functions:

eurec4a_environment.plot.profile.profile_plot_1d(ds, variable="qt", height_levels=["z_LCL",...])

I'm realising we could even get rid of the
I see what you mean, this will be good for any functions calculating variables, they can and should try to be this flexible. I'm not sure how you would want to do it for plotting functions though. A few options I can think of:
I guess 2 sounds best but may be a bit excessive for more detailed plotting functions. Maybe don't hold up the pull request on this but keep it as an open issue. |
I would put the dictionary in the same place as the calculating function. It can be useful to import the dictionary and see the list of variables that can be calculated.

from eurec4a_environment.NAME_OF_MODULE import NAME_OF_FUNCTION, DERIVED_FIELDS
print(DERIVED_FIELDS.keys())

Names to be decided. I don't think my previous naming is optimal.
Probably yes. We can look back into this once this pull request is resolved. |
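As a rough illustration of how a `DERIVED_FIELDS` dictionary of dotted paths could be turned into callables, the sketch below uses `importlib` from the standard library. The helper `resolve_derived_field`, its `package` argument and the `demo_hypot` entry are hypothetical; `math.hypot` is only a stand-in target so that the example runs without `eurec4a_environment` installed. It also assumes the `z_LCL` path is relative to the `eurec4a_environment` package.

```python
import importlib

# Derived-variable name -> dotted path of the function that computes it.
DERIVED_FIELDS = {
    # Entry taken from the discussion; assumed relative to eurec4a_environment.
    "z_LCL": "variables.boundary_layer.lcl.find_LCL_Bolton",
    # Stand-in entry (hypothetical) that resolves with the standard library only.
    "demo_hypot": "math.hypot",
}


def resolve_derived_field(name, package=None):
    """Import and return the function registered for `name`."""
    dotted_path = DERIVED_FIELDS[name]
    # Split "pkg.module.function" into the module path and the function name.
    module_path, function_name = dotted_path.rsplit(".", 1)
    if package is not None:
        module_path = f"{package}.{module_path}"
    module = importlib.import_module(module_path)
    return getattr(module, function_name)


# List what can be calculated, then resolve and call the stand-in entry.
print(sorted(DERIVED_FIELDS))
print(resolve_derived_field("demo_hypot")(3.0, 4.0))
```

Under these assumptions the real entry would be resolved with `resolve_derived_field("z_LCL", package="eurec4a_environment")`. Storing strings rather than function objects keeps the dictionary importable without loading every calculation module up front.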
Great, very nice!!
Thanks for the feedback again all. I'll work in the suggestions you made and then merge this 🚀
A common pattern when accessing a field is to then convert this field to be in the required units. This commit adds functionality to use `cf_units` to carry out this conversion automatically when the user requests a valid set of units.
All test and existing variable calculations changed to use module-wide nomenclature
It appears that `cfunits` is easier to install via pip (it doesn't break on the CI system), so we use that instead.
In implementing this I realised that a common pattern in our functions is to ensure the fields we are operating on are in the correct units, so I added functionality to use the cf_units module to convert a field to the desired units. This makes it possible to do, for example, the following:

da_temperature = nom.get_field(ds=ds, field_name=temperature, units="K")

instead of

if ds[temperature].units == "C":
    da_temperature = ds[temperature] + 273.15
    da_temperature.attrs["units"] = "K"
else:
    da_temperature = ds[temperature]

See for example b49fc40#diff-b6a50438da9d9fc4f5de22ebda3e229fL31

If you're all happy with this pull-request I'd like to merge it in.
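For readers who want to see the unit-handling pattern in isolation, here is a dependency-free sketch. It is not how the pull request does it: a small hand-written table stands in for the cf_units/cfunits conversion, the values are plain floats rather than a DataArray, and `convert_units` and `_CONVERSIONS` are hypothetical names.

```python
# Hand-written conversions standing in for cf_units/cfunits (illustration only).
_CONVERSIONS = {
    ("C", "K"): lambda value: value + 273.15,
    ("K", "C"): lambda value: value - 273.15,
    ("hPa", "Pa"): lambda value: value * 100.0,
}


def convert_units(values, from_units, to_units):
    """Return `values` converted from `from_units` to `to_units`."""
    # Nothing to do when the field is already in the requested units.
    if from_units == to_units:
        return list(values)
    try:
        convert = _CONVERSIONS[(from_units, to_units)]
    except KeyError:
        raise ValueError(
            f"no conversion from {from_units!r} to {to_units!r}"
        ) from None
    return [convert(value) for value in values]


# Temperatures stored in degrees Celsius, requested in kelvin.
print(convert_units([0.0, 25.0], "C", "K"))
```

The point of centralising this is that each calculation function states the units it needs once, instead of repeating the `if ... units == "C"` branch shown above.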
Much much better! Thanks a lot :) |
I think it would be useful to have a separate pull request for the units feature (and the integration) because I can see a few things I would change for that. I think I would be happy merging the pull request following the commit 4060ef6 assuming it passes the tests. |
eurec4a_environment/nomenclature.py
Outdated
matching_dataarrays = {}
vars_and_coords = list(ds.data_vars) + list(ds.coords)
for v in vars_and_coords:
    print(v)
I don't think we want the print statement
Good catch, thanks :)
How would you feel about merging this pull-request and starting a new issue to fix the bits you feel need improving? That would just save me a few hours' work undoing the changes I did to the tests and functions to make this work 😄
eurec4a_environment/nomenclature.py
Outdated
return da

if not HAS_UDUNITS2:
    raise Exception(
Would be better to have a more specific exception here
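One possible shape for a more specific exception is sketched below. The class name `UnitsConversionUnavailable`, the helper `ensure_units_support` and the message wording are hypothetical, and subclassing `ImportError` is just one option (a `RuntimeError` subclass would be equally defensible). The flag is passed in as an argument here so the sketch is self-contained; in the module it would be the existing `HAS_UDUNITS2`.

```python
class UnitsConversionUnavailable(ImportError):
    """Raised when unit conversion is requested but udunits2 support is missing."""


def ensure_units_support(has_udunits2):
    """Raise a specific, catchable error if unit conversion cannot be done."""
    if not has_udunits2:
        raise UnitsConversionUnavailable(
            "unit conversion was requested but udunits2 support is not available"
        )


# Callers can now catch this one case without swallowing unrelated errors.
try:
    ensure_units_support(False)
except UnitsConversionUnavailable as error:
    print(f"skipping unit conversion: {error}")
```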
Fair enough. I've approved the changes (although it doesn't look like it made a difference). The changes I would make are
I can make these changes once your pull request is merged if they sound OK to you. Other questions I had:
This pull-request adds functionality to address #26