Consistent variable naming #30

leifdenby · 2020-08-11T18:12:24Z

This pull-request adds functionality to address #26

leifdenby · 2020-08-11T18:16:03Z

There is one potential issue with the current implementation: If there there exists in a dataset a variable with the "correct" variable (for example T) for temperature that will always be picked, even if there are multiple variables in the dataset which have the air_temperature standard_name attribute. Are we ok with this?

Also, when I create a combined DataArray (so we can analyse/plot multiple variables at once which have for example air_temperature standard_name) I don't currently check the units. I was going to just check that the "units" string is the same for all variables, is that ok with everyone? I don't feel like this would be used alot, but sometimes you might want to make profile plots with multiple sources for temperature say...

leifdenby · 2020-08-12T08:53:08Z

@annalea-albright and @JuleRadtke it would be great to have your thoughts on this too, particularly: https://github.com/eurec4a/eurec4a-environment/pull/30/files#diff-ba464340b171c9b9665c96a45aa313caR10

Does this make sense to you?

leosaffin · 2020-08-12T10:24:02Z

This looks good. It seems to implement the initial steps of the calculate function I was working on but with better error messages/handling and the nice feature of concatenating multiple variables with the same standard name. I would get this merged in soon to get into the habit of using this for new functions. There's a couple of things I would change before the merge:

I think the function get_field is generic enough that you should put it into the top-level init.py file. It would be good to be able to do from eurec4a_environment import get_field
It would be good to put in an option to suppress the concatenating of multiple fields with a default of False, so when you get to the the len(matching_arrays) being more than 1 section you would just raise an error message. This way, if you know you're in a function that requires only one variable returned you can set that flag.

After merge I would still like to get the calculate function involved here. It could then work by initially calling get_field with a try/except on FieldMissingException.

leosaffin · 2020-08-12T10:25:10Z

eurec4a_environment/nomenclature.py

+    TEMPERATURE: "air_temperature",
+    ALTITUDE: "geopotential_height",
+    RELATIVE_HUMIDITY: "relative_humidity",
+    POTENTIAL_TEMPERATURE: "potential_temperature",


The CF standard is actually "air_potential_temperature". I think this is an inconsistency with the JOANNE dataset

Thanks for the correction. I'll update it for the next JOANNE version.

Thanks for catching this @LSaffin. I'll check this and make sure we use the next JOANNE version as soon as it's ready. Maybe you could give me a ping on mattermost @Geet-George and then I'll update the intake catalog?

Yes, I'll update you on it. 👍

annalea-albright

Wow, this is wonderful!

Geet-George

Great implementation, @leifdenby :)

Geet-George · 2020-08-12T10:38:15Z

eurec4a_environment/nomenclature.py

+import xarray as xr
+
+
+TEMPERATURE = "T"


Suggested change

TEMPERATURE = "T"

TEMPERATURE = "ta"

This is just to be consistent with the latest variable names in JOANNE - most variable names are finalised, but some are still being discussed.. :/

No problem :)

Geet-George · 2020-08-12T10:38:27Z

eurec4a_environment/nomenclature.py

+
+
+TEMPERATURE = "T"
+ALTITUDE = "height"


Suggested change

ALTITUDE = "height"

ALTITUDE = "alt"

Thanks, I'll update these. Should I update the JOANNE dataset in the intake catalog already?

Did these get updated?

Geet-George · 2020-08-12T10:50:55Z

eurec4a_environment/nomenclature.py

+                var_dataarrays = list(matching_dataarrays.values())
+
+                # TODO: should we check units here too?
+                da_combined = xr.concat(var_dataarrays, dim="var_name")


This is well thought of. Good implementation! However, all functions would have to be written to deal with this, i.e. multiple variables. e.g. estimating theta would require an array each of pressure and temperature. If get_field returns two temperature arrays and one pressure array, the function should know what to do... Ideally, return two theta arrays. Did I understand that correctly?

Thanks 😄 Yes, the idea would be that what is returned is always temperature for example, but it might be multiple temperature profiles. I've elaborated a bit on this idea in my comment below. Hope that makes sense...

leifdenby · 2020-08-12T11:21:57Z

Thanks for the feedback 😄

1. I think the function `get_field` is generic enough that you should put it into the top-level **init**.py file. 
   It would be good to be able to do `from eurec4a_environment import get_field`

Good idea @LSaffin . I'll do that

2. It would be good to put in an option to suppress the concatenating of multiple fields with a default of `False`, so 
   when you get to the the `len(matching_arrays)` being more than 1 section you would just raise an error message. 
   This way, if you know you're in a function that requires only one variable returned you can set that flag.

The way I am thinking about it is that all the operations work on vertical columns, i.e. they're a reduction bit like ds.mean(dim="altitude") and so they should be able to take DataArrays with any number of dimensions and just reduce the "height" dimension.

The extra dimensions could be for example time or sonde_number as we've had before, but they could also just be var_name (representing a number of DataArrays that were previously separate, but have now been merged into a single DataArray with a new var_name dimension). We could simply say that we don't want to merge when they are multiple DataArrays with the matching standard_name, but I do think that the functions we implement have to work independently of the other dimensions. This will also mean that with for example LES data (which could have the dimensions (x, y, z)) we could apply the functions by just setting the correct standard name on z. What do you think @LSaffin? This relates to your question too @Geet-George

leifdenby · 2020-08-12T11:29:23Z

After merge I would still like to get the calculate function involved here. It could then work by initially calling get_field with a try/except on FieldMissingException.

This sounds great @LSaffin! Yes, this very much does half the work. I was thinking we'd have a dictionary somewhere (not sure where yet, do you have an idea?) with a mapping like so:

DERIVED_FIELDS = dict(
    z_LCL="variables.boundary_layer.lcl.find_LCL_Bolton",
    ...
)

Together with the code you've written this would provide the lookup for the module and function name that is needed to calculate a specific variable. We can then update this dictionary over time to provide more variables that could be used in for example the plotting functions

eurec4a_environment.plot.profile.profile_plot_1d(ds, variable="qt", height_levels=["z_LCL",...])

I'm realising we could even get rid of the z_ part if we're passing in the argument as height_levels, but I'm maybe getting a bit ahead of myself here...

leosaffin · 2020-08-12T12:38:22Z

The way I am thinking about it is that all the operations work on vertical columns, i.e. they're a reduction bit like ds.mean(dim="altitude") and so they should be able to take DataArrays with any number of dimensions and just reduce the "height" dimension.

The extra dimensions could be for example time or sonde_number as we've had before, but they could also just be var_name (representing a number of DataArrays that were previously separate, but have now been merged into a single DataArray with a new var_name dimension). We could simply say that we don't want to merge when they are multiple DataArrays with the matching standard_name, but I do think that the functions we implement have to work independently of the other dimensions. This will also mean that with for example LES data (which could have the dimensions (x, y, z)) we could apply the functions by just setting the correct standard name on z. What do you think @LSaffin? This relates to your question too @Geet-George

I see what you mean, this will be good for any functions calculating variables, they can and should try to be this flexible. I'm not sure how you would want to do it for plotting functions though. A few options I can think of:

Allow them to pass a flag that raises an error in the get_field function
You could also make the plot for every extra element in the new dimension
Leave as is and let the matplotlib function do the error checking

I guess 2 sounds best but may be a bit excessive for more detailed plotting functions. Maybe don't hold up the pull request on this but keep it as an open issue.

leosaffin · 2020-08-12T12:49:02Z

I was thinking we'd have a dictionary somewhere (not sure where yet, do you have an idea?) with a mapping like so:
DERIVED_FIELDS = dict(
    z_LCL="variables.boundary_layer.lcl.find_LCL_Bolton",

I would put the dictionary in the same place as the calculating function. It can be useful to import the dictionary and see the list of variables that can be calculated.

from eurec4a_environment.NAME_OF_MODULE import NAME_OF_FUNCTION, DERIVED_FIELDS
print(DERIVED_FIELDS.keys())

Names to be decided. I don't think my previous naming is optimal.

I'm realising we could even get rid of the z_ part if we're passing in the argument as height_levels, but I'm maybe getting a bit ahead of myself here...

Probably yes. We can look back into this once this pull request is resolved.

JuleRadtke

Great, very nice!!

leifdenby · 2020-08-14T08:45:04Z

Thanks for the feedback again all. I'll work in the suggestions you made and the merge this 🚀

A common pattern when accessing a field is to then convert this field to be in the required units. This commit adds functionality to use `cf_units` to carry out this conversion automatically when the user requests a valid set of units.

All test and existing variable calculations changed to use module-wide nomenclature

It appears that `cfunits` is easier to install via pip (it doesn't break on the CI system), so we use that instead.

leifdenby · 2020-08-18T13:16:32Z

In implementing this I realised that a common pattern in our functions it to ensure the fields we are operating on are in the correct units and so I added functionality to use the cf_units module to convert a field to the desired units. This makes it possible to do for example the following:

da_temperature = nom.get_field(ds=ds, field_name=temperature, units="K")

instead of

if ds[temperature].units == "C":
    da_temperature = ds[temperature] + 273.15
    da_temperature.attrs["units"] = "K"
else:
    da_temperature = ds[temperature]

See for example b49fc40#diff-b6a50438da9d9fc4f5de22ebda3e229fL31

If you're all happy with this pull-request I'd like to merge it in.

annalea-albright · 2020-08-18T13:53:47Z

Much much better! Thanks a lot :)
And sorry I’ve gotten delayed on adding functions, but it’s on my to-do list!

leosaffin · 2020-08-18T14:17:16Z

I think it would be useful to have a separate pull request for the units feature (and the integration) because I can see a few things I would change for that. I think I would be happy merging the pull request following the commit 4060ef6 assuming it passes the tests.

leosaffin · 2020-08-18T14:19:05Z

eurec4a_environment/nomenclature.py

+            matching_dataarrays = {}
+            vars_and_coords = list(ds.data_vars) + list(ds.coords)
+            for v in vars_and_coords:
+                print(v)


I don't think we want the print statement

Good catch, thanks :)

leifdenby · 2020-08-18T21:19:55Z

I think it would be useful to have a separate pull request for the units feature (and the integration) because I can see a few things I would change for that. I think I would be happy merging the pull request following the commit 4060ef6 assuming it passes the tests.

How would you feel about merging this pull-request and starting a new issue to fix the bits you feel need improving? That would just save a me a few hours work undoing the changes I did to the tests and functions to make this work 😄

leosaffin · 2020-08-19T08:39:29Z

eurec4a_environment/nomenclature.py

+            return da
+
+        if not HAS_UDUNITS2:
+            raise Exception(


Would be better to have a more specific exception here

leosaffin · 2020-08-19T08:51:08Z

Fair enough. I've approved the changes (although it doesn't look like it made a difference). The changes I would make are

I would put the unit conversion in a separate function and module because it doesn't cover nomenclature
When using get_field in other functions it would be better to use it from eurec4a_environment rather than eurec4a_environment.nomenclature
Considering the first two points, it would be good to change the name of the function in nomenclature (e.g. to get_field_by_name) and have the function get_field in the top level module call that function and the function for unit conversion. Then other functions are calling this top level get_field and you can add other functionality that might not be related to nomenclature to it.

I can make these changes once your pull request is merged if they sound OK to you. Other questions I had:

Did you find any important differences between cf_units and cfunits? As far as I could tell they were basically the same but cf_units was more active and better documented.
Was using iris considered at all? This already does the unit checking as well as coordinate checking and better support for CF standard names (although the CF standard naming could be a limitation).

leifdenby added 2 commits August 11, 2020 19:10

add nomenclature with tests

9424173

refactor mixed-layer-height methods to use nomenclature

ad336fe

leifdenby requested review from Geet-George and leosaffin August 11, 2020 18:12

leifdenby mentioned this pull request Aug 11, 2020

Levels #21

Open

leifdenby requested review from annalea-albright and JuleRadtke August 12, 2020 08:52

leosaffin reviewed Aug 12, 2020

View reviewed changes

annalea-albright approved these changes Aug 12, 2020

View reviewed changes

Geet-George suggested changes Aug 12, 2020

View reviewed changes

JuleRadtke approved these changes Aug 12, 2020

View reviewed changes

leifdenby added 7 commits August 17, 2020 13:37

Merge branch 'master' into consistent-variable-naming

803be5e

fixes from feedback

91036a9

fixes for black

4060ef6

Add units support to nomenclature

e189962

A common pattern when accessing a field is to then convert this field to be in the required units. This commit adds functionality to use `cf_units` to carry out this conversion automatically when the user requests a valid set of units.

Refactor to use nomenclature throughout

b49fc40

All test and existing variable calculations changed to use module-wide nomenclature

Use cfunits instead of cf_units

1c8d9c1

It appears that `cfunits` is easier to install via pip (it doesn't break on the CI system), so we use that instead.

handle missing udunits2

a434d2a

leifdenby marked this pull request as ready for review August 18, 2020 13:16

leosaffin reviewed Aug 18, 2020

View reviewed changes

leosaffin approved these changes Aug 19, 2020

View reviewed changes

leosaffin reviewed Aug 19, 2020

View reviewed changes

tweaks from feedback

60bc654

leifdenby merged commit c789425 into eurec4a:master Aug 21, 2020

leifdenby mentioned this pull request Aug 21, 2020

Improving get_field of nomenclature module #32

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consistent variable naming #30

Consistent variable naming #30

leifdenby commented Aug 11, 2020

leifdenby commented Aug 11, 2020

leifdenby commented Aug 12, 2020

leosaffin commented Aug 12, 2020

leosaffin Aug 12, 2020

Geet-George Aug 12, 2020 •

edited

Loading

leifdenby Aug 12, 2020

Geet-George Aug 12, 2020

annalea-albright left a comment

Geet-George left a comment

Geet-George Aug 12, 2020

leifdenby Aug 12, 2020

Geet-George Aug 12, 2020

leifdenby Aug 12, 2020

leosaffin Aug 18, 2020

Geet-George Aug 12, 2020

leifdenby Aug 12, 2020

leifdenby commented Aug 12, 2020 •

edited

Loading

leifdenby commented Aug 12, 2020 •

edited

Loading

leosaffin commented Aug 12, 2020

leosaffin commented Aug 12, 2020 •

edited

Loading

JuleRadtke left a comment

leifdenby commented Aug 14, 2020

leifdenby commented Aug 18, 2020

annalea-albright commented Aug 18, 2020

leosaffin commented Aug 18, 2020

leosaffin Aug 18, 2020

leifdenby Aug 18, 2020

leifdenby commented Aug 18, 2020

leosaffin Aug 19, 2020

leosaffin commented Aug 19, 2020 •

edited

Loading

Consistent variable naming #30

Consistent variable naming #30

Conversation

leifdenby commented Aug 11, 2020

leifdenby commented Aug 11, 2020

leifdenby commented Aug 12, 2020

leosaffin commented Aug 12, 2020

Choose a reason for hiding this comment

Geet-George Aug 12, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

annalea-albright left a comment

Choose a reason for hiding this comment

Geet-George left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leifdenby commented Aug 12, 2020 • edited Loading

leifdenby commented Aug 12, 2020 • edited Loading

leosaffin commented Aug 12, 2020

leosaffin commented Aug 12, 2020 • edited Loading

JuleRadtke left a comment

Choose a reason for hiding this comment

leifdenby commented Aug 14, 2020

leifdenby commented Aug 18, 2020

annalea-albright commented Aug 18, 2020

leosaffin commented Aug 18, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leifdenby commented Aug 18, 2020

Choose a reason for hiding this comment

leosaffin commented Aug 19, 2020 • edited Loading

Geet-George Aug 12, 2020 •

edited

Loading

leifdenby commented Aug 12, 2020 •

edited

Loading

leifdenby commented Aug 12, 2020 •

edited

Loading

leosaffin commented Aug 12, 2020 •

edited

Loading

leosaffin commented Aug 19, 2020 •

edited

Loading