-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613
Conversation
@@ -1689,6 +1689,10 @@ fc_extras | |||
if np.issubdtype(cf_var.dtype, np.str_): | |||
attr_units = cf_units._NO_UNIT_STRING | |||
|
|||
if "flag_values" in cf_var._nc_attrs or "flag_masks" in cf_var._nc_attrs \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be using a private property of CFVariable here if we can avoid it.
I've just spent much too long trying to understand exactly what CFVariable does + how,
and I would say, please let's not reply on that implementation here !
I think I would use any(hasattr(cf_var, name) for name in (...))
.
It's true, this will be less efficient because hasattr depends entirely on getattr
On the other hand, it will only fetch them once per variable, as the CFVariable implementation does ensure that.
Given that netcdf files don't have the proliferation of small things that field-based formats do, I think it will be ok for performance -- although we could test that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like there is some unusual behaviour associated with getting atrributes from cf_var
where accessing an attribute will cause it to not be picked up as an attribute in the final object. I think this is due to this piece of code
iris/lib/iris/fileformats/cf.py
Lines 175 to 183 in 715d019
def __getattr__(self, name): | |
# Accessing netCDF attributes is surprisingly slow. Since | |
# they're often read repeatedly, caching the values makes things | |
# quite a bit faster. | |
if name in self._nc_attrs: | |
self._cf_attrs.add(name) | |
value = getattr(self.cf_data, name) | |
setattr(self, name, value) | |
return value |
It seems like in the current implementation, "attributes" contains everything which has not already been accessed in this way. Trying to access these attributes prematurely causes them to be ignored later on.
I believe this is why I chose to take attributes from cf_var._nc_attrs
rather than directly from cf_var
. I tested your suggestion and it looks like this would cause attributes to load incorrectly.
This does mean that we are bypassing a mechanism which is specifically designed to improve performance. I have a feeling that the ideal solution may have to involve separating the mechanism for accessing cf_var
attributes from the mechanism for chosing which cf_var
attributes to ignore when setting iris attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenworsley I have a feeling that the ideal solution may have to involve separating the mechanism for accessing cf_var attributes from the mechanism for chosing which cf_var attributes to ignore when setting iris attributes.
You're dead right !
I do think that the overlapping schemes in CFVariable generate a whole heap of extra confusion.
( I spent quite some time trying to work out why the 'ignored' and 'unused' ideas work in just the the way that they do. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said this, checking the attributes of cf_var.cf_data
seems to work while relying on more public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm really sorry for making a duff suggestion !
A simple "clean" solution would be to test hasattr(cf_var.cf_data)
instead of hasattr(cf_var)
. But, like you, I'm still a bit concerned about bypassing the speedup mechanism (even though I suspect that at least some of it is probably unnecessary).
Another really simply fix for my original objection would be to retain your existing code solution, but just make cf_var._nc_attrs
public -- by renaming, removing the underscore.
How would you feel about that ?
( Note: there aren't many existing uses of this, and they are all within cf.py )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like either of these could work. To be honest, I don't have a strong enough picture of how cf_var
is structured to have a clear idea of which feels more correct. In either case, I suspect both approaches would require changing the a lot of the tests currently using a mocked cf_var
.
After long, long thought comparing this to #3712, I think I do prefer this approach. The real cause of the problems here lies in the confused division-of-labour between I don't think we can seek to fix that here + now, but must live with it. I also think that the whole business of adjusting attributes during the cube-load in netcdf.py is probably the nastiest bit of netcdf.py, and the worst of the division-of-labour failure. |
@stephenworsley WARNING: I see we are losing Travis testing again. I wish I understood why this happens ! |
Yes, I find that 3 are failing in
|
Ok, this is looking good ! So, I think that now happily "proves" the correctness of the essential change. I think the correct approach is just to merge this, leaving those changes in, and take care of them in the merge process. |
* change dependencies : NOTE these changes need removing when remerging to master
… launch_ancils * 'launch_ancils' of https://github.com/SciTools/iris: Resolve conflicts with requirements/test.txt to update the imagehash pin to imagehash>=4.0 (SciTools#3735) PI:3358 (WIP) Ensure flags load/save without units (SciTools#3613) PI-3473 Rename "engine.provides" (SciTools#3590) PI-3473: Whatsnews relating to ancillary load + save. (SciTools#3557) PI-3473: Netcdf loading ancillary variables (SciTools#3556) � Conflicts: � lib/iris/fileformats/netcdf.py � lib/iris/tests/unit/fileformats/netcdf/test__load_aux_factory.py � requirements/core.txt
Addresses #3474. Anything loaded from netCDF with a flag attribute (some attribute wich is either
flag_values
,flag_masks
orflag_meanings
) is explicitly given a unit of"no_unit"
in iris. Anything in iris with a unit of"no_unit"
is now saved to netCDF without units. This also addresses #3394.Also aligns with the improvements to the use of units discussed in #3358