Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613

Merged
merged 7 commits into from
Jun 5, 2020

Conversation

stephenworsley
Copy link
Contributor

@stephenworsley stephenworsley commented Dec 19, 2019

Addresses #3474. Anything loaded from netCDF with a flag attribute (some attribute wich is either flag_values, flag_masks or flag_meanings) is explicitly given a unit of "no_unit" in iris. Anything in iris with a unit of "no_unit" is now saved to netCDF without units. This also addresses #3394.
Also aligns with the improvements to the use of units discussed in #3358

@stephenworsley stephenworsley added this to the v3.0.0 milestone Dec 20, 2019
@stephenworsley stephenworsley changed the title Ensure flags load/save without units (WIP) Ensure flags load/save without units May 19, 2020
@@ -1689,6 +1689,10 @@ fc_extras
if np.issubdtype(cf_var.dtype, np.str_):
attr_units = cf_units._NO_UNIT_STRING

if "flag_values" in cf_var._nc_attrs or "flag_masks" in cf_var._nc_attrs \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be using a private property of CFVariable here if we can avoid it.
I've just spent much too long trying to understand exactly what CFVariable does + how,
and I would say, please let's not reply on that implementation here !

I think I would use any(hasattr(cf_var, name) for name in (...)).

It's true, this will be less efficient because hasattr depends entirely on getattr
On the other hand, it will only fetch them once per variable, as the CFVariable implementation does ensure that.
Given that netcdf files don't have the proliferation of small things that field-based formats do, I think it will be ok for performance -- although we could test that ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like there is some unusual behaviour associated with getting atrributes from cf_var where accessing an attribute will cause it to not be picked up as an attribute in the final object. I think this is due to this piece of code

def __getattr__(self, name):
# Accessing netCDF attributes is surprisingly slow. Since
# they're often read repeatedly, caching the values makes things
# quite a bit faster.
if name in self._nc_attrs:
self._cf_attrs.add(name)
value = getattr(self.cf_data, name)
setattr(self, name, value)
return value

It seems like in the current implementation, "attributes" contains everything which has not already been accessed in this way. Trying to access these attributes prematurely causes them to be ignored later on.
I believe this is why I chose to take attributes from cf_var._nc_attrs rather than directly from cf_var. I tested your suggestion and it looks like this would cause attributes to load incorrectly.

This does mean that we are bypassing a mechanism which is specifically designed to improve performance. I have a feeling that the ideal solution may have to involve separating the mechanism for accessing cf_var attributes from the mechanism for chosing which cf_var attributes to ignore when setting iris attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenworsley I have a feeling that the ideal solution may have to involve separating the mechanism for accessing cf_var attributes from the mechanism for chosing which cf_var attributes to ignore when setting iris attributes.

You're dead right !
I do think that the overlapping schemes in CFVariable generate a whole heap of extra confusion.
( I spent quite some time trying to work out why the 'ignored' and 'unused' ideas work in just the the way that they do. )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said this, checking the attributes of cf_var.cf_data seems to work while relying on more public API.

Copy link
Member

@pp-mo pp-mo Jun 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm really sorry for making a duff suggestion !
A simple "clean" solution would be to test hasattr(cf_var.cf_data) instead of hasattr(cf_var). But, like you, I'm still a bit concerned about bypassing the speedup mechanism (even though I suspect that at least some of it is probably unnecessary).

Another really simply fix for my original objection would be to retain your existing code solution, but just make cf_var._nc_attrs public -- by renaming, removing the underscore.
How would you feel about that ?

( Note: there aren't many existing uses of this, and they are all within cf.py )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like either of these could work. To be honest, I don't have a strong enough picture of how cf_var is structured to have a clear idea of which feels more correct. In either case, I suspect both approaches would require changing the a lot of the tests currently using a mocked cf_var.

@pp-mo
Copy link
Member

pp-mo commented Jun 2, 2020

After long, long thought comparing this to #3712, I think I do prefer this approach.

The real cause of the problems here lies in the confused division-of-labour between netcdf.py, cf.py and _pyke_rules/fc_rules_cf.krb.
All of those carry some of the encoding of CF concepts, unfortunately (though netcdf.py much less so).

I don't think we can seek to fix that here + now, but must live with it.
So, I think this approach is the more natural one, within the existing code context.

I also think that the whole business of adjusting attributes during the cube-load in netcdf.py is probably the nastiest bit of netcdf.py, and the worst of the division-of-labour failure.
So I fear that #3712 would only perpetuate that awkwardness !

@pp-mo pp-mo changed the title (WIP) Ensure flags load/save without units PI:3358 (WIP) Ensure flags load/save without units Jun 3, 2020
@pp-mo
Copy link
Member

pp-mo commented Jun 3, 2020

@stephenworsley WARNING: I see we are losing Travis testing again.
Hence, the above shows "all tests pass" despite the errors you mention (which presumably are breaking some tests ?)

I wish I understood why this happens !
I know it can occur when there is a 'syntax error' (or equivalent) in .travis.yml, but I doubt that is the case this time ...

@pp-mo
Copy link
Member

pp-mo commented Jun 3, 2020

the errors you mention (which presumably are breaking some tests ?)

Yes, I find that 3 are failing in iris.tests.test_netcdf
Notably:

FAIL: test_cell_measures (__main__.TestNetCDFLoad)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "lib/iris/tests/test_netcdf.py", line 398, in test_cell_measures
    self.assertEqual(cms[0], expected)
AssertionError: CellM[203 chars] var_name='my_areas') != CellM[203 chars] var_name='my_areas', attributes={'custom': 'extra-attribute'})

@pp-mo pp-mo mentioned this pull request Jun 4, 2020
@pp-mo
Copy link
Member

pp-mo commented Jun 5, 2020

Ok, this is looking good !
I think all your tweaks to the Mock tests look neat + sensible (though I know it can take ages to work out where to make these adjustments !)

So, I think that now happily "proves" the correctness of the essential change.
The problems/questions remaining here are :
#1. includes tweaks to requirements, that will need removing when we merge in the relevant fixes on master
#2. includes hacks for saving, that will need removing when we merge with default_units feature branch

I think the correct approach is just to merge this, leaving those changes in, and take care of them in the merge process.
At least, leaving in the requirements changes means that we can do further work with launch_ancils branch if we need to (though I suspect we don't need to)

@pp-mo pp-mo merged commit 25ea1ba into SciTools:launch_ancils Jun 5, 2020
@pp-mo pp-mo mentioned this pull request Jun 5, 2020
abooton pushed a commit to abooton/iris that referenced this pull request Jun 5, 2020
* change dependencies : NOTE these changes need removing when remerging to master
@abooton abooton changed the title PI:3358 (WIP) Ensure flags load/save without units [FB] [PI-3358] (WIP) Ensure flags load/save without units Jun 10, 2020
@abooton abooton changed the title [FB] [PI-3358] (WIP) Ensure flags load/save without units [FB] [PI-3474] (WIP) Ensure flags load/save without units Jun 10, 2020
@abooton abooton linked an issue Jun 10, 2020 that may be closed by this pull request
5 tasks
stephenworsley added a commit to stephenworsley/iris that referenced this pull request Aug 20, 2020
… launch_ancils

* 'launch_ancils' of https://github.com/SciTools/iris:
  Resolve conflicts with requirements/test.txt to update the imagehash pin to imagehash>=4.0 (SciTools#3735)
  PI:3358 (WIP) Ensure flags load/save without units (SciTools#3613)
  PI-3473 Rename "engine.provides" (SciTools#3590)
  PI-3473: Whatsnews relating to ancillary load + save. (SciTools#3557)
  PI-3473: Netcdf loading ancillary variables (SciTools#3556)

� Conflicts:
�	lib/iris/fileformats/netcdf.py
�	lib/iris/tests/unit/fileformats/netcdf/test__load_aux_factory.py
�	requirements/core.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants