[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613

stephenworsley · 2019-12-19T15:52:26Z

Addresses #3474. Anything loaded from netCDF with a flag attribute (some attribute wich is either flag_values, flag_masks or flag_meanings) is explicitly given a unit of "no_unit" in iris. Anything in iris with a unit of "no_unit" is now saved to netCDF without units. This also addresses #3394.
Also aligns with the improvements to the use of units discussed in #3358

pp-mo · 2020-06-02T17:47:05Z

lib/iris/fileformats/_pyke_rules/fc_rules_cf.krb

@@ -1689,6 +1689,10 @@ fc_extras
        if np.issubdtype(cf_var.dtype, np.str_):
            attr_units = cf_units._NO_UNIT_STRING

+        if "flag_values" in cf_var._nc_attrs or "flag_masks" in cf_var._nc_attrs \


I don't think we should be using a private property of CFVariable here if we can avoid it.
I've just spent much too long trying to understand exactly what CFVariable does + how,
and I would say, please let's not reply on that implementation here !

I think I would use any(hasattr(cf_var, name) for name in (...)).

It's true, this will be less efficient because hasattr depends entirely on getattr
On the other hand, it will only fetch them once per variable, as the CFVariable implementation does ensure that.
Given that netcdf files don't have the proliferation of small things that field-based formats do, I think it will be ok for performance -- although we could test that ?

So it looks like there is some unusual behaviour associated with getting atrributes from cf_var where accessing an attribute will cause it to not be picked up as an attribute in the final object. I think this is due to this piece of code

iris/lib/iris/fileformats/cf.py

Lines 175 to 183 in 715d019

def __getattr__(self, name):

# Accessing netCDF attributes is surprisingly slow. Since

# they're often read repeatedly, caching the values makes things

# quite a bit faster.

if name in self._nc_attrs:

self._cf_attrs.add(name)

value = getattr(self.cf_data, name)

setattr(self, name, value)

return value

It seems like in the current implementation, "attributes" contains everything which has not already been accessed in this way. Trying to access these attributes prematurely causes them to be ignored later on.
I believe this is why I chose to take attributes from cf_var._nc_attrs rather than directly from cf_var. I tested your suggestion and it looks like this would cause attributes to load incorrectly.

This does mean that we are bypassing a mechanism which is specifically designed to improve performance. I have a feeling that the ideal solution may have to involve separating the mechanism for accessing cf_var attributes from the mechanism for chosing which cf_var attributes to ignore when setting iris attributes.

@stephenworsley I have a feeling that the ideal solution may have to involve separating the mechanism for accessing cf_var attributes from the mechanism for chosing which cf_var attributes to ignore when setting iris attributes.

You're dead right !
I do think that the overlapping schemes in CFVariable generate a whole heap of extra confusion.
( I spent quite some time trying to work out why the 'ignored' and 'unused' ideas work in just the the way that they do. )

Having said this, checking the attributes of cf_var.cf_data seems to work while relying on more public API.

So, I'm really sorry for making a duff suggestion !
A simple "clean" solution would be to test hasattr(cf_var.cf_data) instead of hasattr(cf_var). But, like you, I'm still a bit concerned about bypassing the speedup mechanism (even though I suspect that at least some of it is probably unnecessary).

Another really simply fix for my original objection would be to retain your existing code solution, but just make cf_var._nc_attrs public -- by renaming, removing the underscore.
How would you feel about that ?

( Note: there aren't many existing uses of this, and they are all within cf.py )

I feel like either of these could work. To be honest, I don't have a strong enough picture of how cf_var is structured to have a clear idea of which feels more correct. In either case, I suspect both approaches would require changing the a lot of the tests currently using a mocked cf_var.

pp-mo · 2020-06-02T18:01:49Z

After long, long thought comparing this to #3712, I think I do prefer this approach.

The real cause of the problems here lies in the confused division-of-labour between netcdf.py, cf.py and _pyke_rules/fc_rules_cf.krb.
All of those carry some of the encoding of CF concepts, unfortunately (though netcdf.py much less so).

I don't think we can seek to fix that here + now, but must live with it.
So, I think this approach is the more natural one, within the existing code context.

I also think that the whole business of adjusting attributes during the cube-load in netcdf.py is probably the nastiest bit of netcdf.py, and the worst of the division-of-labour failure.
So I fear that #3712 would only perpetuate that awkwardness !

pp-mo · 2020-06-03T14:18:46Z

@stephenworsley WARNING: I see we are losing Travis testing again.
Hence, the above shows "all tests pass" despite the errors you mention (which presumably are breaking some tests ?)

I wish I understood why this happens !
I know it can occur when there is a 'syntax error' (or equivalent) in .travis.yml, but I doubt that is the case this time ...

pp-mo · 2020-06-03T14:23:37Z

the errors you mention (which presumably are breaking some tests ?)

Yes, I find that 3 are failing in iris.tests.test_netcdf
Notably:

FAIL: test_cell_measures (__main__.TestNetCDFLoad)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "lib/iris/tests/test_netcdf.py", line 398, in test_cell_measures
    self.assertEqual(cms[0], expected)
AssertionError: CellM[203 chars] var_name='my_areas') != CellM[203 chars] var_name='my_areas', attributes={'custom': 'extra-attribute'})

pp-mo · 2020-06-05T11:34:09Z

Ok, this is looking good !
I think all your tweaks to the Mock tests look neat + sensible (though I know it can take ages to work out where to make these adjustments !)

So, I think that now happily "proves" the correctness of the essential change.
The problems/questions remaining here are :
#1. includes tweaks to requirements, that will need removing when we merge in the relevant fixes on master
#2. includes hacks for saving, that will need removing when we merge with default_units feature branch

I think the correct approach is just to merge this, leaving those changes in, and take care of them in the merge process.
At least, leaving in the requirements changes means that we can do further work with launch_ancils branch if we need to (though I suspect we don't need to)

* change dependencies : NOTE these changes need removing when remerging to master

… launch_ancils * 'launch_ancils' of https://github.com/SciTools/iris: Resolve conflicts with requirements/test.txt to update the imagehash pin to imagehash>=4.0 (SciTools#3735) PI:3358 (WIP) Ensure flags load/save without units (SciTools#3613) PI-3473 Rename "engine.provides" (SciTools#3590) PI-3473: Whatsnews relating to ancillary load + save. (SciTools#3557) PI-3473: Netcdf loading ancillary variables (SciTools#3556) � Conflicts: � lib/iris/fileformats/netcdf.py � lib/iris/tests/unit/fileformats/netcdf/test__load_aux_factory.py � requirements/core.txt

stephenworsley added 3 commits December 19, 2019 15:44

ensure flags load/save without units

f9e1b1e

add load test

6c74050

minor fixes

76d28bd

stephenworsley added this to the v3.0.0 milestone Dec 20, 2019

stephenworsley changed the title ~~Ensure flags load/save without units~~ (WIP) Ensure flags load/save without units May 19, 2020

stephenworsley mentioned this pull request May 19, 2020

(WIP) Alternative method for assigning flags "no_unit" #3712

Closed

pp-mo mentioned this pull request May 29, 2020

[FB] [PI-3585] Change default units to "unknown" for all DimensionalMetadata #3713

Merged

pp-mo reviewed Jun 2, 2020

View reviewed changes

pp-mo changed the title ~~(WIP) Ensure flags load/save without units~~ PI:3358 (WIP) Ensure flags load/save without units Jun 3, 2020

change method for checking cf_var attributes

6274f2a

stephenworsley added 2 commits June 3, 2020 17:10

Fix tests

1267b8f

change cf_var attribute retrieval from in to hasattr

f915703

pp-mo mentioned this pull request Jun 4, 2020

Hack some versions #3729

Closed

change dependencies

5665d11

pp-mo merged commit 25ea1ba into SciTools:launch_ancils Jun 5, 2020

pp-mo mentioned this pull request Jun 5, 2020

Myhacked launch ancils #3730

Closed

abooton pushed a commit to abooton/iris that referenced this pull request Jun 5, 2020

PI:3358 (WIP) Ensure flags load/save without units (SciTools#3613)

843dcbe

* change dependencies : NOTE these changes need removing when remerging to master

abooton changed the title ~~PI:3358 (WIP) Ensure flags load/save without units~~ [FB] [PI-3358] (WIP) Ensure flags load/save without units Jun 10, 2020

abooton changed the title ~~[FB] [PI-3358] (WIP) Ensure flags load/save without units~~ [FB] [PI-3474] (WIP) Ensure flags load/save without units Jun 10, 2020

abooton linked an issue Jun 10, 2020 that may be closed by this pull request

[PI] Ensure flags are correctly loaded/saved to/from NetCDF #3474

Closed

5 tasks

abooton removed a link to an issue Jun 10, 2020

[PI] Ensure flags are correctly loaded/saved to/from NetCDF #3474

Closed

5 tasks

lbdreyer mentioned this pull request Aug 21, 2020

[PI] Ensure flags are correctly loaded/saved to/from NetCDF #3474

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613

[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613

stephenworsley commented Dec 19, 2019 •

edited by abooton

Loading

pp-mo Jun 2, 2020

stephenworsley Jun 3, 2020

pp-mo Jun 3, 2020

stephenworsley Jun 3, 2020

pp-mo Jun 3, 2020 •

edited

Loading

stephenworsley Jun 3, 2020

pp-mo commented Jun 2, 2020

pp-mo commented Jun 3, 2020 •

edited

Loading

pp-mo commented Jun 3, 2020

pp-mo commented Jun 5, 2020

	def __getattr__(self, name):
	# Accessing netCDF attributes is surprisingly slow. Since
	# they're often read repeatedly, caching the values makes things
	# quite a bit faster.
	if name in self._nc_attrs:
	self._cf_attrs.add(name)
	value = getattr(self.cf_data, name)
	setattr(self, name, value)
	return value

[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613

[FB] [PI-3474] (WIP) Ensure flags load/save without units #3613

Conversation

stephenworsley commented Dec 19, 2019 • edited by abooton Loading

pp-mo Jun 2, 2020

Choose a reason for hiding this comment

stephenworsley Jun 3, 2020

Choose a reason for hiding this comment

pp-mo Jun 3, 2020

Choose a reason for hiding this comment

stephenworsley Jun 3, 2020

Choose a reason for hiding this comment

pp-mo Jun 3, 2020 • edited Loading

Choose a reason for hiding this comment

stephenworsley Jun 3, 2020

Choose a reason for hiding this comment

pp-mo commented Jun 2, 2020

pp-mo commented Jun 3, 2020 • edited Loading

pp-mo commented Jun 3, 2020

pp-mo commented Jun 5, 2020

stephenworsley commented Dec 19, 2019 •

edited by abooton

Loading

pp-mo Jun 3, 2020 •

edited

Loading

pp-mo commented Jun 3, 2020 •

edited

Loading