Iris's load and load_cubes utilities return a CubeList with its elements in random order #3314

evertrol · 2019-05-15T16:35:09Z

When loading a dataset that has multiple cubes, load() and load_cubes() will return a CubeList with its elements in random order. This makes it harder for following code to be reproducible: indexing into the list isn't guaranteed to always provide the same cube.

Perhaps there is a good reason CubeList is inherently unordered, but since list (from which it derives) is inherently ordered, I feel it's better to have it be ordered, so that there are no surprises.

The cause seems to in the iris.fileformats.netcdf.load_cubes function, where it calls the .values() method of the various CFGroup objects.
One solution could be (untested) to make CFGroup._cf_variables a collections.OrderedDict (and perhaps the other attributes of CFGroup as well).

Potentially, this may also apply for various other dicts, or even the MutableMapping base class of CFGroup.

The text was updated successfully, but these errors were encountered:

DPeterK · 2019-05-16T15:59:38Z

I'd assert this is a feature not a bug – CubeLists are, and should be expected to be, unordered (think pre-Python 3.6 dictionary keys). If you wish to reproducibly select a cube from the CubeList you should use a Constraint that's defined by metadata of interest and extract from the CubeList using that Constraint. This is a generally more robust approach than indexing.

evertrol · 2019-05-17T07:56:06Z

Ah, right. I got thrown off by its name and parent class (list), and of course methods like append and insert are inherently ordered. I guess it's more a multi-set type of class.

Thanks, I'll use the constraint route.

rcomer · 2019-05-29T10:53:18Z

Hmmm, I think once loaded cubelists are ordered aren't they? If I do:

new_cubelist = iris.cube.Cube([cube.collapsed('foo', iris.analysis.MEAN) for cube in old_cubelist])

then I can expect the order in my new cubelist to be consistent with the old.

However, I think I'd always be nervous about assuming a particular order in a freshly loaded cube, regardless of how iris.load is written. What happens if another process changed your NetCDF file on disk since you last used it?

bjlittle · 2019-06-02T12:13:15Z

Thanks @evertrol for this issue. Much appreciated! 👍

We've never intended the Cubes within a CubeList to in any specific order. This was a design choice from outset of iris, and we'd be hesitant to enforce an ordering - simply because there is no obvious way to do it in a way that everyone would agree with.

So the deal is that you can't rely on any particular ordering - if you consider it random, then that's a pretty safe assumption to make. We've left the ordering down to the user to decide, if that matters to them, post loading. As @DPeterK mentioned, use a constraint to refine and extract the cubes that you care about, rather than indexing - indexing is a very weak contract, there is no guarantee that the same cube will be at the same index when comparing independent loads of the same file/s.

@evertrol If you don't mind, I'm going to close this issue. However, if you feel that you want to re-open it as a means to force an open discussion on Cube ordering within a CubeList, then please do so, and the community will take it from there 😄

evertrol · 2019-06-10T07:14:30Z

Not a problem having this issue closed. It's mostly arguing semantics, and I wouldn't expect to see an API change/rename. As DPeterK remarks, it's mostly a dict keys style (or dict values, if items can be repeated).
Is it worth to put a remark in the documentation about it, either (or both) at the main reference documentation page or the load_cubes() reference documentation?

bjlittle · 2019-06-11T16:32:44Z

@evertrol Thanks.

That seems like a happy compromise to me, so I'm going to re-open this issue with the task of updating the documentation and/or doc-strings to highlight the orders of cubes returned within a CubeList so that it isn't a surprise to users.

lbdreyer · 2019-08-22T12:29:58Z

This has been addressed in @stephenworsley 's PR: #3370 so I'm going to close this issue

edmundhenley-mo · 2021-04-12T15:07:11Z

Hiya all - see this issue is closed (and the reasoning given above). See also that the docs are nicely explicit about lack of inherent order.

However... is there any option/feeling re option of adding a keyword to iris.load such that a user can get a sorted cubelist out if iris.load ends up returning a cube list?

Straw API
cube_list = iris.load(my_file.nc, sort_order=<something>)
where <something> could be (say) one of following:

a set of common keywords (e.g. ["alphabetical", ...])
an option for (say) a user-defined function to determine this (with docs giving some sample functions implementing say alphabetical based on cube.name, reverse alphabetical, field size in bytes, ...)
other?

Whichever of these is used, there could be a default sort_order=None (or similar) default to preserve the current unsorted behaviour as default.

Use-case for this:
I'm trying to use a notebook to document development of scripts to wrangle some ~nasty data which results in a cube list.

Notebooks of course not natively the best for diffing generally (thanks to several of you I know there are lots of tools to help with this!).

But in this case the cubelist being unordered adds needless extra diff line noise I could do without, and quite a lot of it as I'm using html-formatted output.

Not a major issue - but enough of a niggle that I thought worth poking this issue.

Cons
Realise there may be a -1 (more?) aspect in that anything like this may encourage people to index in to the sorted cubelist. Leading to opaque code etc. I hereby solemnly swear never to!.

Moot, but FWIW I do agree as with original poster that this is a bit of a gotcha given list in CubeList name, especially as you can even ~rely on dicts' sort order now!

Overall
IMO would be nice if there were a ~easy option to allow sorting for use-cases like this. Perhaps by making the default unsorted it wouldn't become something people would rely on too much (indexing etc), but could be opted into by those who need it.

jonseddon · 2021-04-12T15:20:05Z

Could you sort your CubeList yourself after loading? For example:

cubes.sort(key=lambda cube: cube.var_name)

edmundhenley-mo · 2021-04-12T15:23:54Z

Definitely! Perfect, thanks Jon! For my use-case this makes my request ignorable, as I can sort after loading but before printing.

bjlittle self-assigned this Jun 2, 2019

bjlittle closed this as completed Jun 2, 2019

bjlittle added this to the v2.3.0 milestone Jun 11, 2019

bjlittle reopened this Jun 11, 2019

bjlittle added Experience: Low Release: Minor Type: Documentation labels Jun 11, 2019

pp-mo mentioned this issue Jun 25, 2019

More control over lazy data creation (chunking) #3333

Closed

lbdreyer assigned stephenworsley Aug 21, 2019

stephenworsley mentioned this issue Aug 21, 2019

Mentioned random CubeList order in the docs #3370

Merged

lbdreyer closed this as completed Aug 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iris's load and load_cubes utilities return a CubeList with its elements in random order #3314

Iris's load and load_cubes utilities return a CubeList with its elements in random order #3314

evertrol commented May 15, 2019

DPeterK commented May 16, 2019

evertrol commented May 17, 2019

rcomer commented May 29, 2019

bjlittle commented Jun 2, 2019

evertrol commented Jun 10, 2019

bjlittle commented Jun 11, 2019

lbdreyer commented Aug 22, 2019

edmundhenley-mo commented Apr 12, 2021 •

edited

Loading

jonseddon commented Apr 12, 2021

edmundhenley-mo commented Apr 12, 2021

Iris's load and load_cubes utilities return a CubeList with its elements in random order #3314

Iris's load and load_cubes utilities return a CubeList with its elements in random order #3314

Comments

evertrol commented May 15, 2019

DPeterK commented May 16, 2019

evertrol commented May 17, 2019

rcomer commented May 29, 2019

bjlittle commented Jun 2, 2019

evertrol commented Jun 10, 2019

bjlittle commented Jun 11, 2019

lbdreyer commented Aug 22, 2019

edmundhenley-mo commented Apr 12, 2021 • edited Loading

jonseddon commented Apr 12, 2021

edmundhenley-mo commented Apr 12, 2021

edmundhenley-mo commented Apr 12, 2021 •

edited

Loading