
Partial collapse multi-dim aux coords #3008

Closed

Conversation

@duncanwp (Contributor) commented May 1, 2018

cc @pelson

Allow the partial collapse of multi-dimensional auxiliary coordinates.
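
For a rough sense of the behaviour this enables, a minimal sketch (the file name and coordinate names are made up; `surface_altitude` is assumed to be a 2-d auxiliary coordinate spanning both horizontal dimensions):

    import iris
    import iris.analysis

    cube = iris.load_cube('my_data.nc')  # hypothetical input file

    # Collapsing only one of the dimensions spanned by a multi-dimensional
    # aux coord was previously unsupported; with this change the aux coord
    # is partially collapsed along just that dimension.
    collapsed = cube.collapsed('grid_latitude', iris.analysis.MAX)
    print(collapsed.coord('surface_altitude'))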

"""
# Ensure dims_to_collapse is a tuple to be able to pass through to numpy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (80 > 79 characters)

    self.cube_with_aux_coord.coord('grid_longitude').guess_bounds()

    self.weights = area_weights(self.cube_with_aux_coord, normalize=False)
    self.normalized_weights = area_weights(self.cube_with_aux_coord, normalize=True)

E501 line too long (88 > 79 characters)

    # [124, 125, 126, 127, 128, 129]]

    def test_max(self):
        cube = self.cube_with_aux_coord.collapsed('grid_latitude', iris.analysis.MAX)

E501 line too long (85 > 79 characters)

    [105, 129]]))

    # Check collapsing over the whole coord still works
    cube = self.cube_with_aux_coord.collapsed('altitude', iris.analysis.MAX)

E501 line too long (80 > 79 characters)

    np.testing.assert_array_equal(cube.coord('surface_altitude').bounds,
                                  np.array([[100, 129]]))

    cube = self.cube_with_aux_coord.collapsed('grid_longitude', iris.analysis.MAX)

E501 line too long (86 > 79 characters)


    # Create the new collapsed coordinate.
    if is_lazy_data(item):

@pp-mo (Member) commented May 2, 2018

I think this change might be losing us some lazy behaviour here, as it uses np.concatenate / np.stack in place of the Dask equivalents. Otherwise, what was multidim_lazy_stack doing?

I suspect we need another test, which doesn't currently exist, to ensure that collapse is 'lazy' in these cases.

@duncanwp (Contributor, author) replied:

Yes - good point, thanks.
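
For reference, a minimal sketch of the kind of laziness-preserving dispatch being discussed (the wrapper function is hypothetical; dask arrays are detected directly rather than via iris's internal helpers):

    import dask.array as da
    import numpy as np

    def stack_keeping_laziness(arrays):
        # If any input is a dask array, stack with dask so the result stays
        # lazy; otherwise fall back to an eager numpy stack.
        if any(isinstance(a, da.Array) for a in arrays):
            return da.stack(arrays)
        return np.stack(arrays)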

    # Calculate the bounds and points along the right dims
    bounds = np.stack([item.min(axis=dims_to_collapse),
                       item.max(axis=dims_to_collapse)]).T
    points = item.mean(axis=dims_to_collapse, dtype=self.dtype)

Review comment (Member):

This will need a what's new. It is much better behaviour (as we gain a bit more information about the distribution of the data beforehand), but it will have an impact on almost every dataset that was ever collapsed...
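
For concreteness, a tiny standalone illustration of what the new computation produces (array values are made up):

    import numpy as np

    item = np.array([[1., 2., 9.],
                     [4., 5., 6.]])
    dims_to_collapse = (1,)

    bounds = np.stack([item.min(axis=dims_to_collapse),
                       item.max(axis=dims_to_collapse)]).T
    points = item.mean(axis=dims_to_collapse)

    print(bounds)  # [[1. 9.], [4. 6.]] -- each row spans [min, max]
    print(points)  # [4. 5.] -- the mean, not necessarily the bounds
                   # mid-point (which would be 5. for the first row)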

@pelson (Member) commented May 2, 2018

Having worked through the logic, I'm comfortable that in principle we can do partial dimension collapses, and that collapses are commutative when it comes to chaining multiple collapses for the bounded values. This is not the case for the point values when we use the mean, but I think I'm OK with that.

In short, I'm in favour of this. We obviously need to work through a few of the small details (what's new, appropriate test coverage, lazy operations), but once done, 👍.

@duncanwp (Contributor, author) commented May 2, 2018

Great! I'll try and get those changes in today or tomorrow.

I think the mean does commute when chaining collapses though, doesn't it?

@pelson (Member) commented May 2, 2018

> I think the mean does commute when chaining collapses though, doesn't it?

Quite right. We are talking about rectangular arrays. 😄
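
A quick check of that point, under the assumption of a rectangular (non-ragged) array:

    import numpy as np

    a = np.arange(24.).reshape(2, 3, 4)

    # Collapse both trailing dimensions at once...
    full = a.mean(axis=(1, 2))
    # ...or chain the collapses one dimension at a time.
    chained = a.mean(axis=2).mean(axis=1)

    print(np.allclose(full, chained))  # True -- with equal weights the mean commutes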

    item = self.core_bounds() if self.has_bounds() \

    # Determine the right array method for stacking
    stack_method = da.stack if self.has_bounds() \
                                   and is_lazy_data(self.core_bounds()) \

E127 continuation line over-indented for visual indent

@duncanwp (Contributor, author) commented May 4, 2018

@pelson I'm currently having to fix the cml differences one at a time, because the tests stop at the first failure. Is there a way to make them all run together? I tried running them locally, but they then fail because the shapes are long (rather than short) ints - is that a library version issue?

@pelson (Member) commented May 4, 2018

> I'm currently having to fix the cml differences one at a time, because the tests stop at the first failure. Is there a way to make them all run together?

😢 No, I'm afraid not.

> I tried running them locally, but they then fail because the shapes are long (rather than short) ints - is that a library version issue?

Happy to help diagnose this - not sure what is going on there, though. NumPy version issues? We are currently testing with 1.13 on Travis, so perhaps take a look there?

@pelson (Member) left a review:

This change has quite wide-reaching implications (as can be seen by the number of test changes), but on the whole I'm in favour of it. In particular, the mid-point that was previously computed was simply the mean of the bounds (which can still be computed). However, we are now returning the mean of the inputs, which gives the user more information about their original inputs.

Given the implications, I'd like to get at least one other 👍 on the review side of things before merging.

@@ -0,0 +1,5 @@

    * The partial collapse of multi-dimensional auxiliary coordinates is now
      supported. Collapsed bounds span the range of the collapsed dimension(s).
      *Note* the collapsed points now represent the mean over the collapsed
      dimension(s), rather than the mid-point of the collapsed bounds.
Review comment (Member):

Heads up @SciTools/iris-devs. This is a big change, but IMO makes a lot of sense. If users really care about the mid-point, it can be computed from the bounds...

@pp-mo (Member) commented May 15, 2018

Regarding the 'mean point / mean bounds' change: I can see the sense, and I can't think of a case in which this is simply "wrong". On the other hand, while some may think it "nicer", I'm not seeing any huge benefit. Plus, it does affect cases where previously you could easily predict the result, and now it's not so obvious.

Unless I'm missing something, this change is also really independent of the partial-collapse implementation, so there is no actual need to couple the two together.

This is a breaking change (in behaviour), and strictly we ought to 'future' it, as it can break existing code. That is of course a huge pain. On that basis, I'm 👎 on that bit, while obviously 👍 on the rest.

Reply (Member):

Thanks for the different perspective @pp-mo.

> This is a breaking change (in behaviour), and strictly we ought to 'future' it, as it can break existing code.

I'm not sold on that personally. As we all know, any change has the potential to break existing code. To me, major/minor/patch versioning is much more about API than it is about behaviour, but I can see your POV. As an example, we don't even need to change a single line of code for there to be a behavioural impact (think numpy & matplotlib upgrades). That said, let's not descend into semver semantics over it 😄.

> Unless I'm missing something, this change is also really independent of the partial-collapse implementation, so there is no actual need to couple the two together.

I'm in favour of that approach too. My only concern is that we've sent @duncanwp down a bit of a rabbit-hole, fixing up a bunch of tests that he could have got away with not doing. Sorry about that @duncanwp - do you think it will be too painful to split the two implementations?

Reply (Member):

> fixing up a bunch of tests that he could have got away with not doing

Apologies @duncanwp, I hadn't really spotted that aspect ...

Reply (Contributor, author):

I could switch the tests back, but I'd like to keep the option of using the mean. Could we add a keyword to Coord.collapse? Or create a futures/config option?
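
One hypothetical shape such a keyword could take, purely as an illustration (this helper and its signature are not part of iris):

    import numpy as np

    def collapse_points(points, dims_to_collapse, method='mean'):
        # Hypothetical helper: choose how collapsed point values are derived.
        if method == 'mean':
            # Mean of the original points (the behaviour proposed here).
            return points.mean(axis=dims_to_collapse)
        if method == 'midpoint':
            # Mid-point of the new bounds (the pre-existing behaviour).
            lo = points.min(axis=dims_to_collapse)
            hi = points.max(axis=dims_to_collapse)
            return (lo + hi) / 2.0
        raise ValueError('unknown method: {!r}'.format(method))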

Reply (Member):

To be clear, the proposal is to create two PRs: one implementing the MEAN, and one implementing the multi-dim collapse. I don't think there is any discussion remaining regarding the multi-dim collapse, but there is still a conversation to be had about MEAN being the correct approach.

Is that how you've understood it too @duncanwp?

@stickler-ci commented:

Could not review pull request. It may be too large, or contain no reviewable changes.

@duncanwp (Contributor, author) commented:

Apologies - the only way I could see to dig my way out of the git mess I made was to open two new pull requests!
