
Partial collapse multi-dim aux coords #3008

Closed

Conversation

@duncanwp (Contributor) commented May 1, 2018

cc @pelson

Allow the partial collapse of multi-dimensional auxiliary coordinates.
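
For a rough sense of the behaviour this enables, a minimal sketch (the file name and coordinate names are made up; `surface_altitude` is assumed to be a 2-d auxiliary coordinate spanning both horizontal dimensions):

    import iris
    import iris.analysis

    cube = iris.load_cube('my_data.nc')  # hypothetical input file

    # Collapsing only one of the dimensions spanned by a multi-dimensional
    # aux coord was previously unsupported; with this change the aux coord
    # is partially collapsed along just that dimension.
    collapsed = cube.collapsed('grid_latitude', iris.analysis.MAX)
    print(collapsed.coord('surface_altitude'))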

"""
# Ensure dims_to_collapse is a tuple to be able to pass through to numpy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (80 > 79 characters)

    self.cube_with_aux_coord.coord('grid_longitude').guess_bounds()

    self.weights = area_weights(self.cube_with_aux_coord, normalize=False)
    self.normalized_weights = area_weights(self.cube_with_aux_coord, normalize=True)

E501 line too long (88 > 79 characters)

    # [124, 125, 126, 127, 128, 129]]

    def test_max(self):
        cube = self.cube_with_aux_coord.collapsed('grid_latitude', iris.analysis.MAX)

E501 line too long (85 > 79 characters)

    [105, 129]]))

    # Check collapsing over the whole coord still works
    cube = self.cube_with_aux_coord.collapsed('altitude', iris.analysis.MAX)

E501 line too long (80 > 79 characters)

    np.testing.assert_array_equal(cube.coord('surface_altitude').bounds,
                                  np.array([[100, 129]]))

    cube = self.cube_with_aux_coord.collapsed('grid_longitude', iris.analysis.MAX)

E501 line too long (86 > 79 characters)


    # Create the new collapsed coordinate.
    if is_lazy_data(item):

@pp-mo (Member) commented May 2, 2018

I think this change might be losing us some lazy behaviour here, as it uses np.concatenate / np.stack in place of the Dask equivalents. Otherwise, what was multidim_lazy_stack doing?

I suspect we need another test, which doesn't currently exist, to ensure that collapse is 'lazy' in these cases.

@duncanwp (Contributor, author) replied:

Yes - good point, thanks.
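
For reference, a minimal sketch of the kind of laziness-preserving dispatch being discussed (the wrapper function is hypothetical; dask arrays are detected directly rather than via iris's internal helpers):

    import dask.array as da
    import numpy as np

    def stack_keeping_laziness(arrays):
        # If any input is a dask array, stack with dask so the result stays
        # lazy; otherwise fall back to an eager numpy stack.
        if any(isinstance(a, da.Array) for a in arrays):
            return da.stack(arrays)
        return np.stack(arrays)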

    # Calculate the bounds and points along the right dims
    bounds = np.stack([item.min(axis=dims_to_collapse),
                       item.max(axis=dims_to_collapse)]).T
    points = item.mean(axis=dims_to_collapse, dtype=self.dtype)

Review comment (Member):

This will need a what's new. It is much better behaviour (as we gain a bit more information about the distribution of the data beforehand), but it will have an impact on almost every dataset that was ever collapsed...
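
For concreteness, a tiny standalone illustration of what the new computation produces (array values are made up):

    import numpy as np

    item = np.array([[1., 2., 9.],
                     [4., 5., 6.]])
    dims_to_collapse = (1,)

    bounds = np.stack([item.min(axis=dims_to_collapse),
                       item.max(axis=dims_to_collapse)]).T
    points = item.mean(axis=dims_to_collapse)

    print(bounds)  # [[1. 9.], [4. 6.]] -- each row spans [min, max]
    print(points)  # [4. 5.] -- the mean, not necessarily the bounds
                   # mid-point (which would be 5. for the first row)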

@pelson (Member) commented May 2, 2018

Having worked through the logic, I'm comfortable that in principle we can do partial dimension collapses, and that collapses are commutative when it comes to chaining multiple collapses for the bounded values. This is not the case for the point values when we use the mean, but I think I'm OK with that.

In short, I'm in favour of this. We obviously need to work through a few of the small details (what's new, appropriate test coverage, lazy operations), but once done, 👍.

@duncanwp (Contributor, author) commented May 2, 2018

Great! I'll try and get those changes in today or tomorrow.

I think the mean does commute when chaining collapses though, doesn't it?

@pelson (Member) commented May 2, 2018

> I think the mean does commute when chaining collapses though, doesn't it?

Quite right. We are talking about rectangular arrays. 😄
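
A quick check of that point, under the assumption of a rectangular (non-ragged) array:

    import numpy as np

    a = np.arange(24.).reshape(2, 3, 4)

    # Collapse both trailing dimensions at once...
    full = a.mean(axis=(1, 2))
    # ...or chain the collapses one dimension at a time.
    chained = a.mean(axis=2).mean(axis=1)

    print(np.allclose(full, chained))  # True -- with equal weights the mean commutes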

    item = self.core_bounds() if self.has_bounds() \

    # Determine the right array method for stacking
    stack_method = da.stack if self.has_bounds() \
                                   and is_lazy_data(self.core_bounds()) \

E127 continuation line over-indented for visual indent

@duncanwp (Contributor, author) commented May 4, 2018

@pelson I'm currently having to fix the cml differences one at a time, because the tests stop at the first failure. Is there a way to make them all run together? I tried running them locally, but they then fail because the shapes are long (rather than short) ints - is that a library version issue?

@pelson (Member) commented May 4, 2018

> I'm currently having to fix the cml differences one at a time, because the tests stop at the first failure. Is there a way to make them all run together?

😢 No, I'm afraid not.

> I tried running them locally, but they then fail because the shapes are long (rather than short) ints - is that a library version issue?

Happy to help diagnose this - not sure what is going on there, though. NumPy version issues? We are currently testing with 1.13 on Travis, so perhaps take a look there?

@pelson (Member) left a review:

This change has quite wide-reaching implications (as can be seen by the number of test changes), but on the whole I'm in favour of it. In particular, the mid-point that was previously computed was simply the mean of the bounds (which can still be computed). However, we are now returning the mean of the inputs, which gives the user more information about their original inputs.

Given the implications, I'd like to get at least one other 👍 on the review side of things before merging.

@@ -0,0 +1,5 @@

    * The partial collapse of multi-dimensional auxiliary coordinates is now
      supported. Collapsed bounds span the range of the collapsed dimension(s).
      *Note* the collapsed points now represent the mean over the collapsed
      dimension(s), rather than the mid-point of the collapsed bounds.
Review comment (Member):

Heads up @SciTools/iris-devs. This is a big change, but IMO makes a lot of sense. If users really care about the mid-point, it can be computed from the bounds...

@pp-mo (Member) commented May 15, 2018

Regarding the 'mean point / mean bounds' change: I can see the sense, and I can't think of a case in which this is simply "wrong". On the other hand, while some may think it "nicer", I'm not seeing any huge benefit. Plus, it does affect cases where previously you could easily predict the result, and now it's not so obvious.

Unless I'm missing something, this change is also really independent of the partial-collapse implementation, so there is no actual need to couple the two together.

This is a breaking change (in behaviour), and strictly we ought to 'future' it, as it can break existing code. That is of course a huge pain. On that basis, I'm 👎 on that bit, while obviously 👍 on the rest.

Reply (Member):

Thanks for the different perspective @pp-mo.

> This is a breaking change (in behaviour), and strictly we ought to 'future' it, as it can break existing code.

I'm not sold on that personally. As we all know, any change has the potential to break existing code. To me, major/minor/patch versioning is much more about API than it is about behaviour, but I can see your POV. As an example, we don't even need to change a single line of code for there to be a behavioural impact (think numpy & matplotlib upgrades). That said, let's not descend into semver semantics over it 😄.

> Unless I'm missing something, this change is also really independent of the partial-collapse implementation, so there is no actual need to couple the two together.

I'm in favour of that approach too. My only concern is that we've sent @duncanwp down a bit of a rabbit-hole, fixing up a bunch of tests that he could have got away with not doing. Sorry about that @duncanwp - do you think it will be too painful to split the two implementations?

Reply (Member):

> fixing up a bunch of tests that he could have got away with not doing

Apologies @duncanwp, I hadn't really spotted that aspect ...

Reply (Contributor, author):

I could switch the tests back, but I'd like to keep the option of using the mean. Could we add a keyword to Coord.collapse? Or create a futures/config option?
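
One hypothetical shape such a keyword could take, purely as an illustration (this helper and its signature are not part of iris):

    import numpy as np

    def collapse_points(points, dims_to_collapse, method='mean'):
        # Hypothetical helper: choose how collapsed point values are derived.
        if method == 'mean':
            # Mean of the original points (the behaviour proposed here).
            return points.mean(axis=dims_to_collapse)
        if method == 'midpoint':
            # Mid-point of the new bounds (the pre-existing behaviour).
            lo = points.min(axis=dims_to_collapse)
            hi = points.max(axis=dims_to_collapse)
            return (lo + hi) / 2.0
        raise ValueError('unknown method: {!r}'.format(method))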

Reply (Member):

To be clear, the proposal is to create two PRs: one implementing the MEAN, and one implementing the multi-dim collapse. I don't think there is any discussion remaining regarding the multi-dim collapse, but there is still a conversation to be had about MEAN being the correct approach.

Is that how you've understood it too @duncanwp?

@stickler-ci commented:

Could not review pull request. It may be too large, or contain no reviewable changes.

@duncanwp (Contributor, author) commented:

Apologies - the only way I could see to dig my way out of the git mess I made was to open two new pull requests!
