Partial collapse multi-dim aux coords #3008
Conversation
lib/iris/coords.py
Outdated
""" | ||
# Ensure dims_to_collapse is a tuple to be able to pass through to numpy |
E501 line too long (80 > 79 characters)
lib/iris/tests/test_analysis.py
Outdated
self.cube_with_aux_coord.coord('grid_longitude').guess_bounds()

self.weights = area_weights(self.cube_with_aux_coord, normalize=False)
self.normalized_weights = area_weights(self.cube_with_aux_coord, normalize=True)
E501 line too long (88 > 79 characters)
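One way to get the flagged line under 79 characters would be to wrap the call arguments; a minimal sketch using the names from the snippet above (not the committed fix):

self.weights = area_weights(self.cube_with_aux_coord,
                            normalize=False)
self.normalized_weights = area_weights(self.cube_with_aux_coord,
                                       normalize=True)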
lib/iris/tests/test_analysis.py
Outdated
# [124, 125, 126, 127, 128, 129]]

def test_max(self):
    cube = self.cube_with_aux_coord.collapsed('grid_latitude', iris.analysis.MAX)
E501 line too long (85 > 79 characters)
lib/iris/tests/test_analysis.py
Outdated
[105, 129]]))

# Check collapsing over the whole coord still works
cube = self.cube_with_aux_coord.collapsed('altitude', iris.analysis.MAX)
E501 line too long (80 > 79 characters)
lib/iris/tests/test_analysis.py
Outdated
np.testing.assert_array_equal(cube.coord('surface_altitude').bounds,
                              np.array([[100, 129]]))

cube = self.cube_with_aux_coord.collapsed('grid_longitude', iris.analysis.MAX)
E501 line too long (86 > 79 characters)
lib/iris/coords.py
Outdated

# Create the new collapsed coordinate.
if is_lazy_data(item):
I think this change might be losing us some lazy behaviour here, as it uses np.concatenate / np.stack in place of the Dask ones. Otherwise, what was multidim_lazy_stack doing? I suspect we need another test, which doesn't currently exist, to ensure that collapse is 'lazy' in these cases.
Yes - good point, thanks
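For anyone following along, a minimal sketch of the kind of laziness-preserving branch being asked for here; `item` and `dims_to_collapse` are the local names from the diff above, and the `is_lazy_data` import path is an assumption on my part:

import dask.array as da
import numpy as np

from iris._lazy_data import is_lazy_data  # assumed import path

# Choose the stacking function that matches the data, so a lazy (dask)
# coordinate stays lazy through the collapse instead of being realised.
stack = da.stack if is_lazy_data(item) else np.stack
bounds = stack([item.min(axis=dims_to_collapse),
                item.max(axis=dims_to_collapse)]).T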
lib/iris/coords.py
Outdated
# Calculate the bounds and points along the right dims
bounds = np.stack([item.min(axis=dims_to_collapse),
                   item.max(axis=dims_to_collapse)]).T
points = item.mean(axis=dims_to_collapse, dtype=self.dtype)
This will need a what's new entry. It is much better behaviour (as we gain a bit more information about the distribution of the data beforehand), but it will have an impact on almost every dataset that was ever collapsed.
...
Having worked through the logic, I'm comfortable that in principle we can do partial dimension collapses, and that collapses are commutative when it comes to chaining multiple collapses for the bounded values. This is not the case for the point values when we use the mean, but I think I'm OK with that. In short, I'm in favour of this. We obviously need to work through a few of the small details (what's new, appropriate test coverage, lazy operations), but once done, 👍.
Great! I'll try and get those changes in today or tomorrow. I think the mean does commute when chaining collapses though, doesn't it?
Quite right. We are talking about rectangular arrays. 😄
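To see the commutativity point concretely, here is a toy numpy check (not from the PR): on a rectangular array, collapsing one axis at a time gives the same min, max and mean as collapsing everything at once:

import numpy as np

pts = np.arange(12.0).reshape(3, 4)

# Collapse both axes at once ...
all_at_once = (pts.min(), pts.max(), pts.mean())

# ... or chain the collapses one axis at a time.
chained = (pts.min(axis=0).min(), pts.max(axis=0).max(),
           pts.mean(axis=0).mean())

assert all_at_once == chained  # min, max and mean all commute here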
lib/iris/coords.py
Outdated
item = self.core_bounds() if self.has_bounds() \
# Determine the right array method for stacking
stack_method = da.stack if self.has_bounds() \
               and is_lazy_data(self.core_bounds()) \
E127 continuation line over-indented for visual indent
@pelson I'm currently having to fix the cml differences one at a time because the tests stop at the first failure. Is there a way to make them all run together? I tried running them locally, but they then fail because the shapes are long (rather than short) ints; is that a library version issue?
😢 no. I'm afraid not.
Happy to help diagnose this - not sure what is going on there though. Numpy version issues? We are currently testing with 1.13 on travis, so perhaps take a look there?
This change has quite wide-reaching implications (as can be seen by the number of test changes), but on the whole I'm in favour of it. In particular, the mid-point that was previously computed was simply the mean of the bounds (which can still be computed). However, now we are returning the mean of the inputs, which gives the user more information about their original inputs.
Given the implications, I'd like to get at least one other 👍 on the review side of things before merging.
@@ -0,0 +1,5 @@
* The partial collapse of multi-dimensional auxiliary coordinates is now
  supported. Collapsed bounds span the range of the collapsed dimension(s).
  *Note* the collapsed points now represent the mean over the collapsed
Heads up @SciTools/iris-devs. This is a big change, but IMO makes a lot of sense. If users really care about the mid-point, it can be computed from the bounds...
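For instance (an illustration, not code from the PR), the old-style mid-points can be recovered from a collapsed coordinate's bounds:

import numpy as np

# Bounds of a collapsed coordinate, shape (n, 2): [lower, upper] pairs,
# e.g. the [100, 129] pair from the surface_altitude test above.
bounds = np.array([[100.0, 129.0]])

# The previously returned mid-point is just the mean of each bounds pair.
mid_points = bounds.mean(axis=-1)  # -> array([114.5])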
Regarding the 'mean point / mean bounds' change: I can see the sense, and I can't think of a case in which this is simply "wrong". On the other hand, while some may think it "nicer", I'm not seeing any huge benefit. Plus, it does affect cases where previously you could easily predict the result of this + now it's not so obvious.
Unless I'm missing something, this change is also really independent of the partial-collapse implementation so there is no actual need to couple the two together.
This is a breaking change (in behaviour) + strictly we ought to 'future' it as it can break existing code.
That is of course a huge pain.
On that basis, I'm 👎 on that bit, while obviously 👍 on the rest.
Thanks for the different perspective @pp-mo.
This is a breaking change (in behaviour) + strictly we ought to 'future' it as it can break existing code.
I'm not sold on that personally. As we all know, any change has the potential to break existing code. To me, major/minor/patch versioning is much more about API than it is about behaviour, but I can see your POV. As an example, we don't even need to change a single line of code for it to have a behavioural impact (think numpy & matplotlib upgrades). That said, let's not descend into semver-semantics over it 😄.
Unless I'm missing something, this change is also really independent of the partial-collapse implementation so there is no actual need to couple the two together.
I'm in favour of that approach too. My only concern is that we've sent @duncanwp down a bit of a rabbit-hole fixing up a bunch of tests that he could have got away with not having done. Sorry about that @duncanwp - do you think it will be too painful to split the two implementations?
fixing up a bunch of tests that he could have got away with not having done
Apologies @duncanwp, I hadn't really spotted that aspect ...
I could switch the tests back, but I'd like to keep the option of using the mean. Could we add a keyword to Coord.collapse? Or create a futures/config option?
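For illustration, a purely hypothetical sketch of what such a keyword on Coord.collapsed might look like (the points_agg name and its values are invented here, not part of iris):

# Hypothetical signature only -- not the real iris API.
def collapsed(self, dims_to_collapse=None, points_agg='mean'):
    """Return a collapsed copy of this coordinate.

    points_agg='mean'       -- points are the mean of the original
                               points (the behaviour proposed here).
    points_agg='mid_bounds' -- points are the mean of the collapsed
                               bounds (the previous behaviour).
    """
    ...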
To be clear, the proposal is to create two PRs, one implementing the MEAN, one implementing the multi-dim collapse. I don't think there is any discussion remaining regarding the multi-dim collapse, but there is still a conversation to be had about MEAN being the correct approach.
Is that how you've understood it too @duncanwp?
Could not review pull request. It may be too large, or contain no reviewable changes.
Apologies - the only way I could seem to dig my way out of the git mess I made was two new pull requests!
cc @pelson
Allow the partial collapse of multi-dimensional auxiliary coordinates.