Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cube.aggregated_by output bounds #4315

Merged
merged 7 commits into from
Dec 8, 2021
Merged

Conversation

rcomer
Copy link
Member

@rcomer rcomer commented Sep 15, 2021

🚀 Pull Request

Description

I've had this one in my sights for some time, as I have workarounds for it in my user code.

A step towards closing #1528. If a coordinate's cells are not monotonic, we now calculate the new bounds as the minimum and maximum from the relevant slice, rather than first and last cell from the slice. The example given in #1528 now gives us the bounds we expect

means.coord('realization'):
 AuxCoord(array([2.5, 2.5, 2.5]), bounds=array([[1, 4],
       [1, 4],
       [1, 4]]), standard_name='realization', units=Unit('unknown'))

Note that @pp-mo also said in #1528 that we expect the points to all be 2. That would be consistent with what we get if we make the same calculation via extract and collapsed:

for year, month in zip([1996, 1997, 1997], ['Dec', 'Jan', 'Feb']):
    constraint = iris.Constraint(year=year, month=month)
    mean = cube.extract(constraint).collapsed('realization', iris.analysis.MEAN)
    print(mean.coord('realization'))
AuxCoord(array([2]), bounds=array([[1, 4]]), standard_name='realization', units=Unit('unknown'))
AuxCoord(array([2]), bounds=array([[1, 4]]), standard_name='realization', units=Unit('unknown'))
AuxCoord(array([2]), bounds=array([[1, 4]]), standard_name='realization', units=Unit('unknown'))

I think it we should make collapsed and aggregated_by consistent on this, but I'm unclear which of them currently has it right. I also suspect that making changes to the output points of either method should wait for a major release: I did try updating the points dtype to match the bounds dtype within aggregated_by, and got 9 test failures.

My current proposed change to bounds handling only affects coordinates that started out in no particular order. It's hard to see how the existing behaviour is useful in those cases, so seems safe enough to call this a bugfix.

Note that the first two commits here are just a refactor of the existing code to reduce repetition, and do not affect functionality.


Consult Iris pull request check list

@rcomer
Copy link
Member Author

rcomer commented Oct 18, 2021

I checked the McCabe complexity of this method. It seems my initial refactor reduced it by 2, but the min/max handling increased it by 2. So the score is unchanged at 18.

Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks @rcomer !

I completely agree with your assessment; the handling of the bounds is a bug fix, whereas deciding how aggregated_by and extract/collapse should handle the points is more likely to be considered a breaking change and should wait for a major release.

lib/iris/tests/unit/cube/test_Cube.py Show resolved Hide resolved
@lbdreyer lbdreyer merged commit a5ac24a into SciTools:main Dec 8, 2021
@rcomer rcomer deleted the aggregated_by-bounds branch December 8, 2021 15:51
@rcomer
Copy link
Member Author

rcomer commented Dec 8, 2021

Thanks @lbdreyer! I have opened #4455 to capture the remaining points issue and so closed #1528.

tkknight added a commit to tkknight/iris that referenced this pull request Dec 14, 2021
* upstream/main: (78 commits)
  Updated environment lockfiles (SciTools#4458)
  remove asv package dependency (SciTools#4456)
  cube.aggregated_by output bounds (SciTools#4315)
  [pre-commit.ci] pre-commit autoupdate (SciTools#4452)
  Whatsnew (SciTools#4451)
  Explicitly define tests so nose can find them (SciTools#4450)
  Updated environment lockfiles (SciTools#4449)
  Update CI environment lockfiles (SciTools#4424)
  Disable GHA workflows on non-SciTools branches (SciTools#4444)
  Add new contributor to whatsnew (SciTools#4443)
  Use dask-core instead of dask in ci (SciTools#4434)
  Mesh recombine (SciTools#4437)
  Mesh full comparison (SciTools#4439)
  Only try to work out the differences between points for multiple (SciTools#4367)
  [pre-commit.ci] pre-commit autoupdate (SciTools#4430)
  Fix license PyPI classifier (SciTools#4435)
  Whatsnew for PR SciTools#4367 (SciTools#4440)
  Suggest type hinting (SciTools#4390)
  area weight regrid test fixes (SciTools#4432)
  Update latest.rst (SciTools#4425)
  ...
tkknight added a commit to tkknight/iris that referenced this pull request Dec 20, 2021
* main:
  Updated environment lockfiles (SciTools#4458)
  remove asv package dependency (SciTools#4456)
  cube.aggregated_by output bounds (SciTools#4315)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants