Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make weighted aggregation lazy again #5341

Merged
merged 9 commits into from
Jun 26, 2023

Conversation

schlunma
Copy link
Contributor

@schlunma schlunma commented Jun 7, 2023

🚀 Pull Request

Description

#5084 broke lazy weighted aggregation by realizing all input arrays. This PR fixes that by

  • refactoring Weights so it does not subclass np.ndarray anymore. Instead, it now provides the attributes array and units. array (which is either lazy or non-lazy depending on the input) is then provided as weights keyword argument.
  • explicitly using the private keyword argument _weights_units when calling Aggregator.update_metadata (which in turn calls units_func). Apart from that, the weights units are not used/provide anywhere else.

Closes #5338


Consult Iris pull request check list

@schlunma
Copy link
Contributor Author

schlunma commented Jun 7, 2023

@bouweandela could you test if this solves #5338 for you?

@bouweandela
Copy link
Member

Thanks for fixing this so quickly @schlunma! Indeed it solves my issue.

Regarding the implementation: instead of implementing a custom _Weights class, it might be nicer to just pass the weights through to the analysis operator if they are an iris.coords.AncillaryVariable or a iris.coords.CellMeasure already, or convert them to an iris.coords.AncillaryVariable if they are not. But I'll leave that to other @SciTools/iris-devs to comment on.

@schlunma
Copy link
Contributor Author

schlunma commented Jun 9, 2023

Thanks for fixing this so quickly @schlunma! Indeed it solves my issue.

That's great to hear @bouweandela, thanks for testing!

Regarding the implementation: instead of implementing a custom _Weights class, it might be nicer to just pass the weights through to the analysis operator if they are an iris.coords.AncillaryVariable or a iris.coords.CellMeasure already, or convert them to an iris.coords.AncillaryVariable if they are not. But I'll leave that to other @SciTools/iris-devs to comment on.

I am not quite sure I understand what you mean here. The aggregators themselves only accept arrays as input, so I am not quite sure why I would convert them to iris.coords.AncillaryVariable. Of course the _Weights class is not strictly necessary anymore with these changes, but I think it's a nice way of converting the weights keyword argument (which can be array, cube, str, cell measure, etc.) to an array and units.

@codecov
Copy link

codecov bot commented Jun 10, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.04 🎉

Comparison is base (9e2cff5) 89.32% compared to head (e750d94) 89.37%.

❗ Current head e750d94 differs from pull request most recent head 66ef2aa. Consider uploading reports for the commit 66ef2aa to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           v3.6.x    #5341      +/-   ##
==========================================
+ Coverage   89.32%   89.37%   +0.04%     
==========================================
  Files          89       89              
  Lines       22398    22419      +21     
  Branches     5375     5378       +3     
==========================================
+ Hits        20008    20036      +28     
+ Misses       1640     1637       -3     
+ Partials      750      746       -4     
Impacted Files Coverage Δ
lib/iris/analysis/__init__.py 92.44% <100.00%> (ø)
lib/iris/cube.py 90.45% <100.00%> (-0.12%) ⬇️

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@schlunma schlunma closed this Jun 14, 2023
@schlunma schlunma force-pushed the fix_lazy_weighted_aggregation branch from 408c861 to 0937e4f Compare June 14, 2023 15:14
@schlunma schlunma reopened this Jun 14, 2023
@schlunma schlunma force-pushed the fix_lazy_weighted_aggregation branch from 408c861 to e750d94 Compare June 14, 2023 15:22
@schlunma schlunma changed the base branch from main to v3.6.x June 14, 2023 15:23
@trexfeathers trexfeathers self-assigned this Jun 15, 2023
@bouweandela
Copy link
Member

I am not quite sure I understand what you mean here. The aggregators themselves only accept arrays as input, so I am not quite sure why I would convert them to iris.coords.AncillaryVariable. Of course the _Weights class is not strictly necessary anymore with these changes, but I think it's a nice way of converting the weights keyword argument (which can be array, cube, str, cell measure, etc.) to an array and units.

Thanks for explaining. I was confused because a _Weights instance is created but then it is not used anymore. For me, it would be easier to understand the code if the class was replaced by a single function.

@trexfeathers
Copy link
Contributor

Thanks for explaining. I was confused because a _Weights instance is created but then it is not used anymore. For me, it would be easier to understand the code if the class was replaced by a single function.

Just FYI I've been staring at the code for a few hours now trying to work out what I think the best pattern is. I agree that the use of a class is currently a bit weird, but I haven't settled on an alternative yet.

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this took so long @schlunma. The underlying code-base made it quite confusing to see what is going on, and even once I had, I found it difficult to come up with suggestions to make it less confusing!

As well as my comments below, this needs a What's New entry.

lib/iris/cube.py Show resolved Hide resolved
lib/iris/cube.py Outdated Show resolved Hide resolved
lib/iris/analysis/__init__.py Show resolved Hide resolved
lib/iris/analysis/__init__.py Show resolved Hide resolved
lib/iris/tests/test_analysis.py Show resolved Hide resolved
lib/iris/tests/test_analysis.py Outdated Show resolved Hide resolved
lib/iris/tests/test_analysis.py Outdated Show resolved Hide resolved
lib/iris/tests/test_analysis.py Show resolved Hide resolved
@schlunma
Copy link
Contributor Author

schlunma commented Jun 19, 2023

Thanks for your feedback @trexfeathers, I am currently implementing you suggestions!

I noticed that my tests are failing locally now because of #5295. Turns out the fix for that (#5307) is not included in v3.6.x, but only in main. No idea why the CI machine is not picking that up though. EDIT: Tests are failing now as expected.

Would it be possible to include #5307 to v3.6.x? I think it essential to broadcast lazy arrays for this PR.

@schlunma schlunma force-pushed the fix_lazy_weighted_aggregation branch from e750d94 to fb466ba Compare June 19, 2023 09:22
@trexfeathers
Copy link
Contributor

Would it be possible to include #5307 to v3.6.x? I think it essential to broadcast lazy arrays for this PR.

@bjlittle thoughts?

@bjlittle
Copy link
Member

@bjlittle thoughts?

@trexfeathers If it helps this use-case cross the line, then it makes sense to me 👍

Technically, #5307 is adding a new feature rather than patching, so it's inclusion is somewhat blurring the lines of the patch/minor release... however, I think that we can defend it's inclusion here.

@SciTools/iris-devs Any other core dev opinions on this proposal?

@rcomer
Copy link
Member

rcomer commented Jun 19, 2023

If you wanted to be strict about the versioning, the other option would be to cut another bonus minor release.

The question for me is whether including #5307 is likely to break anyone’s workflow. Since #5295 shows that old version of broadcast_to_shape barfs when it gets a dask array, I think it’s unlikely that anyone is relying on that behaviour.

@trexfeathers
Copy link
Contributor

trexfeathers commented Jun 19, 2023

Thanks for your feedback @trexfeathers, I am currently implementing you suggestions!

I noticed that my tests are failing locally now because of #5295. Turns out the fix for that (#5307) is not included in v3.6.x, but only in main. No idea why the CI machine is not picking that up though. EDIT: Tests are failing now as expected.

Would it be possible to include #5307 to v3.6.x? I think it essential to broadcast lazy arrays for this PR.

As part of my review I have replicated @schlunma's findings about #5307 on my own machine.

I am happy for us to include this PR in a patch release, since it is clearly necessary for a fix.

If you wanted to be strict about the versioning, the other option would be to cut another bonus minor release.

I'm not especially keen for another bonus minor release. v3.6 was already a bonus release to help out ESMValTool ♥

@schlunma
Copy link
Contributor Author

One potential problem with the current implementation is that the multiplication by Unit('1') that is used when a regular array is used as weights might change the units in an undesired ways, e.g.,

>>> from cf_units import Unit
>>> Unit('kg m-2 s-1') * Unit('1')
Unit('m-2.kg.s-1')

Would it make sense to not multiply the units in this case to be strictly backwards-compatible?

@trexfeathers
Copy link
Contributor

One potential problem with the current implementation is that the multiplication by Unit('1') that is used when a regular array is used as weights might change the units in an undesired ways, e.g.,

>>> from cf_units import Unit
>>> Unit('kg m-2 s-1') * Unit('1')
Unit('m-2.kg.s-1')

Would it make sense to not multiply the units in this case to be strictly backwards-compatible?

So these two forms are functionally identical, but the second is in the UDUNITS-compliant form. We are hoping to move away from Iris modifying files to make them compliant, so I'm in favour of not multiplying the units if you can make that change. Thanks for spotting

@rcomer
Copy link
Member

rcomer commented Jun 20, 2023

I agree it would be confusing for users if just summing along a dimension changed their units (particular if an unweighted sum does not change the units, but I have not looked in detail at this change).

@trexfeathers
Copy link
Contributor

@schlunma: v3.6.x has now been updated (#5359), so ready for you to update this PR's HEAD.

@bjlittle bjlittle force-pushed the fix_lazy_weighted_aggregation branch from 9ccb7d2 to 66ef2aa Compare June 26, 2023 13:41
@trexfeathers trexfeathers merged commit e0df17c into SciTools:v3.6.x Jun 26, 2023
@trexfeathers
Copy link
Contributor

Thanks everyone for your hard work on this! @schlunma @rcomer @bouweandela @bjlittle

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: 🏁 Done - v3.6.1
Development

Successfully merging this pull request may close these issues.

weigths are always realized by the iris.analysis module
5 participants