Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark for climatology #1552

Merged
merged 8 commits into from
Oct 4, 2024
Merged

Add benchmark for climatology #1552

merged 8 commits into from
Oct 4, 2024

Conversation

hendrikmakait
Copy link
Member

Closes #1549

This benchmark doesn't even run at the small scale yet.

.expand_dims(hour=hours, dayofyear=daysofyear)
.assign_coords(hour=hours, dayofyear=daysofyear)
)
working = working.map_blocks(compute_hourly_climatology, template=template)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if people typically compute climatology with complex-enough calculations that warrant the use of map_blocks or typically use calculations that leverage Xarray's higher-level API.

Copy link
Contributor

@dcherian dcherian Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People use rechunk+map_blocks because it usually would not work otherwise. It's basically "manual fusion of tasks" to not overwhelm the scheduler. And convenient because you can use Xarray API in there.

I strongly recommend running a version of this without those steps. Ideally, these tricks should not be necessary for this calculation.

Notice that the workload rechunks back to pancakes... The chunking to pencils is purely to get this to work with dask+distributed

Copy link
Member Author

@hendrikmakait hendrikmakait Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, what I'm hearing you say is: The computations people generally execute for calculating climatology, be it mean, std, quantile (or SEEP?) can be expressed in Xarray's high-level API and do not fundamentally require a custom function mapped over all blocks.

In that case, I fully agree that we should avoid map_blocks but express these calculations in Xarray. We should probably benchmark both a computation based on a decomposable aggregation like mean and one based on a holistic aggregation like quantile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The computations people generally execute for calculating climatology,

Yes. generally. The trick is to look at the function being mapped: in this case it's all Xarray ops and no for loops. That indicates the map_blocks is a workaround for some Xarray and/or dask/distributed inefficiency.

We should probably benchmark both a computation based on a decomposable aggregation like mean and one based on a holistic aggregation like quantile.

Yes totally agree with adding quantile. Dask will force a rechunk :)

PS: What is SEEP?

PPS: I notice this is based on a Xarray-beam workload, it may be that rechunk+map_blocks is the only way to express this in that framework.

Copy link
Member Author

@hendrikmakait hendrikmakait Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. generally. The trick is to look at the function being mapped: in this case it's all Xarray ops and no for loops. That indicates the map_blocks is a workaround for some Xarray and/or dask/distributed inefficiency.

Thanks for the clarification!

PS: What is SEEP?

All I can offer is https://github.com/google-research/weatherbench2/blob/47d72575cf5e99383a09bed19ba989b718d5fe30/scripts/compute_climatology.py#L147-L175. Maybe @shoyer can elaborate on SEEP and how common similar calculations are? Just looking at the code, this can probably be expressed in Xarray's high-level API as well.

PPS: I notice this is based on a Xarray-beam workload, it may be that rechunk+map_blocks is the only way to express this in that framework.

Yes, that's what it's based on, but not what we should be limited by.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, less experienced users usually try without rechunk+map_blocks first, and then grudgingly resort to that. It'd be nice to model that interaction and improve things if possible.

Yes, rechunk+map_blocks is definitely not the obvious solution. It would be quite nice if we could automatically optimize high-level Xarray code to use a rechunk + map_blocks style implementation when it will be more efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we won't beat hand-optimized code. However, a major goal of these benchmarks is to understand how we can improve the performance of code that relies on high-level APIs to improve the end-user experience. We all seem to agree that it would be great if users wouldn't have to worry about rewriting their high-level code in a rechunk+map_blocks-style code if it can be avoided. So, wherever possible, I think we should try and benchmark code without optimization and analyze where and how this falls apart.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also add that we want rechunk+map_blocks to always succeed, even if slow, so that it can be a dependable fallback, particularly when the algorithms get more complicated. This would be a major improvement over the dask.array experience today.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian: Could you elaborate on that last comment? In what situations does that combination not succeed today?

I can imagine this failing if the block sizes or the temporary memory footprint of the mapped function end up being too large. However, this seems like a situation that is beyond the control of higher-level APIs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will fail whenever rechunking fails... agree that after that, the blockwise is pretty stable, which is why people use this solution in combination with some kind of batching to avoid the rechunking failure. I don't have an example at hand. This is anecdotal experience from looking at lot of non-expert code at NCAR.

@hendrikmakait hendrikmakait marked this pull request as ready for review October 1, 2024 14:23
@hendrikmakait hendrikmakait changed the title [WIP] Add benchmark for climatology Add benchmark for climatology Oct 4, 2024
@hendrikmakait
Copy link
Member Author

This should be good for a review now. (I expect that we will iterate on this once we have improved on some things within Xarray/Dask.)

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrbourbeau jrbourbeau merged commit d2ec39f into main Oct 4, 2024
5 checks passed
@jrbourbeau jrbourbeau deleted the climatology-benchmark branch October 4, 2024 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Climatology
4 participants