What happened:
I have a graph for a very common physical science use case which currently performs extremely poorly on the distributed scheduler.
I would like to:
(a) Get this running somehow as soon as I can,
(b) Propose this as a benchmark example for the distributed scheduler.
This workload is the generalization @gjoseph92 mentioned in #6360 (comment).
Why this example is interesting:
After talking to @gjoseph92 I believe this example is useful because whilst minimal it still demonstrates three big scheduling considerations / problems at once, among them:

- How `decide_worker` places tasks whose dependencies (here the single `dx` and `dy` chunks) are shared by every output chunk (#5325); this is the source of the cross-dependencies referred to as (2) below.
- "Root task co-location". (This was fixed in Co-assign root-ish tasks #4967, but any solution to (1) and (2) should retain this feature, else this example will still behave poorly.)

By minorly adjusting the full example one can easily test the performance of any combination of these three considerations.
It also represents a simple-to-describe and commonly-desired geospatial-type calculation. I expect that making this example work better would immediately improve the distributed scheduler's performance for very many pangeo and xarray-type problems, perhaps even for the overall majority of users in those communities. (I think @rabernat and @jbusecke would concur.) Some performance issues raised by pangeo users in previous issues (especially #2602) can be much more easily replicated using this type of workload.
What you expected to happen:
I expected this example to perform well with the distributed scheduler: this workload came about after an extensive refactor of xGCM to pipe all our operations through `xarray.apply_ufunc`, and the resulting graph can't really be simplified much further. Prior to raising this I also messed around a lot with trying to fuse task chains, but that doesn't really help.

Minimal Complete Verifiable Example:
The essential part of the problem is basically just this:
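A minimal pure-dask sketch of that structure (the array shapes, chunk sizes, and the particular arithmetic combining the arrays below are illustrative assumptions, not the exact code from the notebook):

```python
import dask.array as da

# u and v have dimensions (x, y, time) and are chunked only along time
u = da.random.random((1000, 1000, 500), chunks=(1000, 1000, 1))
v = da.random.random((1000, 1000, 500), chunks=(1000, 1000, 1))

# dx and dy have dimensions (x, y) and are a single chunk each, so every
# per-time-chunk task below depends on the same two pieces of data
dx = da.random.random((1000, 1000), chunks=(1000, 1000))
dy = da.random.random((1000, 1000), chunks=(1000, 1000))

# dx and dy get broadcast against every time chunk of u and v
result = u * dx[:, :, None] + v * dy[:, :, None]

result.mean().compute()
```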
where `u` and `v` have dimensions `(x, y, time)`, and are chunked along `time`, whilst `dx` and `dy` have dimensions `(x, y)`, and are not chunked, so get broadcast against all the time chunks.

I've gone into much more detail in this notebook though.
Anything else we need to know?:
The example is pure-dask, only inspired by xGCM (which uses xarray to perform basically all its dask operations).
How can I get this running in the meantime?
Before full solutions to (1) and (2) above are available, is there a way I can get this running smoothly in the meantime?
My ideas include:

- changing `dx` and `dy` somehow to remove all the cross-dependencies causing (2),
- using `map_blocks` instead (see the sketch below).

But any input would be appreciated.
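A rough sketch of what the `map_blocks` idea could look like, assuming `dx` and `dy` fit comfortably in memory as plain NumPy arrays (the shapes and the combining arithmetic are illustrative, not the actual xGCM code):

```python
import numpy as np
import dask.array as da

def combine(u_block, v_block, dx=None, dy=None):
    # dx and dy arrive as plain NumPy arrays rather than as dask chunks,
    # so no task depends on a shared dx/dy key in the graph
    return u_block * dx[:, :, None] + v_block * dy[:, :, None]

u = da.random.random((1000, 1000, 500), chunks=(1000, 1000, 1))
v = da.random.random((1000, 1000, 500), chunks=(1000, 1000, 1))
dx_np = np.random.random((1000, 1000))
dy_np = np.random.random((1000, 1000))

result = da.map_blocks(combine, u, v, dx=dx_np, dy=dy_np, dtype=u.dtype)
result.mean().compute()
```

One trade-off worth testing: `dx` and `dy` then travel inside the task graph rather than as their own keys, so the client may warn about large objects in the graph; pre-replicating them with `Client.scatter` would be another variation on the same idea.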