Add test variants with compressible data #714

crusaderky · 2023-03-13T12:29:30Z

Closes #696
Run the following tests twice, once with uncompressible data and once with highly compressible data, to show how the network stack behaves differently in the two cases:

test_anom_mean
test_climatic_mean (currently skipped)
test_vorticity
test_double_diff
test_dot_product
test_map_overlap_sample
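A minimal sketch of the "run each test twice" idea, using only the standard library. The names `make_payload`, `run_for_all_variants`, and `compressed_fraction` are hypothetical and are not the helpers used in this PR; zlib stands in for whatever compressor the cluster uses.

```python
import random
import zlib

# The two data variants each test is run against
VARIANTS = ("uncompressible", "compressible")


def make_payload(nbytes: int, variant: str, seed: int = 0) -> bytes:
    """Return nbytes of data for the given variant (hypothetical helper)."""
    if variant == "uncompressible":
        # Pseudo-random bytes: a general-purpose compressor finds no redundancy
        return random.Random(seed).getrandbits(8 * nbytes).to_bytes(nbytes, "little")
    if variant == "compressible":
        # Constant bytes: shrink to a tiny fraction of the original size
        return bytes(nbytes)
    raise ValueError(f"unknown variant: {variant!r}")


def run_for_all_variants(workload, nbytes: int = 1 << 20) -> dict:
    """Run the same workload once per variant and collect the results."""
    return {variant: workload(make_payload(nbytes, variant)) for variant in VARIANTS}


def compressed_fraction(payload: bytes) -> float:
    """Stand-in workload: fraction of the original size left after zlib."""
    return len(zlib.compress(payload)) / len(payload)


results = run_for_all_variants(compressed_fraction)
for variant, fraction in results.items():
    print(f"{variant}: compressed to {fraction:.2%} of original size")
```

In a pytest suite the same effect is usually achieved with `@pytest.mark.parametrize` or a parametrized fixture, so that each variant shows up as a separate test ID.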

This PR increases the overall runtime from 48min to 50min.

I've deliberately not touched test_basic_sum, which is always compressible, and test_rechunk_*, which are always uncompressible. They already have a fair number of permutations, and I didn't feel that doubling everything would yield enough benefit to be worth the cost, which is more in readability than in runtime.

crusaderky · 2023-03-14T12:16:18Z

I ran an A/B test on distributed#7593 and I'm observing a very modest (5%) but consistent speedup in test_filter_then_average. The test uses data that is compressible at 37%. The other tests do not show any change, including those running on data that is compressible at 99%. The reason is that the data compressible at 37% takes 140ms per chunk to compress, whereas identically sized data full of ones takes 14ms per chunk.
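A rough sketch of how one might compare compression cost per chunk for payloads of different compressibility. Everything here is an assumption made for illustration: zlib stands in for the compressor actually in use, the 1 MiB chunk size is arbitrary, bytes of value 1 stand in for an array of ones, and `partially_compressible` is a hypothetical generator, not the algorithm from this PR. Absolute timings will differ from the 140ms and 14ms figures above and vary by machine.

```python
import random
import time
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; arbitrary size picked for illustration
BLOCK = 4096          # granularity at which random and constant runs alternate


def random_bytes(n: int, rng: random.Random) -> bytes:
    """Return n pseudo-random bytes from a seeded generator."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


def partially_compressible(nbytes: int, rng: random.Random,
                           random_fraction: float = 0.4) -> bytes:
    """Alternate random and zero-filled runs so only part of the data is redundant."""
    n_random = int(BLOCK * random_fraction)
    out = bytearray()
    for _ in range(nbytes // BLOCK):
        out += random_bytes(n_random, rng)  # incompressible part of the block
        out += bytes(BLOCK - n_random)      # highly compressible part of the block
    return bytes(out)


def measure(payload: bytes) -> tuple:
    """Return (compressed size / original size, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(payload), elapsed


rng = random.Random(0)
payloads = {
    "ones": b"\x01" * CHUNK_SIZE,
    "partial": partially_compressible(CHUNK_SIZE, rng),
    "random": random_bytes(CHUNK_SIZE, rng),
}
report = {name: measure(payload) for name, payload in payloads.items()}
for name, (ratio, seconds) in report.items():
    print(f"{name:8s} ratio={ratio:.3f} time={seconds * 1000:.2f}ms")
```

The point of measuring both numbers is that the compression ratio alone does not predict the time spent compressing, which is what the comparison above hinges on.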

I need to scrap the current algorithm and synthetically create something similar to the zarr dataset.

crusaderky · 2023-03-15T13:04:13Z

Holy guacamole 😱

This reverts commit ff918e8.

crusaderky · 2023-03-15T16:52:15Z

This is ready for review and merge.
The findings are discussed in dask/distributed#7655.

milesgranger

LGTM. 👍

crusaderky · 2023-03-22T13:34:00Z

ty @milesgranger for review

crusaderky self-assigned this Mar 13, 2023

crusaderky marked this pull request as ready for review March 13, 2023 15:33

Test networking on compressible data

60a078b

crusaderky force-pushed the guido/compressible branch from 9961df3 to 60a078b Compare March 14, 2023 15:59

crusaderky added 3 commits March 14, 2023 16:11

Merge branch 'main' into guido/compressible

d696621

alembic fix

3b1c2c7

lint

45fae7d

crusaderky added 5 commits March 15, 2023 13:24

Bankruptcy for old spill tests

ef61e7e

A/B tests

ff918e8

Merge branch 'main' into guido/compressible

2b5d092

tweak

85d6516

Revert "A/B tests"

b2e7b7d

This reverts commit ff918e8.

crusaderky mentioned this pull request Mar 15, 2023

Compression slows down network comms dask/distributed#7655

Closed

milesgranger approved these changes Mar 22, 2023

crusaderky merged commit 0717e4b into main Mar 22, 2023

crusaderky deleted the guido/compressible branch March 22, 2023 13:33
