test_spilling flaky #280
Comments
From what I have seen so far, those flakes seem to occur in #235, not …

Hmm, that's a good point. I'll dig a bit deeper.

My guess is that the reused cluster fixture is either still busy cleaning up after the first spilling test or is somehow getting corrupted.
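For context, a reused cluster fixture of this kind is typically just a wider-scoped pytest fixture, roughly like the sketch below (names and the worker count are illustrative assumptions using `coiled.Cluster`, not the repository's actual fixtures). Because nothing in it waits for disk cleanup or worker health between tests, the second spilling test can start while workers are still cleaning up after the first one.

```python
# Illustrative sketch of a reused (module-scoped) cluster fixture; the names
# and worker count are assumptions, not this repository's actual fixtures.
import pytest
from coiled import Cluster


@pytest.fixture(scope="module")
def spill_cluster():
    # One cluster is shared by every test in the module. There is no
    # explicit check between tests that spilled data has been cleaned up
    # or that all workers are healthy before the next test starts.
    with Cluster(n_workers=10) as cluster:
        yield cluster
```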
I agree; I wonder if dask/distributed#6944 would help here as well.

This test is sensitive to disk space, no? I wonder if …

It is; the disk size has been set through trial and error. I guess it won't hurt to bump it up by another couple of GiB.

I believe that the platform team recommends using at least a …

Yes and no: by increasing the data size we'd stick with the original "goal" of writing 10x the memory size to disk, but I'm wondering whether we would lose much if we kept the data size the same, or only grew it to account for the additional 4 GiB of memory per worker. After all, writing to disk is so slow.
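To make that trade-off concrete, here is a small back-of-the-envelope sketch. The 10x spill target and the extra 4 GiB of memory per worker come from this thread; the baseline per-worker memory is purely an assumed figure for illustration.

```python
# Back-of-the-envelope sizing sketch. The 10x spill target and the extra
# 4 GiB of memory per worker come from the discussion above; the baseline
# per-worker memory is an assumed figure used only for illustration.
GiB = 2**30

baseline_memory = 8 * GiB   # assumed baseline memory per worker
extra_memory = 4 * GiB      # the additional 4 GiB per worker mentioned above
spill_factor = 10           # original goal: write ~10x the memory size to disk

# (a) Scale the data with the new memory size: disk need grows by 10x the bump.
disk_scaled = spill_factor * (baseline_memory + extra_memory)

# (b) Keep the data size as-is: disk need is unchanged, but we no longer
#     write a full 10x of the (larger) memory to disk.
disk_fixed = spill_factor * baseline_memory

# (c) Grow the data only by the extra 4 GiB of memory per worker
#     (one reading of "only accounts for the additional 4 GiB").
disk_bumped = spill_factor * baseline_memory + extra_memory

for label, disk in [("scaled", disk_scaled), ("fixed", disk_fixed), ("bumped", disk_bumped)]:
    print(f"{label}: ~{disk / GiB:.0f} GiB of spill written per worker")
```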
I'm trying a …
`tests/stability/test_spill.py::test_spilling`, introduced in #229, has been quite flaky, failing in ~25% of its test runs (example). Since it runs across a wide test matrix, this means that most workflows end up failing on `test_spill`.

For the most part, the failures occur in the `client` setup fixture during the `wait_for_workers()` call; the high memory usage from that test seems to make cluster restarts pretty unreliable.

cc @hendrikmakait
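For reference, `wait_for_workers()` here is distributed's `Client.wait_for_workers()`. A minimal sketch of what such a setup fixture might look like follows (hypothetical and simplified; the real fixture in this repository is more involved). The restart after a memory-heavy test can leave the cluster short of workers, so the wait times out.

```python
# Hypothetical, simplified sketch of a `client` setup fixture; names and the
# timeout value are illustrative, not taken from this repository.
import pytest
from distributed import Client


@pytest.fixture
def client(cluster):  # `cluster` assumed to come from another fixture
    expected = len(cluster.scheduler_info["workers"])
    with Client(cluster) as client:
        # Restart to give each test a clean slate; after a memory-heavy
        # spilling test this is where things tend to go wrong.
        client.restart()
        # Blocks until all workers have reconnected; raises TimeoutError
        # if the restart leaves the cluster short of workers.
        client.wait_for_workers(expected, timeout=600)
        yield client
```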