test_spilling flaky #280
Comments
From what I have seen so far, those flakes seem to occur in #235, not …

Hmm, that's a good point. I'll dig a bit deeper.

My guess is that the reused cluster fixture is either still busy cleaning up after the first spilling test or is somehow getting corrupted.
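For context, a reused cluster fixture of this kind is typically just a wider-scoped pytest fixture, roughly like the sketch below (names and the worker count are illustrative assumptions using `coiled.Cluster`, not the repository's actual fixtures). Because nothing in it waits for disk cleanup or worker health between tests, the second spilling test can start while workers are still cleaning up after the first one.

```python
# Illustrative sketch of a reused (module-scoped) cluster fixture; the names
# and worker count are assumptions, not this repository's actual fixtures.
import pytest
from coiled import Cluster


@pytest.fixture(scope="module")
def spill_cluster():
    # One cluster is shared by every test in the module. There is no
    # explicit check between tests that spilled data has been cleaned up
    # or that all workers are healthy before the next test starts.
    with Cluster(n_workers=10) as cluster:
        yield cluster
```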
I agree; I wonder if dask/distributed#6944 would help here as well.

This test is sensitive to disk space, no? I wonder if …

It is; the disk size has been set through trial and error. I guess it won't hurt to bump it up by another couple of GiB.

I believe that the platform team recommends using at least a …

Yes and no: by increasing the data size we'd stick with the original "goal" of writing 10x the memory size to disk, but I'm wondering whether we would lose much if we kept the data size the same, or only grew it to account for the additional 4 GiB of memory per worker. After all, writing to disk is so slow.
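To make that trade-off concrete, here is a small back-of-the-envelope sketch. The 10x spill target and the extra 4 GiB of memory per worker come from this thread; the baseline per-worker memory is purely an assumed figure for illustration.

```python
# Back-of-the-envelope sizing sketch. The 10x spill target and the extra
# 4 GiB of memory per worker come from the discussion above; the baseline
# per-worker memory is an assumed figure used only for illustration.
GiB = 2**30

baseline_memory = 8 * GiB   # assumed baseline memory per worker
extra_memory = 4 * GiB      # the additional 4 GiB per worker mentioned above
spill_factor = 10           # original goal: write ~10x the memory size to disk

# (a) Scale the data with the new memory size: disk need grows by 10x the bump.
disk_scaled = spill_factor * (baseline_memory + extra_memory)

# (b) Keep the data size as-is: disk need is unchanged, but we no longer
#     write a full 10x of the (larger) memory to disk.
disk_fixed = spill_factor * baseline_memory

# (c) Grow the data only by the extra 4 GiB of memory per worker
#     (one reading of "only accounts for the additional 4 GiB").
disk_bumped = spill_factor * baseline_memory + extra_memory

for label, disk in [("scaled", disk_scaled), ("fixed", disk_fixed), ("bumped", disk_bumped)]:
    print(f"{label}: ~{disk / GiB:.0f} GiB of spill written per worker")
```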
I'm trying a …
`tests/stability/test_spill.py::test_spilling`, introduced in #229, has been quite flaky, failing in ~25% of its test runs (example). Since it runs across a wide test matrix, this means that most workflows end up failing on `test_spill`.

For the most part, the failures occur in the `client` setup fixture during the `wait_for_workers()` call; the high memory usage from that test seems to make cluster restarts pretty unreliable.

cc @hendrikmakait
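For reference, `wait_for_workers()` here is distributed's `Client.wait_for_workers()`. A minimal sketch of what such a setup fixture might look like follows (hypothetical and simplified; the real fixture in this repository is more involved). The restart after a memory-heavy test can leave the cluster short of workers, so the wait times out.

```python
# Hypothetical, simplified sketch of a `client` setup fixture; names and the
# timeout value are illustrative, not taken from this repository.
import pytest
from distributed import Client


@pytest.fixture
def client(cluster):  # `cluster` assumed to come from another fixture
    expected = len(cluster.scheduler_info["workers"])
    with Client(cluster) as client:
        # Restart to give each test a clean slate; after a memory-heavy
        # spilling test this is where things tend to go wrong.
        client.restart()
        # Blocks until all workers have reconnected; raises TimeoutError
        # if the restart leaves the cluster short of workers.
        client.wait_for_workers(expected, timeout=600)
        yield client
```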