Flaky tests: assert False, (bad_thread, call_stacks) - Worker executor thread still running after Nanny.kill #6796

Closed
gjoseph92 opened this issue Jul 26, 2022 · 3 comments
Labels
flaky test Intermittent failures on CI.

Comments

@gjoseph92
Collaborator

In these tests, a Nanny is killed (either via Nanny.kill, or Nanny.restart, which calls kill internally), like:

# Kill the only worker.
[worker_id] = cluster.workers
await cluster.workers[worker_id].kill()

It appears that in some cases the worker's ThreadPoolExecutor isn't being terminated. That's confusing, though: the worker should be running in a subprocess, so its threads shouldn't even be visible to the parent process.
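For readers unfamiliar with this kind of failure, here is a minimal sketch of how a thread-leak check can produce a `(bad_thread, call_stacks)` style report. It uses only the standard library and is not distributed's actual test utility; `find_leaked_threads`, `call_stack`, and the `hypothetical-executor` thread name are illustrative names. Note that `threading.enumerate()` only sees threads in the current process, which is why a leak from a worker in a subprocess would be surprising.

```python
import sys
import threading
import time
import traceback
from concurrent.futures import ThreadPoolExecutor


def find_leaked_threads(before, timeout=1.0):
    """Return threads not present in `before` that are still alive after `timeout`."""
    deadline = time.monotonic() + timeout
    while True:
        leaked = [t for t in threading.enumerate() if t not in before and t.is_alive()]
        if not leaked or time.monotonic() >= deadline:
            return leaked
        time.sleep(0.01)


def call_stack(thread):
    """Format the current call stack of `thread`, or return [] if it has no frame."""
    frame = sys._current_frames().get(thread.ident)
    return traceback.format_stack(frame) if frame is not None else []


# Snapshot the threads that exist before the "test" runs.
before = set(threading.enumerate())

# Hypothetical stand-in for a worker's executor: one submitted task starts a
# pool thread, which then idles waiting for more work.
executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="hypothetical-executor")
executor.submit(lambda: None).result()

# The idle pool thread is reported as leaked because nothing shut it down.
leaked = find_leaked_threads(before, timeout=0.2)
leaked_names = [t.name for t in leaked]
stacks = {t.name: call_stack(t) for t in leaked}

# After an explicit shutdown the pool thread exits and the check passes.
executor.shutdown(wait=True)
after_shutdown = find_leaked_threads(before, timeout=1.0)
```

A real check would typically fail the test with the leaked thread and its stack when `leaked` is non-empty, rather than collecting them as done here.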

I have a feeling #6427 would address this (or at least raise a meaningful error instead of this thread leakage?) because Nanny.kill doesn't actually wait for the subprocess to terminate.
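To illustrate the distinction between killing a process and waiting for it to terminate, here is a minimal sketch using a plain `subprocess.Popen` child as a stand-in. It does not use Nanny or reproduce its internals, and the sleeping child is purely illustrative.

```python
import subprocess
import sys

# Stand-in for a worker subprocess: a child that would otherwise sleep for a minute.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# kill() only requests termination and returns immediately. At this point the
# child may not have exited yet, so proc.poll() can still return None.
proc.kill()

# Waiting is a separate step: block until the OS reports that the child exited.
proc.wait(timeout=10)
exited = proc.returncode is not None
```

Any cleanup or leak check that runs between the kill and the wait can still observe the child, which is the kind of race the comment above is concerned about.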

@gjoseph92
Collaborator Author

@hendrikmakait do you think this was fixed in #6817? I'm still not understanding the mechanism for how (I know test_localcluster_get_client was leaking a thread, but why did that show up in the other tests instead?).

@hendrikmakait
Member

It looks like we've stopped seeing this flake, so from my perspective the symptom of flaking has been resolved. Let's add a follow-up issue to investigate the causal mechanism. It feels like there may be a more systemic issue in there, or at the very least, we might get a better understanding of the unintended side effects of messy tests.

@hendrikmakait
Member

hendrikmakait commented Aug 3, 2022

Closed as these tests stopped being flaky after #6817 was merged. Given known issues with the client (#5901, #6527), we have decided against further investigation at this time.
