Recommendation Requested / Observed mismatch #8119

Open
Vesyrak opened this issue Aug 21, 2023 · 2 comments · May be fixed by #8273
Comments

Vesyrak commented Aug 21, 2023

When the adaptive core gets a target value that recommends a scale-down, it always appears to pick the first worker for removal, as defined here:

not_yet_arrived = requested - observed

This is because requested and observed contain completely different data.
requested holds the worker names, which the cluster assigns as incrementing integers:

new_worker_name = self._new_worker_name(self._i)

However, the observed names, i.e. the names the scheduler gets from the workers, appear to be the workers' addresses.
As a result, the two sets never overlap when they are compared, so the adaptive core assumes it is still waiting for the requested workers and concludes that it can kill these "not-yet-arrived" workers. This is counterproductive: the adaptive algorithm ends up killing workers based on ordering rather than on idle behaviour.
Screenshot 2023-08-21 at 10 25 37
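
To make the mismatch concrete, here is a minimal sketch of the set arithmetic described above. It is not the actual distributed source: pick_workers_to_close is a hypothetical stand-in that assumes the scale-down branch closes not-yet-arrived workers first, and the addresses are made-up example values. The sort is only there to make the example deterministic.

```python
from itertools import islice


def pick_workers_to_close(plan, requested, observed, target):
    """Hypothetical, simplified scale-down: close not-yet-arrived workers first."""
    not_yet_arrived = requested - observed
    n_to_close = len(plan) - target
    # Sort by string form so the result is deterministic for this example.
    return set(islice(sorted(not_yet_arrived, key=str), n_to_close))


# The cluster names its workers 0..3 (incrementing integers).
plan = {0, 1, 2, 3}
requested = {0, 1, 2, 3}

# The scheduler reports workers by address (made-up values).
observed = {
    "tcp://10.0.0.1:40001",
    "tcp://10.0.0.2:40002",
    "tcp://10.0.0.3:40003",
    "tcp://10.0.0.4:40004",
}

# No overlap, so every requested worker looks as if it never arrived.
not_yet_arrived = requested - observed
to_close = pick_workers_to_close(plan, requested, observed, target=3)

# If both sides used the same naming, nothing would be "not yet arrived",
# and the choice could fall through to an idleness-based decision instead.
to_close_matching = pick_workers_to_close(plan, requested, {0, 1, 2, 3}, target=3)
```

With mismatched names, all four workers end up in not_yet_arrived even though they are running, so the worker to close is chosen purely by ordering. With matching names the set is empty and no worker is selected on that basis.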

Environment:

  • Dask version: 2023.3.0
  • Python version: 3.11
  • Operating System: Ubuntu 22.04 (docker)
  • Install method (conda, pip, source): pip

Vesyrak commented Aug 21, 2023

I noticed that this issue probably propagates to other parts of the code as well, as seen here:

not_yet_launched = set(self.worker_spec) - {


Vesyrak commented Aug 21, 2023

Highly related issue: #4532
