Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn (and eventually raise) when client.scatter is used with Active Memory Manager enabled #8921

Open
phofl opened this issue Nov 4, 2024 · 3 comments

Comments

@phofl
Copy link
Collaborator

phofl commented Nov 4, 2024

Using scatter is generally not a good idea anymore and doesn't have any effect if the active memory manager is enabled. People are frequently running into this or are using scatter anyway, which is a bad UX and confusing.

We should raise a warning if scatter is used and point people to delayed and probably raise at some point later or get rid of the method completely.

@fjetter
Copy link
Member

fjetter commented Nov 5, 2024

Using scatter is generally not a good idea anymore and doesn't have any effect if the active memory manager is enabled.

That's only partially true. What doesn't have an effect any more is scatter(..., broadcast=True

@phofl
Copy link
Collaborator Author

phofl commented Nov 5, 2024

Good point, I mixed that up....

Is delayed generally better or is that incorrect?

@fjetter
Copy link
Member

fjetter commented Nov 6, 2024

Is delayed generally better or is that incorrect?

In 9 out of 10 times it is better. The difference between the two approaches is that scatter can take a direct path to the worker instead of proxying through the scheduler. At least if the network configuration allows such things.

Even if scatter proxies over the scheduler, the scheduler just forwards the data directly and doesn't store a copy. This matters if the data is actually large since the scheduler has to hold the delayed task in memory until it is completed.
This also means that delayed is more robust to failures, of course.

In the end its a tradeoff between slightly better performance and resilience+higher memory usage on the scheduler.

The safe but slightly more costly approach is delayed. Most end users will likely not be able to differentiate this properly and judge the risks/costs properly so the recommendation to use delayed (or client.submit) is certainly good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants