
Massive memory (100GB) used by dask-scheduler #4243

Open
pseudotensor opened this issue Nov 13, 2020 · 1 comment
pseudotensor commented Nov 13, 2020

Cross-posting, since this now seems to be mostly a dask.distributed problem.

Maybe related:
#3898
dask/dask#3530
dask/dask#6762

See dmlc/xgboost#6388 (comment) for code and a reproducer.

In very short order, the workers and the scheduler hit the OOM killer because they keep accumulating memory, even across Python client code that completes cleanly.
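This is not the original reproducer (that is linked above in dmlc/xgboost#6388); it is a minimal sketch of the pattern, assuming a LocalCluster stands in for the real multi-GPU setup. The `scheduler_rss_mb` helper is something I'm adding here for illustration, not part of any existing repro:

```python
# Minimal sketch: repeatedly run a small graph to completion and watch
# scheduler resident memory. If scheduler RSS keeps growing across
# cleanly completed rounds, that matches the behavior reported here.
import psutil

from dask.distributed import Client, LocalCluster


def scheduler_rss_mb(dask_scheduler=None):
    # Executed on the scheduler process via run_on_scheduler;
    # returns its resident set size in MB.
    return psutil.Process().memory_info().rss / 1e6


if __name__ == "__main__":
    cluster = LocalCluster(n_workers=2, threads_per_worker=1)
    client = Client(cluster)

    for i in range(100):
        # A fresh lambda each iteration gives the scheduler new task
        # keys every round, mimicking repeated independent client runs.
        futures = client.map(lambda x: x + 1, range(10_000))
        client.gather(futures)
        del futures  # drop references so the scheduler can forget the tasks

        if i % 10 == 0:
            rss = client.run_on_scheduler(scheduler_rss_mb)
            print(f"iteration {i}: scheduler RSS ~{rss:.0f} MB")

    client.close()
    cluster.close()
```

With everything released after each round, scheduler RSS should stay roughly flat; steady growth across iterations would be the symptom described above.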

This issue effectively makes it impossible to use dask with, e.g., NVIDIA RAPIDS/xgboost as a multi-GPU or multi-node solution.

pseudotensor (Author) commented

I'm not confident it is purely a dask.distributed issue, so I may continue the discussion in the xgboost repo.
