
Doc on handling worker with walltime #481

Merged
merged 3 commits into dask:master
Jan 24, 2021

Conversation

guillaumeeb
Member

Finally, a little contribution from me, and a doc fix to a long standing issue.

Fixes #122.


@mivade mivade left a comment


Thanks, this looks like great additional documentation! I've pointed out some typos and made some suggestions to clarify the language a bit, but otherwise this looks good!

- when you don't have a lot of room on you HPC platform and have only a few workers at a time (less than what you were hopping for when using scale or adapt). These workers will be killed (and others started) before you workload ends.
- when you really don't know how long your workload will take: all your workers could be killed before reaching the end. In this case, you'll want to use adaptive clusters so that Dask ensures some workers are always up.

If you don't set the proper parameters, you'll run into KilleWorker exceptions in those two cases.

Typo in the exception.


If you don't set the proper parameters, you'll run into KilleWorker exceptions in those two cases.

The solution to this problem is to tell Dask up front that the workers have a finit life time:

Typo: finit -> finite. Similarly lifetime is usually spelled as a single word.


The solution to this problem is to tell Dask up front that the workers have a finit life time:

- Use `--lifetime` worker option. This will enables infinite workloads using adaptive. Workers will be properly shut down before the scheduling system kills them, and all their states moved.

enables -> enable


In dask-jobqueue, every worker processes run inside a job, and all jobs have a time limit in job queueing systems.

Should be "every worker process runs..."

In dask-jobqueue, every worker processes run inside a job, and all jobs have a time limit in job queueing systems.
Reaching walltime can be troublesome in several cases:

- when you don't have a lot of room on you HPC platform and have only a few workers at a time (less than what you were hopping for when using scale or adapt). These workers will be killed (and others started) before you workload ends.

hopping -> hoping and "before you workload" -> "before your workload"

The solution to this problem is to tell Dask up front that the workers have a finit life time:

- Use `--lifetime` worker option. This will enables infinite workloads using adaptive. Workers will be properly shut down before the scheduling system kills them, and all their states moved.
- Use `--lifetime-stagger` when dealing with many workers (say > 20): this will allow to avoid workers all terminating at the same time, and so to ease rebalancing tasks and scheduling burden.

"this will allow to avoid workers all" -> "this will prevent workers from"

"and so to ease" -> "and so ease" or (probably better) "thus"

cluster.adapt(minimum=0, maximum=200)


Here is an example of a workflow taking advantage of this, if you wan't to give it a try or adapt it to your use case:

wan't -> want
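For what the "example of a workflow" mentioned above might look like end to end, here is a hedged sketch (the computation, array sizes, and chunking are illustrative and not taken from the PR; it reuses the `cluster` object from the sketch further up):

```python
from dask.distributed import Client
import dask.array as da

# Connect a client to the adaptive, lifetime-aware cluster defined earlier.
client = Client(cluster)

# A long-running computation: individual workers may reach their lifetime and
# retire while it runs, adaptivity starts replacements, and their state is
# migrated, so the workload can outlive any single job's walltime.
x = da.random.random((50_000, 50_000), chunks=(5_000, 5_000))
result = (x + x.T).mean().compute()
print(result)
```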

@guillaumeeb
Member Author

Many thanks @mivade! I need to practice my English...

Member

@andersy005 andersy005 left a comment


Thank you for putting this together, @guillaumeeb!

@andersy005 andersy005 added the documentation Documentation-related label Jan 24, 2021
@guillaumeeb guillaumeeb merged commit 69f27ac into dask:master Jan 24, 2021
Labels
documentation Documentation-related
Development

Successfully merging this pull request may close these issues.

Handling workers with expiring allocation requests
3 participants