Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lifetime parameters #417

Closed
wshanks opened this issue Apr 24, 2020 · 6 comments
Closed

Lifetime parameters #417

wshanks opened this issue Apr 24, 2020 · 6 comments
Labels
help wanted Extra attention is needed
Milestone

Comments

@wshanks
Copy link

wshanks commented Apr 24, 2020

When first trying to use dask_jobqueue, I was used to using LocalCluster from distributed where **kwargs is really **worker_kwargs and was confused why some options were making it through to the worker but others were not like lifetime. Stepping through the code, I saw that **kwargs for JobCluster were propagated to the options key of the worker argument to SpecCluster and then on to the **kwargs of dask_jobqueue.core.Job where it was ignored. I wonder if lifetime, lifetime-stagger, and lifetime-restart should be handled by dask_jobqueue.core.Job like other command_args are?

Also, I wonder if there should be a warning for any unhandled **kwargs that make it to dask_jobqueue.core.Job, since the user (like me) is probably confused if they are sending arguments that are making it that far down (I don't think arguments can make it that far in **kwargs unless they were not used higher up?). The thing that it took me a while to realize was that the dask_jobqueue.core.Job subclasses kind of are the dask Worker as far as JobCluster is concerned. Renaming extra to dask_worker_additional_args as suggested in #323 would help some with this confusion.

@lesteve
Copy link
Member

lesteve commented Apr 27, 2020

Not sure if you have seen but #398 is an attempt to have an error for ignored parameters.

Not much time currently to work on this unfortunately ...

@guillaumeeb
Copy link
Member

Ah, I was sure @lesteve mentionned something like this in a issue not so long ago! Thanks @lesteve.

@wshanks
Copy link
Author

wshanks commented Apr 28, 2020

Ah, okay, #386 is the same as what I mentioned in my second paragraph above.

@guillaumeeb
Copy link
Member

I wonder if lifetime, lifetime-stagger, and lifetime-restart should be handled by dask_jobqueue.core.Job like other command_args are?

It is true that lifetime parameters are important to dask-jobqueue. They could be added for this reason. But maybe we should define some policy, either:

  • Remove all kwargs that are not related to queuing system. All worker kwargs rely on extra.
  • Remove all kwargs not related to queuing system, and use a wildcard **kwargs that we propagate.
  • Add worker kwargs we believe are important for dask-jobqueue, and rely on extra for the others (the closer to the actual implementation).

If there are some thoughts here from others...

@andersy005 andersy005 added the help wanted Extra attention is needed label Oct 15, 2021
@guillaumeeb guillaumeeb added this to the 0.8.1 milestone Aug 30, 2022
@guillaumeeb
Copy link
Member

First a status of declared worker kwargs:

  • protocol, security and interface are used both for Scheduler and Worker, so they are declared in JobQueueCluster and propagated to Job implementations.
  • Every non used kwarg from JobQueuecluster are propagated to Job implementations, but will generate errors if not used there.
  • cores, memory are used to compute nthreads, nworkers, memory-limit Worker args, but are not directly linked to Worker kwargs as they are also used for Job directives.
  • processes is used to compute nthreads, and to get nworkers Worker kwarg.
  • name, nanny, death_timeout, local_directory are used to add associated CLI Worker args.

Then, a clarification:

Remove all kwargs not related to queuing system, and use a wildcard **kwargs that we propagate.

We do not use Nanny or Worker class, and won't do it. With SpecCluster and the use of Job, it is not possible to do that easily. So we can forget this solution.

So that leaves us with:

  • Remove all kwargs that are not related to queuing system. All worker kwargs rely on worker_extra_args. This means removing
    name, nanny, death_timeout, local_directory. We still need processes and memory I guess.
  • Add worker kwargs we believe are important for dask-jobqueue, and rely on extra for the others (the closer to the actual implementation): are lifetime, lifetime-stagger, and lifetime-restart important enough? What other keywords?

@guillaumeeb
Copy link
Member

My opinion here is to stay with the current situation. Maybe we have already too many optional kwargs for Workers, but I don't want to remove one of them, as they are really important.

The only other occurrences of extra kwargs in the docs are the lifetime* ones, and resources. I thinks they are advanced use cases, and can remain outside of Job implementation constructor kwargs.

I can propose a rule to decide whether to add a Worker kwarg to our Job kwarg:

If a Worker kwarg default needs to be updated for a good behavior of Dask on a job queueing system or if this kwarg needs to be coherent between Scheduler and Workers, then it must be declared as a Job kwarg.

death_timeout, local_directory, nthreads, memory-limit, nworkers (and so related cores, memory and processes) clearly fall into this category. interface also, and as protocol and security are needed by both Scheduler and Workers. name is an exception, it is required for SpecCluster mechanism. That leaves nanny which could be removed, but is also of a high importance in some setup.

Others proposed here or in the doc do not match the rule above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants