Fixes for Adaptive #63
```diff
@@ -1,4 +1,7 @@
 from __future__ import absolute_import, division, print_function
 
 import logging
+import math
 import os
 import shlex
 import socket
@@ -133,8 +136,11 @@ def __init__(self,
         if memory is not None:
             self._command_template += " --memory-limit %s" % memory
         if name is not None:
-            self._command_template += " --name %s" % name
-            self._command_template += "-%(n)d"  # Keep %(n) to be replaced later
+            # worker names follow this template: {NAME}-{JOB_ID}-{WORKER_NUM}
+            self._command_template += " --name %s" % name  # e.g. "dask-worker"
+            # Keep %(n) to be replaced later (worker id on this job)
+            # ${JOB_ID} is an environment variable describing this job
+            self._command_template += "-${JOB_ID}-%(n)d"
         if death_timeout is not None:
             self._command_template += " --death-timeout %s" % death_timeout
         if local_directory is not None:
```
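Here is a minimal sketch of how the naming template above expands. The values for `name` and the base command are made up, and only the two `+=` lines mirror the diff. Python's `%` formatting fills in `%(n)d` and leaves `${JOB_ID}` untouched. `os.path.expandvars` stands in for the shell expansion that happens when the job script runs, and the `JOB_ID` value is invented. This diff does not show how the real code supplies `n`.

```python
import os

# Hypothetical stand-ins for the constructor arguments in the diff
name = "dask-worker"
command_template = "dask-worker tcp://scheduler:8786"

# These two lines mirror the new code in the diff
command_template += " --name %s" % name
command_template += "-${JOB_ID}-%(n)d"

# Python-side substitution: only %(n)d is filled in, ${JOB_ID} stays literal
cmd = command_template % {"n": 0}
print(cmd)

# Emulate the shell expanding ${JOB_ID} when the job script runs
# (the JOB_ID value is made up for this example)
os.environ["JOB_ID"] = "1234"
expanded = os.path.expandvars(cmd)
print(expanded)
```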
```diff
@@ -161,7 +167,8 @@ def job_file(self):
     def start_workers(self, n=1):
         """ Start workers and point them to our local scheduler """
         workers = []
-        for _ in range(n):
+        num_jobs = min(1, math.ceil(n / self.worker_processes))
+        for _ in range(num_jobs):
```
This is a breaking change I want to make sure everyone is aware of. The current behavior for a hypothetical setup that includes 10 workers per job would be:

`cluster.start_workers(1)`

...and get 1 job and 10 workers. I'd like to change this so that

Historically, `start_workers` was a semi-convention between a few projects. This has decayed, so I have no strong thoughts here. I do think that we need to be consistent on

Will this really help adaptive? Wouldn't there still be a problem with starting workers in a grouped manner? With your example, calling

But this may be handled well by adaptive, I don't know. In that case, maybe this breaking change isn't needed?
```diff
             with self.job_file() as fn:
                 out = self._call(shlex.split(self.submit_command) + [fn])
                 job = self._job_id_from_submit_output(out.decode())
```
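The thread above turns on whether `n` counts jobs or workers. If it counts workers, the job count has to be rounded up. The cluster can then overshoot when `n` is not a multiple of the workers per job. Below is a minimal sketch of that arithmetic. It assumes every job starts exactly `worker_processes` workers, and the helper names `jobs_needed` and `workers_started` are hypothetical.

```python
import math


def jobs_needed(n_workers, worker_processes):
    """Jobs required so that at least n_workers are started (hypothetical helper)."""
    # int() keeps the result an int under Python 2, where math.ceil returns a float
    return int(math.ceil(n_workers / float(worker_processes)))


def workers_started(n_workers, worker_processes):
    """Workers actually launched once whole jobs are rounded up."""
    return jobs_needed(n_workers, worker_processes) * worker_processes


# With 10 workers per job, as in the hypothetical setup from the comment
print(jobs_needed(1, 10), workers_started(1, 10))
print(jobs_needed(15, 10), workers_started(15, 10))
print(jobs_needed(20, 10), workers_started(20, 10))

# The min(1, ...) expression in the diff never exceeds one job
print(min(1, math.ceil(15 / 10.0)))
```

The `int()` wrapper matters if Python 2 is still supported, as the `__future__` imports suggest. In Python 2, `math.ceil` returns a float and `range()` rejects floats.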
```diff
@@ -196,12 +203,12 @@ def _calls(self, cmds):
         Also logs any stderr information
         """
         logger.debug("Submitting the following calls to command line")
+        procs = []
         for cmd in cmds:
             logger.debug(' '.join(cmd))
-        procs = [subprocess.Popen(cmd,
-                                  stdout=subprocess.PIPE,
-                                  stderr=subprocess.PIPE)
-                 for cmd in cmds]
+            procs.append(subprocess.Popen(cmd,
+                                          stdout=subprocess.PIPE,
+                                          stderr=subprocess.PIPE))
 
         result = []
         for proc in procs:
```
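The `_calls` change replaces the list comprehension with an explicit loop, so each command is logged immediately before it is launched. Below is a standalone sketch of the same launch-then-collect pattern. `run_calls` is a hypothetical name. The rest of `_calls` is not shown, so the stderr handling via `communicate()` is an assumption based on the docstring's "Also logs any stderr information".

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_calls(cmds):
    """Launch every command, then collect outputs (sketch, not the project's _calls)."""
    logger.debug("Submitting the following calls to command line")
    procs = []
    for cmd in cmds:
        # Log each command right before starting it, as in the new code
        logger.debug(" ".join(cmd))
        procs.append(subprocess.Popen(cmd,
                                      stdout=subprocess.PIPE,
                                      stderr=subprocess.PIPE))

    result = []
    for proc in procs:
        # communicate() waits for the process and drains both pipes,
        # which avoids blocking on a full pipe buffer
        out, err = proc.communicate()
        if err:
            # Assumption: stderr is logged rather than raised
            logger.error(err.decode())
        result.append(out)
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    cmds = [[sys.executable, "-c", "print('hello')"],
            [sys.executable, "-c", "print('world')"]]
    print(run_calls(cmds))
```

All processes are started before any of them is waited on, so the commands run concurrently.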
```diff
@@ -232,10 +239,13 @@ def scale_up(self, n, **kwargs):
 
     def scale_down(self, workers):
         ''' Close the workers with the given addresses '''
-        if isinstance(workers, dict):
-            names = {v['name'] for v in workers.values()}
-            job_ids = {name.split('-')[-2] for name in names}
-            self.stop_workers(job_ids)
+        if not isinstance(workers, dict):
+            raise ValueError(
+                'Expected dictionary of workers, got %s' % type(workers))
+        names = {v['name'] for v in workers.values()}
+        # This will close down the full group of workers
+        job_ids = {name.split('-')[-2] for name in names}
+        self.stop_workers(job_ids)
```
I'm thinking there is a better way to do this. The current behavior to scale down removes the entire job from the system. So if

@mrocklin, would it make sense to add logic to

The `key=` parameter to https://github.com/dask/distributed/blob/master/distributed/scheduler.py#L2525-L2548

Glad to see that grouped workers are handled in adaptive! Another comment here, not linked to this PR, is that I find the
```diff
 
     def __enter__(self):
         return self
```
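`scale_down` maps each worker name back to its job with `name.split('-')[-2]` and then stops whole jobs. That is the behavior the first comment above questions: closing one worker also takes down its siblings in the same job. Below is a minimal sketch with invented addresses and job IDs. It assumes the `{address: {'name': ...}}` dictionary shape used in the diff, and the helper names are hypothetical.

```python
from collections import defaultdict


def job_id_from_name(name):
    """Return JOB_ID from a worker name shaped like {NAME}-{JOB_ID}-{WORKER_NUM}."""
    return name.split("-")[-2]


def group_by_job(workers):
    """Map each job ID to the set of worker addresses it runs."""
    if not isinstance(workers, dict):
        raise ValueError("Expected dictionary of workers, got %s" % type(workers))
    groups = defaultdict(set)
    for addr, info in workers.items():
        groups[job_id_from_name(info["name"])].add(addr)
    return dict(groups)


def workers_lost(to_close, all_workers):
    """Every worker that goes away when the jobs behind to_close are stopped."""
    job_ids = {job_id_from_name(v["name"]) for v in to_close.values()}
    groups = group_by_job(all_workers)
    lost = set()
    for job_id in job_ids:
        lost |= groups.get(job_id, set())
    return lost


# Hypothetical cluster: job 101 runs two workers, job 102 runs one
all_workers = {
    "tcp://10.0.0.1:40001": {"name": "dask-worker-101-0"},
    "tcp://10.0.0.1:40002": {"name": "dask-worker-101-1"},
    "tcp://10.0.0.2:40001": {"name": "dask-worker-102-0"},
}

# Asking to close a single worker of job 101 stops the whole job,
# so its sibling worker is lost as well
to_close = {"tcp://10.0.0.1:40001": all_workers["tcp://10.0.0.1:40001"]}
print(sorted(workers_lost(to_close, all_workers)))
```

The parsing relies on the `{NAME}-{JOB_ID}-{WORKER_NUM}` convention. A job ID that itself contained a hyphen would be mis-parsed.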
```diff
@@ -1,3 +1,5 @@
 from __future__ import absolute_import, division, print_function
 
+import logging
+
 import dask
@@ -52,8 +54,7 @@ def __init__(self,
 
         super(SGECluster, self).__init__(**kwargs)
 
-        header_lines = ['#!/bin/bash']
-
+        header_lines = ['#!/usr/bin/env bash']
```
So you don't have a solution for propagating the `JOB_ID` variable in the SGE script?

Good catch. I thought I did, but I'll have to sort it out.

I take that back. SGE uses
```diff
         if self.name is not None:
             header_lines.append('#$ -N %(name)s')
         if queue is not None:
```
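Below is a minimal sketch of how a header list like this can be rendered into the top of an SGE job script. The helper name and parameter values are hypothetical. The `-q` line is an assumption, since the diff is cut off after `if queue is not None:`. `#!/usr/bin/env bash` looks `bash` up on `PATH` instead of hard-coding `/bin/bash`. The `${JOB_ID}` reference in the body is left for bash to expand at run time, relying on SGE exporting `JOB_ID` to the job environment as its `qsub` documentation describes.

```python
def render_sge_header(name=None, queue=None):
    """Render an SGE job-script header from optional settings (hypothetical helper)."""
    params = {"name": name, "queue": queue}
    # env-based shebang, as in the new code: bash is located via PATH
    header_lines = ["#!/usr/bin/env bash"]
    if name is not None:
        header_lines.append("#$ -N %(name)s")
    if queue is not None:
        # Assumption: the diff is cut off here; "-q" is qsub's queue option
        header_lines.append("#$ -q %(queue)s")
    # Fill every %(...)s placeholder in one pass
    return "\n".join(header_lines) % params


script = render_sge_header(name="dask-worker", queue="all.q") + "\n\n"
# ${JOB_ID} is left for bash to expand when SGE runs the script
script += "echo starting worker dask-worker-${JOB_ID}-0\n"
print(script)
```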
Why use `min` here? If I'm not mistaken, this would always start only one job.

Good point. I've removed this.