Add a nested folder for storing jobqueue log file, use separate files for each job with LSF and SLURM #145
Conversation
A few comments. I'm not a super huge fan of the cwd approach ... but I don't really have a better suggestion.
You could imagine having a "degraded" behaviour for PBSCluster (essentially you cannot specify a log_directory).
Slightly unrelated question: is PBSCluster supposed to work on a Torque cluster? It seems like Torque supports variable expansion, according to http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm.
dask_jobqueue/core.py
Outdated
if log_directory is not None:
    self.log_directory = log_directory
    if not os.path.exists(self.log_directory):
        os.mkdir(self.log_directory)
Slightly better to use os.makedirs (i.e. the same as mkdir -p).
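For illustration, a minimal sketch of that suggestion, assuming Python 3 where the exist_ok flag is available (the path below is hypothetical, not from this PR):

import os

# Hypothetical path used only for this sketch.
log_directory = 'dask-worker-logs'

# os.makedirs creates intermediate directories too, like `mkdir -p`;
# exist_ok=True avoids an error if the directory already exists.
os.makedirs(log_directory, exist_ok=True)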
dask_jobqueue/core.py
Outdated
@@ -290,7 +298,8 @@ def job_file(self):
             yield fn

     def _submit_job(self, script_filename):
-        return self._call(shlex.split(self.submit_command) + [script_filename])
+        return self._call(shlex.split(self.submit_command) + [script_filename],
+                          cwd=self.log_directory)
Hmmm, this may have unintended side-effects, although I am not 100% sure ... for example, if I specify some relative path (e.g. for data files) in my notebook, it may not be found in the dask-worker process.
Is it not possible to pass a folder to -e and -o (SGE allows this, but maybe this is very specific to SGE ...)?
I guess if I understand #141 (comment), there is no way to have variable expansion in a PBS directive. I googled a bit and found http://community.pbspro.org/t/pbs-jobid-variable/176, which does not seem to leave too much hope ... I have seen "solutions" that used qalter after the submission, e.g. this; maybe that's an option, but it seems quite complicated ...
Just curious: so what do people do with PBSPro? Do they just not set the stdout/stderr file, or do they remember to change it each time with a different name?
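For reference, a hedged sketch of what passing a directory to -e/-o could look like in the generated header for SGE; the header_lines pattern mirrors the code elsewhere in this PR, and whether a trailing directory path is accepted is something to verify per scheduler (SGE documents it, PBS is discussed below):

# Hypothetical sketch: point -e/-o at a directory rather than a file and
# let the scheduler pick per-job file names (e.g. <job_name>.o<job_id>).
log_directory = '/path/to/dask-logs'   # assumed path, not from this PR
header_lines = []
header_lines.append('#$ -e %s/' % log_directory)
header_lines.append('#$ -o %s/' % log_directory)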
Yes, you are right, this may not be the best approach...
You've understood the problem with PBS: using -e or -o is possible but quite limited. The default output file name is $JOB_NAME.o$JOB_ID, so usually I just leave this and don't set anything.
Another solution is to just specify a folder. In this case PBS will fill it with $JOB_ID.OUT files.
Another solution is to just specify a folder. In this case PBS will fill it with $JOB_ID.OUT files.
If specifying a folder is possible, I think I would just do that for PBS (and SGE as well).
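A hedged sketch of the same idea for PBS, modelled on the header-building code in this PR; the conditional keeps the default behaviour when no log directory is requested, and the directive form is an assumption based on the discussion above:

# Hypothetical: only emit the directives when a log directory was requested,
# so PBS falls back to its default $JOB_NAME.o$JOB_ID naming otherwise.
log_directory = '/path/to/dask-logs'   # assumed path, not from this PR
header_lines = []
if log_directory is not None:
    header_lines.append('#PBS -e %s/' % log_directory)
    header_lines.append('#PBS -o %s/' % log_directory)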
@@ -79,3 +81,10 @@ def test_job_id_error_handling(Cluster):
     return_string = 'Job <12345> submited to <normal>.'
     cluster.job_id_regexp = r'(\d+)'
     cluster._job_id_from_submit_output(return_string)
+
+
+def test_log_directory():
Nice pytest trick: you can use the tmpdir fixture. This gives you a unique folder on each run, but pytest keeps it after the execution, allowing you to look at it in case something goes wrong. pytest keeps the last N (maybe 5) tmpdir folders IIRC.
def test_log_directory(tmpdir):
    with PBSCluster(..., log_directory=tmpdir.strpath) as cluster:
        ...
dask_jobqueue/slurm.py
Outdated
@@ -73,8 +73,8 @@ def __init__(self, queue=None, project=None, walltime=None, job_cpu=None, job_me
         # SLURM header build
         if self.name is not None:
             header_lines.append('#SBATCH -J %s' % self.name)
-            header_lines.append('#SBATCH -e %s.err' % self.name)
-            header_lines.append('#SBATCH -o %s.out' % self.name)
+            header_lines.append('#SBATCH -e %s-%%j.err' % self.name)
Just curious, I have wondered about this before: why not keep the default name? Is the default name not something very similar?
It was my first solution too, but I found that Slurm uses slurm-$JOB_ID.out as default. This is not too bad, but it misses the job name... So I'm hesitating here.
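To make the trade-off concrete, a hedged illustration of the two options being weighed; only the %j expansion and the slurm-<jobid>.out default come from SLURM's documented behaviour, the rest is a sketch with an illustrative job name:

# Option 1 (this PR): keep the job name and append the job id, which SLURM
# expands from %j at runtime, e.g. dask-worker-12345.err / .out.
name = 'dask-worker'   # illustrative job name, not from this PR
header_lines = []
header_lines.append('#SBATCH -e %s-%%j.err' % name)
header_lines.append('#SBATCH -o %s-%%j.out' % name)

# Option 2: omit -e/-o entirely and accept SLURM's default, which writes both
# streams to slurm-<jobid>.out but drops the job name from the file name.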
So you're right, maybe cwd is not the best choice, and I've missed some degraded cases too. I will try to fix that. For Torque vs PBS, they were the same a long time ago, but they've diverged, so they are not fully compatible now. It may be needed to implement a TorqueCluster at some point.
So I used the … Should we leave it to user setting …
I'm gonna merge this one soon.
I just saw this now, very nice to see this merged!
Thanks, I had doubts with this one, but it's really nice to not have hundreds of output files inside my notebook folder!
Closes #141