WIP: Use regex for job-id lookup #131

Merged (25 commits, Aug 28, 2018)

Conversation

@willirath (Collaborator) commented Aug 22, 2018

There are PBS schedulers that return more verbose output, such as: "Request 123456.asdfqwer submitted to queue: standard."

  • f0566c8 should be fully compatible with the former version.
  • It's probably a good idea to make the regex configurable. I'll go ahead and add this as well.
  • Moving the regex matching into core.py
  • Making it a static string in the code, non-configurable, with a value of r'\d+' (see the sketch below)
  • Deleting all _job_id_from_submit_output from JobQueueCluster subclasses
  • Adding tests
  • Updating the troubleshooting section of the docs
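
For reference, a rough sketch of what the non-configurable variant boils down to (illustrative only, not the final diff):

import re

def _job_id_from_submit_output(out):
    # take the first run of digits in the scheduler's submit output, e.g.
    # "Request 123456.asdfqwer submitted to queue: standard." -> "123456"
    return re.findall(r'\d+', out)[0]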

willirath and others added 2 commits August 22, 2018 11:23
This also captures more general output like "Request 123456.asdfqwer submitted to queue: standard."
@guillaumeeb (Member) left a comment


This looks really good, thanks!

Out of curiosity, what PBS versions do you work with?

Other than that, it would be a good idea to generalize this behaviour to other schedulers; I believe it could prove really useful.

We could even extend it to the PBS header, similar to what is done in https://github.com/jupyterhub/batchspawner/blob/master/batchspawner/batchspawner.py#L467. But that is still another matter.

I would open some issues to discuss this. Thanks for this PR @willirath.

@willirath (Collaborator Author)

Out of curiosity, what PBS versions do you work with?

It's NQSII. I'm running this on an NEC Linux cluster:

$ qstat -V
NQSII CUI Version R03.00 / API Version R03.00 (linux)

@willirath (Collaborator Author)

It's NQSII. I'm running this on an NEC Linux cluster:

Once I have a fully working setup, I can add something to the example configs.

@lesteve (Member) commented Aug 22, 2018

I am not familiar with PBS, but is there no command-line argument you can pass to PBS qsub that just outputs the job id? (There was one for SGE qsub, which is why I am asking.)

I am not convinced about making the regex configurable, but maybe I am grossly underestimating the variability of possible qsub outputs (maybe each cluster defines its own qsub wrapper which prints more information to stdout, or maybe it depends on the PBS version).

@guillaumeeb (Member)

@willirath, would you be OK with modifying your PR according to what has been discussed in #132:

  • Moving the regex matching into core.py
  • Making it a static string in the code, non-configurable, with a value of r'\d+'
  • Deleting all _job_id_from_submit_output from JobQueueCluster subclasses

@willirath (Collaborator Author)

I'll do that.

@willirath (Collaborator Author)

Making it a static string in the code, non-configurable, with a value of r'\d+'

Just to clarify: we expect this regex to work for all clusters? That makes it much simpler, then.

@guillaumeeb (Member)

Just to clarify: we expect this regex to work for all clusters? That makes it much simpler, then.

Yes, that's the idea, and it should work.
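
As a quick illustration, r'\d+' picks out the job id from the differently shaped outputs quoted in this thread:

import re

outputs = ['123456.admin01',                                         # PBS-style
           'Request 123456.asdfqwer submitted to queue: standard.',  # NQSII-style
           'Job <123456> is submitted to default queue <normal>.']   # LSF-style
for out in outputs:
    assert re.findall(r'\d+', out)[0] == '123456'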

@guillaumeeb (Member) left a comment


Really nice, thanks for that, just a few comments.

['Request {jobid}.asdf was sumbitted to queue 12.',
 '{jobid}',
 ' <{jobid}> ',
 '{jobid}; asdf'])
Member

Nice!

Might I ask you to use complete outputs, like the examples provided in #132 (comment)?

Collaborator Author

Yes, I'll do this.

@@ -294,7 +295,7 @@ def start_workers(self, n=1):
         for _ in range(num_jobs):
             with self.job_file() as fn:
                 out = self._submit_job(fn)
-                job = self._job_id_from_submit_output(out.decode())
+                job = _job_id_from_submit_output(out.decode())
Member

Are we sure we don't want to leave it as a class method?

Just asking; I don't really have an opinion yet, and I am not a really experienced Python user.

If one day we want to be able to add the regex to the config file, won't it be a problem?

Collaborator Author

I also don't have any strong opinion on this. Just went for the minimal amount of code necessary to implement (and test) this.

Collaborator Author

  • HTCondor (output taken from somewhere on the internet):
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 6075.

# but there is the -terse option, which should make it look like
10744.0 - 10744.2

Looks like a strong case for leaving the parser a static method of the class. (This way, it would be easy to create an HTCondorCluster with a different parser.)

I'll change this.
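
For illustration, a hypothetical HTCondorCluster (no such class exists in this PR) could then override the static method, assuming condor_submit -terse output:

from dask_jobqueue.core import JobQueueCluster

class HTCondorCluster(JobQueueCluster):  # hypothetical, for illustration only
    @staticmethod
    def _job_id_from_submit_output(out):
        # `condor_submit -terse` prints e.g. "10744.0 - 10744.2";
        # keep the cluster id before the first dot
        return out.split('.')[0].strip()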

@willirath (Collaborator Author)

I think I have addressed everything. Should I rebase this into a nicer PR?

@guillaumeeb (Member)

If you mean git rebase, we will squash commits when merging anyway.
@lesteve could you take a look at this one?

    def _job_id_from_submit_output(self, out):
        raise NotImplementedError('_job_id_from_submit_output must be implemented when JobQueueCluster is '
                                  'inherited. It should convert the stdout from submit_command to the job id')

    @staticmethod
Member

Is this annotation really required?

I'm not sure how it will be handled if a subclass needs to override it.

@willirath (Collaborator Author), Aug 24, 2018

As we don't need any implicit args (self or cls), the @staticmethod annotation fits here.

Overriding works just fine (you can also override with a non-static method):

# base class
class AClass:

    @staticmethod
    def parse_string(string):
        return string.split(".")


AClass.parse_string("one.two")  # --> ["one", "two"]


# identical copy, no override
class BClass(AClass):
    pass


BClass.parse_string("one.two")  # --> ["one", "two"]


# override as static method
class CClass(AClass):

    @staticmethod
    def parse_string(string):
        return string.split(";")


CClass.parse_string("one;two")  # --> ["one", "two"]
AClass.parse_string("one;two")  # --> ["one;two"]


# override as method
class DClass(AClass):

    separator = "-"

    def parse_string(self, string):
        return string.split(self.separator)


dc = DClass()  # need to instantiate before calling
dc.parse_string("one-two")  # --> ["one", "two"]

@guillaumeeb mentioned this pull request Aug 24, 2018
docs/index.rst (Outdated)
@@ -285,5 +285,4 @@ problems are the following:
 We use submit command stdout to parse the job_id corresponding to the
 launched group of worker. If the parsing fails, then dask-jobqueue won't work
 as expected and may throw exceptions. You can have a look at the parsing
-function in every ``JobQueueCluster`` implementation, see
-``_job_id_from_submit_output`` function.
+function in the ``JobQueueCluster._job_id_from_submit_output`` function.
Member

The word "function" is duplicated here; maybe we should formulate it differently.

@lesteve (Member) left a comment

A few comments. I have started working on an OARCluster implementation and I realised that the \d+ regexp will not work for OAR. This is the reason for most of my comments below.

                                  'inherited. It should convert the stdout from submit_command to the job id')

    @staticmethod
    def _job_id_from_submit_output(out):
        return re.findall(r'\d+', out)[0]
@lesteve (Member), Aug 24, 2018

A few comments about this (a combined sketch follows the list):

  • I am in favour of using a class variable for the regexp pattern. This way the regexp pattern can be tweaked more easily in derived classes. It can also be tweaked by the user, if needed, in the case where the cluster config is a bit special:

cluster = MyCluster(...)
cluster.job_id_regexp = my_special_regexp

  • I think the best way is to have a regexp with a named group, r'(?P<job_id>\d+)' by default. This way you can use the name of the group, which is more robust:

match = re.search(self.job_id_regexp, out)
job_id = match.groupdict().get('job_id')

  • There should be better error treatment:

if job_id is None:
    raise ValueError('Could not parse job id from submission command output.'
                     ' Job id regexp is {}, submission command output is: {}'.format(
                         self.job_id_regexp, out))
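
Taken together, these three suggestions amount to something like the following standalone sketch (illustrative, not the final implementation):

import re

class JobQueueCluster:
    # class variable: easy to tweak in derived classes or at runtime
    job_id_regexp = r'(?P<job_id>\d+)'

    def _job_id_from_submit_output(self, out):
        match = re.search(self.job_id_regexp, out)
        job_id = match.groupdict().get('job_id') if match else None
        if job_id is None:
            raise ValueError('Could not parse job id from submission command output.'
                             ' Job id regexp is {}, submission command output is: {}'
                             .format(self.job_id_regexp, out))
        return job_id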


@pytest.mark.parametrize(
    'qsub_return_string',
    ['{jobid}.admin01',
@lesteve (Member), Aug 24, 2018

Nice test! Could you use job_id instead of jobid, for consistency reasons?

@willirath (Collaborator Author)

f72a7f1 should address all your points @lesteve. Let me know what you think.

@willirath (Collaborator Author)

I went for only having r'\d+' in the class var, and then constructing the named-group regexp in the function. That should make it easier for a user to override at runtime.

@@ -138,6 +139,7 @@ class JobQueueCluster(Cluster):
     cancel_command = None
     scheduler_name = ''
     _adaptive_options = {'worker_key': lambda ws: _job_id_from_worker_name(ws.name)}
+    job_id_regexp = r'\d+'
Member

Put the full regexp here, i.e. (?P<job_id>\d+). This way it is fully customisable in derived classes. Currently you are assuming that the regexp starts with (?P<job_id> in _job_id_from_submit_output, which may not be appropriate.
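
A derived class could then ship its own full pattern, e.g. (hypothetical sketch; OARCluster does not exist yet, and the "OAR_JOB_ID=..." output line is an assumption about oarsub):

from dask_jobqueue.core import JobQueueCluster

class OARCluster(JobQueueCluster):  # hypothetical example
    # oarsub prints a line like "OAR_JOB_ID=12345"
    job_id_regexp = r'OAR_JOB_ID=(?P<job_id>\d+)'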

Collaborator Author

Done.

        raise NotImplementedError('_job_id_from_submit_output must be implemented when JobQueueCluster is '
                                  'inherited. It should convert the stdout from submit_command to the job id')
        regexp = r'(?P<job_id>{})'.format(self.job_id_regexp)
        match = re.search(regexp, out)
Member

I think that, to cover all the cases that currently error with a not very helpful message, you need to do something like this:

msg = 'Could not parse ...'

if match is None:
    raise ValueError(msg)

job_id = match.groupdict().get('job_id')

if job_id is None:
    raise ValueError(msg)

It would be great to have a test for error messages: you can use something like:

with pytest.raises(ValueError, match=error_message_regexp):
    ...

Collaborator Author

Done.

Collaborator Author

The match keyword only checks part of the expected error message. Not sure if this is enough; it looks similar to the corresponding tests in dask-ml, though.

        match = re.search(self.job_id_regexp, out)
        job_id = match.group('job_id')
        if job_id is None:
            raise ValueError('Could not parse job id from submission command'
Member

Add a space at the end of each line, otherwise the text will read something like "submission commandoutput".

Collaborator Author

Done.

@willirath (Collaborator Author)

@lesteve What do you think?

        if match is None:
            raise ValueError(msg)

        job_id = match.group('job_id')
@lesteve (Member), Aug 27, 2018

Can you use job_id = match.groupdict().get('job_id') to have a better error as I mentioned (probably folded away in an outdated diff).

Thinking about this, it would be great to have a better error in the case where there is a match but no job_id named group, i.e. something like:

"You need to use a `job_id` named group in your regexp, e.g. '(?P<job_id>\d+)'. Your regexp was: {}".format(self.job_id_regexp) 

     '{job_id};cluster',
     'Job <{job_id}> is submitted to default queue <normal>.',
     '{job_id}',
     pytest.param('{job_id}.admin01', marks=pytest.mark.xfail)])
Member

I forgot you could do something like this with parametrize, nice! Can you explain why it is xfail though?

Collaborator Author

In this case, {job_id} expands to XXXXX (see https://github.com/dask/dask-jobqueue/pull/131/files/9c83c296c33421c2ec496580b67a81feaa58f6b1#diff-6715b450c58d2e22119574769a68a1d6R92). Then the 01 in admin01 would be interpreted as the job id.
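
A quick check of that failure mode (illustrative):

import re

# with a non-numeric job id substituted, r'\d+' latches onto the digits in "admin01"
match = re.search(r'(?P<job_id>\d+)', 'XXXXX.admin01')
print(match.group('job_id'))  # --> '01', not the actual job id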


@pytest.mark.parametrize('Cluster', [PBSCluster, MoabCluster, SLURMCluster,
                                     SGECluster, LSFCluster])
@pytest.mark.parametrize(
Member

General remark: for the error cases, I would remove this parametrize and just test with something like this (a sketch of these tests follows the list):

  • an output without a match, e.g. 'there is no number here'
  • a regexp that matches but does not have a job_id named group, e.g. cluster.job_id_regexp = r'\d+' with an output of 'Job <12345> is submitted to default queue <normal>.'
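
A rough sketch of those two tests, with a standalone copy of the parsing logic so the example is self-contained (names and messages are illustrative, not the final test code):

import re
import pytest

def job_id_from_submit_output(out, job_id_regexp=r'(?P<job_id>\d+)'):
    # standalone copy of the parsing logic discussed in this PR
    match = re.search(job_id_regexp, out)
    job_id = match.groupdict().get('job_id') if match else None
    if job_id is None:
        raise ValueError('Could not parse job id from submission command output')
    return job_id

def test_no_match_in_output():
    with pytest.raises(ValueError, match='Could not parse job id'):
        job_id_from_submit_output('there is no number here')

def test_regexp_without_job_id_group():
    with pytest.raises(ValueError, match='Could not parse job id'):
        job_id_from_submit_output(
            'Job <12345> is submitted to default queue <normal>.',
            job_id_regexp=r'\d+')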

@willirath (Collaborator Author)

@lesteve I think I've addressed everything now.

(I probably won't get back to this before Wednesday, so feel free to push stuff to this branch. :) )

@lesteve (Member) commented Aug 28, 2018

I pushed some small tweaks, merging, thanks a lot @willirath!

@jhamman (Member) commented Aug 28, 2018

I just want to say, I am a fan of the general approach here. I'm also glad to see some tests for parsing these job ids. Thanks to all of you for ticking this one off.
