Bad error handling when failure #17

Open
wookayin opened this issue Apr 6, 2023 · 2 comments

wookayin commented Apr 6, 2023

>>> slurm.sbatch('echo demo.py ' + Slurm.SLURM_ARRAY_TASK_ID)

File ".../site-packages/simple_slurm/core.py", line 131, in sbatch
    assert success_msg in stdout, result.stderr
AssertionError: None

The assertion does not show what exactly went wrong, nor what the sbatch command printed.

E.g.,

sbatch: error: Invalid generic resource (gres) specification

wookayin commented Apr 6, 2023

Also, when --verbose is given, it should print the captured stdout before dying with the AssertionError.

Owner

amq92 commented Apr 9, 2023

Currently, subprocess.run only redirects stdout, not stderr, so any error message should have been printed even before the assert is reached! Is this not the case for you? Could you post the entire traceback?

This explains why the assertion message is None: stderr was never captured by the run command, so result.stderr is None.
A simple fix would be result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
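
To illustrate the idea, here is a minimal sketch (not the actual simple_slurm code) that captures both streams and reports stderr on failure. The helper name `run_and_check` and the stand-in `failing_cmd` are hypothetical; a small Python child process mimics a failing sbatch call so the example runs without Slurm. It also prints stdout before the check when `verbose` is set, as requested earlier in this thread.

```python
import subprocess
import sys


def run_and_check(cmd, success_msg="Submitted batch job", verbose=False, shell=False):
    """Hypothetical helper: run cmd, capture stdout and stderr, fail with a clear message."""
    result = subprocess.run(
        cmd, shell=shell, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    stdout = result.stdout.decode("utf-8")
    stderr = result.stderr.decode("utf-8")
    if verbose:
        # Print whatever was captured before any failure is raised.
        print(stdout, end="")
    if success_msg not in stdout:
        details = stderr.strip() or stdout.strip() or "(no output)"
        raise RuntimeError(
            f"submission failed (exit code {result.returncode}): {details}"
        )
    return stdout


# Stand-in for a failing sbatch call (assumption: no Slurm needed to run this).
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; "
    "print('sbatch: error: Invalid generic resource (gres) specification', file=sys.stderr); "
    "sys.exit(1)",
]

try:
    run_and_check(failing_cmd)
except RuntimeError as exc:
    print(exc)
```

Raising an exception instead of using `assert` is a design choice in this sketch: assertions are skipped when Python runs with `-O`, whereas an explicit exception always fires.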

If you have indeed not seen any stderr message, your case is most interesting.
Could you please share an MWE of your script? Also, could you run it the "normal" way (i.e. without simple_slurm, using your own shell script) to see what the output should be?

I'll hold off on making the update till after reviewing your case, as it may be more complicated than a simple redirection.

Here's an example with a wrong gres specification:

from simple_slurm import Slurm

slurm = Slurm(gres='xxx')  # invalid gres value, matching line 3 of demo.py in the ipython traceback
slurm.sbatch('echo $HOSTNAME')

Using python

$ python demo.py
sbatch: error: Invalid generic resource (gres) specification   #  < ---- error shown here
Traceback (most recent call last):
  File "demo.py", line 4, in <module>
    slurm.sbatch('echo $HOSTNAME')
  File "[...]/simple_slurm/core.py", line 131, in sbatch
    assert success_msg in stdout, result.stderr
AssertionError: None    # <------- not very clear message ! :S

Using interactive python

$ ipython
In [1]: %run demo.py
sbatch: error: Invalid generic resource (gres) specification
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
File /gpfs_new/data/users/amendoza/home/Documents/simple_slurm/demo.py:4
      1 from simple_slurm import Slurm
      3 slurm = Slurm(gres='xxx')
----> 4 slurm.sbatch('echo $HOSTNAME')

File /gpfs_new/data/users/amendoza/home/Documents/simple_slurm/simple_slurm/core.py:131, in Slurm.sbatch(self, run_cmd, convert, verbose, sbatch_cmd, shell)
    129 success_msg = 'Submitted batch job'
    130 stdout = result.stdout.decode('utf-8')
--> 131 assert success_msg in stdout, result.stderr
    132 if verbose:
    133     print(stdout)

AssertionError: None
