case.submit does not display system error messages #1396

Closed
worleyph opened this issue Apr 11, 2017 · 12 comments · Fixed by #1490

Comments

@worleyph
Contributor

For example, if two jobs are submitted to the debug queue on Titan, the second submission will fail. An informative error message used to be output (CIME 5.1 and earlier), but it is now missing:

 Submitting job script qsub    -q debug -l walltime=00:15:00 -A cli115 case.run 
 ERROR: Command: 'qsub    -q debug -l walltime=00:15:00 -A cli115 case.run ' failed with error ''

This is not the only useful error message that is now missing, but I don't have any other examples at the moment.

I vaguely remember already describing this problem somewhere, but cannot find an existing open issue.

@jgfouca
Member

jgfouca commented Apr 11, 2017

@worleyph , you are correct and @rljacob already noted this on the ESMCI side and it's already been fixed there.

@worleyph
Contributor Author

Thanks - that is probably what I remember - Rob volunteering to create this issue on the ESMCI side, based on a Slack conversation.

@worleyph
Contributor Author

@rljacob and @jgfouca , is the fix from the ESMCI side in CIME5.3 (i.e., in next)?

jgfouca added a commit that referenced this issue Apr 19, 2017
Update CIME to cime5.3

Update ACME to use CIME 5.3.x up to ESMCI hash 6156e0a from Thursday 4/13.

scripts_regression_tests and acme_developer tests pass on penn workstation.

This CIME change involves a large cosmetic change in CIME. Several directories have been renamed.

    The main interface to CIME, the scripts dir remains unchanged.
    cime/cime_config is renamed to cime/config
    The Fortran source code has been consolidated in src directory: externals, components, share, and drivers
    The driver_cpl directory has been moved to src/drivers/mct to prepare for multiple coupler options
    The share/csm_share/share directory has been renamed src/share/util.
    The cime script infrastructure source, utils/python/CIME, has moved to scripts/lib/CIME.

@singhbalwinder : please make sure PNNL cluster merge look correct
@mfdeakin-sandia : please look at cprnc/CMakeLists.txt and also confirm no run_acme changes.

Fixes #1311
Fixes #1380
Fixes #1382
Fixes #1383
Fixes #1396
Fixes #1402
Fixes #1416

[BFB]

* agsalin/update-to-cime5.3: (5649 commits)
  Fix titan problem
  Reset invalid pio_numiotasks
  Fix merge mistake
  Re-add eca testmod, was somehow lost in big merge
  Restore ACME version of shr_orb_cosz
  Fixed whitespace errors in Makefile killing COSP
  HOMME test typo fix
  Fixes to get tests to pass.
  Update ACME for new CIME directory structure
  More renaming.
  Rename ACME/cime directories in prep for CIME merge
  HOMME Improvement
  More edits to README
  Suggest a more portable usage of mktemp
  Tweak a comment
  Add comment on acme side
  Fix case for new unit_testing attribute to get_mpirun
  Allow run_tests.py to auto-determine the machine name
  fix an error
  make user-compset a seperate test
  ...
@jgfouca
Member

jgfouca commented Apr 19, 2017

@worleyph yes, and it's on master now too.

@worleyph
Contributor Author

worleyph commented Apr 19, 2017 via email

@worleyph worleyph reopened this Apr 19, 2017
@worleyph
Contributor Author

@jgfouca , with current master, when one job is already in the debug queue on Titan and I submit another job to the debug queue, I get the following error message:

 ./case.submit 
 ...
 Submitting job script qsub    -q debug -l walltime=00:45:00 -A cli115 case.run 
 ERROR: Command: 'qsub    -q debug -l walltime=00:45:00 -A cli115 case.run ' failed with error 'None'

If I instead submit a non-ACME job (using qsub directly), e.g. an interactive job request, I get

 >qsub -I -q debug -A cli115 -l walltime=30:00 nodes=1 (y|n|e|a)? yes
 
  Job not submitted 
  
  You currently have a job in the debug queue. Each 
   user is allowed to have only one job at a time in the       
   debug queue.  Please wait until job 3347570 completes  
   before submitting another job to the debug queue.     

Previously this error message was passed back when ./case.submit errored out.

I see that something changed: what was '' is now 'None', but I am not sure that this is a meaningful change :-).
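For illustration, here is one plausible way a Python wrapper could end up printing 'None' rather than '': when stderr is merged into stdout with `stderr=subprocess.STDOUT`, `Popen.communicate()` returns `None` for stderr, and formatting that value into a message yields the literal text 'None'. This is a minimal sketch, not CIME's actual code. `FAILING_CMD` is a hypothetical stand-in for a rejected qsub call, built from the Python interpreter so that it runs anywhere.

```python
import subprocess
import sys

# Hypothetical stand-in for a rejected qsub: writes to stderr, exits non-zero.
FAILING_CMD = [sys.executable, "-c",
               "import sys; sys.stderr.write('Job not submitted\\n'); sys.exit(1)"]

# Separate pipes: the scheduler's text arrives in err.
proc = subprocess.Popen(FAILING_CMD, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, universal_newlines=True)
out, err = proc.communicate()
separate_msg = "failed with error '%s'" % err.strip()

# Combined output: stderr is redirected into stdout, so err2 is None and the
# text is in out2. Formatting err2 into the message drops the real error.
proc2 = subprocess.Popen(FAILING_CMD, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True)
out2, err2 = proc2.communicate()
combined_msg = "failed with error '%s'" % err2

print(separate_msg)
print(combined_msg)
```

In the combined case the scheduler's message is still available, but only in the stdout value.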

@jgfouca
Member

jgfouca commented Apr 20, 2017

@worleyph I'm seeing this too. Don't know what happened as this was working and tested on ESMCI. I'll look at it.

@jgfouca
Member

jgfouca commented Apr 20, 2017

@worleyph apparently there was a bad interaction between two PRs on the ESMCI side. The first fixed this issue, then a subsequent PR (also by me) messed with combining output for shell commands and re-broke it. It's now once again fixed on the ESMCI side. It should make it back to ACME fairly soon or I can duplicate the fix (risking conflicts) on the ACME side if it's urgent.
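As a rough sketch of the kind of handling described here (not the actual ESMCI fix), a shell-command wrapper can report whichever stream holds the text, so the error survives whether or not the output is combined. The names `run_command`, `run_command_or_raise` and `fake_qsub` are hypothetical and chosen for illustration only.

```python
import subprocess
import sys

def run_command(cmd, combine_output=False):
    """Run cmd (a list of arguments); return (status, stdout, stderr).

    stderr is None when combine_output is True, because it is merged into stdout.
    """
    stderr_target = subprocess.STDOUT if combine_output else subprocess.PIPE
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=stderr_target,
                            universal_newlines=True)
    output, errput = proc.communicate()
    output = output.strip()
    errput = errput.strip() if errput is not None else None
    return proc.returncode, output, errput

def run_command_or_raise(cmd, combine_output=False):
    """Return stdout on success; on failure raise with the text the command produced."""
    stat, output, errput = run_command(cmd, combine_output=combine_output)
    if stat != 0:
        # Prefer stderr, but fall back to stdout so that text which landed in
        # the other stream (for example with combined output) is not lost.
        detail = errput if errput else output
        raise RuntimeError("Command: '%s' failed with error '%s'"
                           % (" ".join(cmd), detail))
    return output

# Hypothetical stand-in for a rejected qsub submission.
fake_qsub = [sys.executable, "-c",
             "import sys; sys.stderr.write('Job not submitted\\n'); sys.exit(1)"]

messages = []
for combine in (False, True):
    try:
        run_command_or_raise(fake_qsub, combine_output=combine)
    except RuntimeError as exc:
        messages.append(str(exc))
```

With this fallback, both the separate and the combined mode produce a message that ends with the scheduler's own text.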

@worleyph
Contributor Author

@jgfouca , I labelled this as 'minor', but I am only one person. I can't really make the determination as to whether this can be held back until the next ESMCI sync-up. I don't know how often failures occur, and what different failure modes there are (and how important feedback is in resolving these).

@philipwjones
Contributor

I've encountered this too, but have been able to diagnose failures from other log output, so I guess it is not a showstopper. As long as the next update is relatively soon, it can probably wait.

@worleyph
Contributor Author

@jgfouca , as I indicated in my e-mail to you and Peter Caldwell, I could use this fix now if you are willing to share. I'll just drop it into my working version.

@jgfouca
Member

jgfouca commented Apr 27, 2017

@worleyph , apologies, must have deleted the email from github and I didn't notice your mention.

The fix is this:
ESMCI/cime#1390

agsalin pushed a commit that referenced this issue May 1, 2017
Get unit test build and run working with serial or parallel pFUnit
@rljacob rljacob changed the title from "case.submit does not display system error messages on Titan" to "case.submit does not display system error messages" May 3, 2017
@rljacob rljacob removed the minor label May 3, 2017
jgfouca added a commit that referenced this issue Jun 2, 2017
Update CIME to cime5.3

jgfouca added a commit that referenced this issue Feb 27, 2018
Update CIME to cime5.3

jgfouca added a commit that referenced this issue Mar 14, 2018
Update CIME to cime5.3

rljacob pushed a commit that referenced this issue Apr 12, 2021
Get unit test build and run working with serial or parallel pFUnit
rljacob pushed a commit that referenced this issue Apr 16, 2021
Update CIME to cime5.3

rljacob pushed a commit that referenced this issue May 6, 2021
Update CIME to cime5.3

Update ACME to use CIME 5.3.x up to ESMCI hash 9802d59 from Thursday 4/13.


4 participants