
Improve submit files #81

Merged — alongd merged 2 commits into master from submit on Mar 7, 2019

Conversation

@alongd (Member) commented on Mar 6, 2019

Use a dict of dicts to store submit script templates in a server/software hierarchy
Added a maximal job time attribute to ARC (a sketch of how it might be rendered for the schedulers follows)
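
A minimal sketch, not ARC's actual implementation, of how a maximal job time could be rendered into the {t_max} placeholder used by the submit templates below. The function name is hypothetical; only the target formats are standard (Slurm's --time accepts days-hours:minutes:seconds, while SGE/OGE's h_rt expects hours:minutes:seconds):

import datetime

def format_max_job_time(max_job_time_hours, cluster_soft):
    """Hypothetical helper: render a maximal job time for the {t_max} placeholder."""
    total_seconds = int(datetime.timedelta(hours=max_job_time_hours).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if cluster_soft == 'Slurm':
        days, hours = divmod(hours, 24)
        return '{0}-{1:02d}:{2:02d}:{3:02d}'.format(days, hours, minutes, seconds)
    return '{0}:{1:02d}:{2:02d}'.format(hours, minutes, seconds)

# e.g., format_max_job_time(120, 'Slurm') -> '5-00:00:00'
# and   format_max_job_time(120, 'OGE')   -> '120:00:00'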

@alongd changed the title from "Imprve submit files" to "Improve submit files" on Mar 6, 2019
@alongd force-pushed the submit branch 3 times, most recently from 83eb5fa to 5cc109a on March 6, 2019, 14:44
codecov bot commented on Mar 6, 2019

Codecov Report

Merging #81 into master will decrease coverage by 34.89%.
The diff coverage is 6.66%.


@@           Coverage Diff            @@
##           master    #81      +/-   ##
========================================
- Coverage    41.1%   6.2%   -34.9%     
========================================
  Files          22     22              
  Lines        4885   4898      +13     
  Branches     1263   1266       +3     
========================================
- Hits         2008    304    -1704     
- Misses       2556   4590    +2034     
+ Partials      321      4     -317
Impacted Files Coverage Δ
arc/job/submit.py 0% <ø> (-100%) ⬇️
arc/main.py 2.75% <0%> (-40.26%) ⬇️
arc/scheduler.py 2.05% <0%> (-16.04%) ⬇️
arc/job/job.py 1.19% <9.09%> (-20.76%) ⬇️
arc/job/inputs.py 0% <0%> (-100%) ⬇️
arc/__init__.py 20% <0%> (-80%) ⬇️
arc/job/__init__.py 25% <0%> (-75%) ⬇️
arc/species/converter.py 15.92% <0%> (-64.97%) ⬇️
arc/rmgdb.py 10.27% <0%> (-63.7%) ⬇️
arc/parser.py 13.58% <0%> (-62.97%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c00b1d...5cc109a. Read the comment docs.

codecov bot commented on Mar 6, 2019

Codecov Report

Merging #81 into master will increase coverage by 0.01%.
The diff coverage is 50%.


@@           Coverage Diff            @@
##           master    #81      +/-   ##
========================================
+ Coverage   40.99%    41%   +0.01%     
========================================
  Files          22     22              
  Lines        4923   4941      +18     
  Branches     1274   1277       +3     
========================================
+ Hits         2018   2026       +8     
- Misses       2578   2586       +8     
- Partials      327    329       +2
Impacted Files Coverage Δ
arc/job/submit.py 100% <ø> (ø) ⬆️
arc/settings.py 100% <100%> (ø) ⬆️
arc/main.py 43.27% <100%> (+0.26%) ⬆️
arc/job/job.py 21.49% <18.18%> (-0.19%) ⬇️
arc/scheduler.py 18.68% <71.42%> (+0.2%) ⬆️
arc/reaction.py 42.35% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51c1aca...052e3e2. Read the comment docs.

@alongd force-pushed the submit branch 2 times, most recently from f88fa04 to 4c756e9 on March 6, 2019, 15:54
@alongd (Member, Author) commented on Mar 6, 2019

@cgrambow and @dranasinghe, can you take a look at these updated submit scripts?

@alongd (Member, Author) commented on Mar 6, 2019

Here's a clean version:

submit_scripts = {
    'Slurm': {
        # Gaussian09 on C3DDB
        'gaussian': """#!/bin/bash -l
#SBATCH -p defq
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu 4500

module add c3ddb/gaussian/09.d01
which g09

echo "============================================================"
echo "Job ID : $SLURM_JOB_ID"
echo "Job Name : $SLURM_JOB_NAME"
echo "Starting on : $(date)"
echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"

WorkDir=/scratch/users/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
SubmitDir=`pwd`

GAUSS_SCRDIR=/scratch/users/{un}/g09/$SLURM_JOB_NAME-$SLURM_JOB_ID
export  GAUSS_SCRDIR

mkdir -p $GAUSS_SCRDIR
mkdir -p $WorkDir

cd  $WorkDir
. $g09root/g09/bsd/g09.profile

cp $SubmitDir/input.gjf .

g09 < input.gjf > input.log
formchk  check.chk check.fchk
cp * $SubmitDir/

rm -rf $GAUSS_SCRDIR
rm -rf $WorkDir

""",

        # Orca on C3DDB:
        'orca': """#!/bin/bash -l
#SBATCH -p defq
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu 4500

module add c3ddb/orca/4.0.0
module add c3ddb/openmpi/2.0.2
which orca

export ORCA_DIR=/cm/shared/c3ddb/orca/4.0.0/
export PATH=$PATH:$ORCA_DIR

echo "============================================================"
echo "Job ID : $SLURM_JOB_ID"
echo "Job Name : $SLURM_JOB_NAME"
echo "Starting on : $(date)"
echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"


WorkDir=/scratch/users/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
SubmitDir=`pwd`

mkdir -p $WorkDir
cd  $WorkDir

cp $SubmitDir/input.inp .

${ORCA_DIR}/orca input.inp > input.log
cp * $SubmitDir/

rm -rf $WorkDir

""",

        # Molpro 2015 on RMG
        'molpro': """#!/bin/bash -l
#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048

export PATH=/opt/molpro/molprop_2015_1_linux_x86_64_i8/bin:$PATH

echo "============================================================"
echo "Job ID : $SLURM_JOB_ID"
echo "Job Name : $SLURM_JOB_NAME"
echo "Starting on : $(date)"
echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"

# WorkDir=`pwd`
# cd
# source .bashrc
sdir=/scratch/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
mkdir -p $sdir
# export TMPDIR=$sdir
# cd $WorkDir

molpro -d $sdir input.in

rm -rf $sdir

""",
    },


    'OGE': {
        # Gaussian16 on Pharos
        'gaussian': """#!/bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

echo "Running on node:"
hostname

g16root=/opt
GAUSS_SCRDIR=/scratch/{un}/{name}
export g16root GAUSS_SCRDIR
. $g16root/g16/bsd/g16.profile
mkdir -p /scratch/{un}/{name}

g16 input.gjf

rm -r /scratch/{un}/{name}

""",
        # Gaussian03 on Pharos
        'gaussian03_pharos': """#!/bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

echo "Running on node:"
hostname

g03root=/opt
GAUSS_SCRDIR=/scratch/{un}/{name}
export g03root GAUSS_SCRDIR
. $g03root/g03/bsd/g03.profile
mkdir -p /scratch/{un}/{name}

g03 input.gjf

rm -r /scratch/{un}/{name}

""",
        # QChem 4.4 on Pharos:
        'qchem': """#!/bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

echo "Running on node:"
hostname

export QC=/opt/qchem
export QCSCRATCH=/scratch/{un}/{name}
export QCLOCALSCR=/scratch/{un}/{name}/qlscratch
. $QC/qcenv.sh

mkdir -p /scratch/{un}/{name}/qlscratch

qchem -nt 6 input.in output.out

rm -r /scratch/{un}/{name}

""",
        # Molpro 2012 on Pharos
        'molpro': """#! /bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

export PATH=/opt/molpro2012/molprop_2012_1_Linux_x86_64_i8/bin:$PATH

sdir=/scratch/{un}
mkdir -p /scratch/{un}/qlscratch

molpro -d $sdir -n 6 input.in
""",
    }
}
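
For context, a minimal sketch of how one of these templates might be looked up and rendered. The import path is assumed from the arc/job/submit.py entry in the Codecov report above, and the keyword values and output filename are placeholders:

from arc.job.submit import submit_scripts  # assumed import path

# cluster software ('Slurm' / 'OGE') -> ESS ('gaussian', 'molpro', ...)
template = submit_scripts['Slurm']['gaussian']
content = template.format(name='spc1_opt', un='someuser', t_max='5-00:00:00')

with open('submit.sl', 'w') as f:  # output filename is a placeholder
    f.write(content)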

@cgrambow left a comment

I made some suggestions for the non-C3DDB servers.

More fundamentally though, what is your plan going forward? If ARC gets to the stage where people outside of the group start using it, then these scripts are not going to work for them, and it's not feasible to maintain scripts for all eventualities. One way to tackle that would be to have minimal script templates that users can modify themselves, and to require that users make sure that Gaussian, Molpro, etc. are available at the command line. Just wanted to know what your thoughts are, or if it's too early to start thinking about these questions.

echo "Current directory : $(pwd)"
echo "============================================================"

# WorkDir=`pwd`
cgrambow:

You can probably remove these comments. Same goes for all the following lines.

#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -c 8
cgrambow:

You're hardcoding the number of CPUs per task requested from SLURM here, but then you don't actually end up running Molpro in parallel. Also, this option should maybe be set by the user instead of defaulting to 8; I think a default of 1 makes the most sense.

Also, I think Molpro might launch separate tasks (rather than threads) when running in parallel, so you might need the -n option instead of -c.

alongd (Member, Author) replied:

So this should be #SBATCH -n 1?

cgrambow replied:

Yeah, I think I'd go with -n 1 (or -n 8 if you want to parallelize eight ways, for example).

#SBATCH -N 1
#SBATCH -c 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048
cgrambow:

This will depend a lot on the size of the molecule. Just curious if you're planning to make this adjustable later on.

alongd (Member, Author) replied:

I'm passing the overall memory (in MW) to the Molpro input file. If we're using just one CPU, I could/should make this parameter the same. I guess it's in MB here, right?

cgrambow replied:

Yeah, this should be the same amount in MB (maybe slightly larger to have a small safety factor) as you're allowing Molpro to use in MW.
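
For reference, Molpro's memory directive is specified in megawords (MW); with 64-bit (8-byte) words, 1 MW corresponds to 8 MB. A quick conversion sketch (the function name and the 1.1 safety factor are illustrative only):

import math

MB_PER_MW = 8  # 64-bit words: 1 megaword = 8 megabytes

def slurm_mem_per_cpu(molpro_mem_mw, safety_factor=1.1):
    """Illustrative: translate a per-process Molpro memory spec (in MW) into an
    #SBATCH --mem-per-cpu value in MB, padded by a small safety factor."""
    return int(math.ceil(molpro_mem_mw * MB_PER_MW * safety_factor))

# e.g., 'memory,250,m' in a Molpro input (250 MW per process) would suggest
# requesting roughly 2200 MB per CPU.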

# export TMPDIR=$sdir
# cd $WorkDir

molpro -d $sdir input.in
cgrambow:

Here, you're not running Molpro in parallel. If you wanted to do that, you would have to add the -n option.

alongd (Member, Author) replied:

Let's run in parallel then. So is #SBATCH -n 8 enough, or should I give a directive in this line as well?

cgrambow (Mar 7, 2019) replied:

To run Molpro in parallel you will need both #SBATCH -n 8 and molpro -n 8 -d ....
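
Putting those two points together, the RMG-server Molpro entry would then look roughly like the sketch below. Only the -n 8 pairing is new here; the other values are carried over from the template above (with the commented-out lines dropped, per the earlier suggestion), so treat this as a sketch rather than the final merged version:

        # Molpro 2015 on RMG, run in parallel on 8 CPUs (sketch)
        'molpro': """#!/bin/bash -l
#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048

export PATH=/opt/molpro/molprop_2015_1_linux_x86_64_i8/bin:$PATH

sdir=/scratch/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
mkdir -p $sdir

molpro -n 8 -d $sdir input.in

rm -rf $sdir

""",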


# Molpro 2015 on RMG
'molpro': """#!/bin/bash -l
#SBATCH -p normal
cgrambow:

Maybe consider defaulting to long?

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
cgrambow:

G16 also runs on the magnycours nodes, so this isn't needed anymore.

#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
cgrambow:

Why 6?

alongd (Member, Author) replied:

Pharos fails much more often when I request 8

cgrambow replied:

Also very interesting :p

rm -r /scratch/{un}/{name}

""",
# Gaussian03 on Pharos
cgrambow:

Same comments as above.


g16 input.gjf

rm -r /scratch/{un}/{name}

""",
'qchem': """#!/bin/bash

# QChem 4.4 on Pharos:
cgrambow:

Same comments again.

echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"
# Molpro 2012 on Pharos
cgrambow:

And here as well.

@alongd (Member, Author) commented on Mar 7, 2019

Thanks @cgrambow! I added some questions. In particular, could you help me run Molpro in parallel correctly?

Commit: Added a maximum job time argument to ARC, and passing it to the submit scripts
@alongd (Member, Author) commented on Mar 7, 2019

Thanks @cgrambow! Could you take a final look at whether Molpro on the RMG server is now parallelized correctly on 8 CPUs?

@cgrambow left a comment

I ran a quick test job on RMG to verify that it works, and everything seems to run fine with the correct number of processors.

@alongd (Member, Author) commented on Mar 7, 2019

Thanks!

@alongd merged commit d6a8e12 into master on Mar 7, 2019
@alongd deleted the submit branch on March 7, 2019, 20:42
@alongd (Member, Author) commented on Mar 8, 2019

@cgrambow, I addressed the technicalities, but not your broader question.
My current view is that all users should manually make sure that submit.py fits their needs; I think we say so in the documentation. Merging this PR makes it extremely convenient for members of our group and gives a nice example for other users. Perhaps we could think of a clever way to tailor these scripts to an arbitrary server (or at least try).
