
Feature/logging #86

Merged
merged 6 commits into from
Sep 8, 2017

Conversation

PanSzczerba
Contributor

I changed the submit templates and tested them on a system with Slurm, but unfortunately I don't have access to a cluster with Torque, so that part should only work in theory.
I haven't yet figured out how to log the start and end times of batch jobs as requested in the corresponding issue (#84), but I'll keep thinking about it.

@@ -1,2 +1,6 @@
#!/bin/bash
sbatch {options_args:s} --array=1-{jobs_no:d} --output="{log_dir:s}/output_%j_%a.log" --error="{log_dir:s}/error_%j_%a.log" {script_path:s}

LOGFILE="$(dirname $0)/submit.log"
Collaborator

Add a comment saying where LOGFILE will be located; the user may not know what "$(dirname $0)" means.
It would also be nice to print something on screen, like:
Saving logs to ....

@codecov-io

codecov-io commented Sep 2, 2017

Codecov Report

Merging #86 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #86   +/-   ##
=======================================
  Coverage   64.32%   64.32%           
=======================================
  Files          12       12           
  Lines         855      855           
=======================================
  Hits          550      550           
  Misses        305      305

Powered by Codecov. Last update 4c8bc3c...252526d.

@grzanka
Collaborator

grzanka commented Sep 2, 2017

Why does this PR have so many commits? They have already been merged correctly:
https://github.com/DataMedSci/mcpartools/pull/85/commits

Please fix all git-related issues before proceeding. I see your master branch is not up to date; see
https://github.com/DataMedSci/mcpartools/commits/master
and
https://github.com/PanSzczerba/mcpartools/commits/master

Maybe reading this one https://help.github.com/articles/syncing-a-fork/ will help.
I suggest updating your master branch, then rebasing your feature branch and pushing with --force.

@antnieszka antnieszka added this to the Next release milestone Sep 2, 2017
@PanSzczerba
Contributor Author

Ok, I see only two commits, so I guess it's fixed.

LOGFILE="$(cd $(dirname $0) && pwd)/submit.log"
printf "Job ID: " > "$LOGFILE"

sbatch {options_args:s} --array=1-{jobs_no:d} --output="{log_dir:s}/output_%j_%a.log" --error="{log_dir:s}/error_%j_%a.log" --parsable {script_path:s} | cut -d ";" -f 1 >> "$LOGFILE" && echo "Saving logs to $LOGFILE"
Collaborator

I'd suggest saving the stderr stream as well (i.e. using 2>&1, see http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html)

I do not like this approach with &&. It won't work if sbatch returns a non-zero exit code. See e.g.:

→ echo "Job id:" > 1.log
→ ls aaaa >> 1.log && echo "Saving logs"
ls: cannot access 'aaaa': No such file or directory
→ cat 1.log
Job id:

→ echo "Job id:" > 2.log
→ ls >> 2.log && echo "Saving logs"
Saving logs

→ cat 2.log
Job id:
1.csv
1.log
1.txt

I'd put this echo statement one line earlier, before printf "Job ID: " > "$LOGFILE" (as that is where the writing to the log actually happens).
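A minimal sketch of the ordering suggested above, with a placeholder echo standing in for the real sbatch call (the actual script uses "$(dirname $0)/submit.log" rather than mktemp):

```shell
#!/bin/bash
# Announce the log location unconditionally, BEFORE the command whose
# non-zero exit code could short-circuit an && chain.
LOGFILE="$(mktemp)"            # stand-in for "$(dirname $0)/submit.log"
echo "Saving logs to $LOGFILE" # printed whether or not sbatch succeeds
printf "Job ID: " > "$LOGFILE"
# placeholder for: sbatch ... --parsable {script_path:s} | cut -d ";" -f 1 >> "$LOGFILE"
echo "12345" >> "$LOGFILE"
```

This way the "Saving logs to ..." message is never suppressed by a failing submission command.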

@@ -1,2 +1,7 @@
#!/bin/bash
sbatch {options_args:s} --array=1-{jobs_no:d} --output="{log_dir:s}/output_%j_%a.log" --error="{log_dir:s}/error_%j_%a.log" {script_path:s}

# Log file submit.log will be created in the same directory submit.sh is located
Collaborator

Add a clear description of the difference between submit.log and the "{log_dir:s}/output_%j_%a.log" files. The first will contain stderr and stdout from the sbatch command (including the job id and probably the job execution status). The second contains stdout from the MC engine executable (doing our simulation).

@grzanka
Collaborator

grzanka commented Sep 2, 2017

Concerning the start and end time of a job: what about dumping such information at the beginning and end of https://github.com/DataMedSci/mcpartools/blob/master/mcpartools/scheduler/data/run_slurm.sh ?
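The suggestion above could be sketched as follows. The RUNLOG path and message wording are assumptions for illustration, not the project's actual run_slurm.sh contents:

```shell
#!/bin/bash
# Hypothetical sketch: bracket the run script body with timestamps so the
# start and end times of each batch job end up in a log file.
RUNLOG="$(mktemp)"   # stand-in for a per-job log path
echo "Job start time: $(date +'%Y-%m-%d %H:%M:%S')" >> "$RUNLOG"
# ... the actual MC engine invocation from run_slurm.sh would go here ...
echo "Job end time: $(date +'%Y-%m-%d %H:%M:%S')" >> "$RUNLOG"
```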


# Log file submit.log will be created in the same directory submit.sh is located
LOGFILE="$(cd $(dirname $0) && pwd)/submit.log"
printf "Job ID: " > "$LOGFILE"
Collaborator

Log also job submission time, at this place or after sbatch command.

echo "Saving logs to $LOGFILE"
printf "Job ID: " > "$LOGFILE"

sbatch {options_args:s} --array=1-{jobs_no:d} --output="{log_dir:s}/output_%j_%a.log" --error="{log_dir:s}/error_%j_%a.log" --parsable {script_path:s} | cut -d ";" -f 1 >> "$LOGFILE"
Collaborator

I'd suggest the following:

First, write sbatch stderr and stdout to a temporary file (see e.g. the mktemp command).
Then parse its output using cut. If a job id was found, save it to the log file; otherwise put there whatever was generated.

Move printf "Job ID" after the sbatch command. Do not forget about the stderr stream.

See e.g. the case when sbatch fails due to wrong user arguments:

# plgkongruencj at login01.pro.cyfronet.pl in /net/scratch/people/plgkongruencj/mcpartools on git:feature/logging ✖︎ [20:33:02]
→ PYTHONPATH=. python mcpartools/generatemc.py -j 10 -p 100 tests/res/shieldhit -s "[--ttime=1:90:90]" -w aa

# plgkongruencj at login01.pro.cyfronet.pl in /net/scratch/people/plgkongruencj/mcpartools on git:feature/logging ✖︎ [20:33:34]
→ cat aa/run_20170902_203334/submit.sh 
#!/bin/bash

# Log file submit.log will be created in the same directory submit.sh is located
LOGFILE="$(cd $(dirname $0) && pwd)/submit.log"
echo "Saving logs to $LOGFILE"
printf "Job ID: " > "$LOGFILE"

sbatch --ttime=1:90:90 --array=1-10 --output="/net/scratch/people/plgkongruencj/mcpartools/aa/run_20170902_203334/workspace/log/output_%j_%a.log" --error="/net/scratch/people/plgkongruencj/mcpartools/aa/run_20170902_203334/workspace/log/error_%j_%a.log" --parsable /net/scratch/people/plgkongruencj/mcpartools/aa/run_20170902_203334/workspace/main_run.sh | cut -d ";" -f 1 >> "$LOGFILE"

printf "Submission time: `date +"%Y-%m-%d %H:%M:%S"`" >> "$LOGFILE"

# plgkongruencj at login01.pro.cyfronet.pl in /net/scratch/people/plgkongruencj/mcpartools on git:feature/logging ✖︎ [20:33:42]
→ ./aa/run_20170902_203334/submit.sh 
Saving logs to /net/scratch/people/plgkongruencj/mcpartools/aa/run_20170902_203334/submit.log
sbatch: unrecognized option '--ttime=1:90:90'
Try "sbatch --help" for more information

# plgkongruencj at login01.pro.cyfronet.pl in /net/scratch/people/plgkongruencj/mcpartools on git:feature/logging ✖︎ [20:33:55]
→ echo $?
0

# plgkongruencj at login01.pro.cyfronet.pl in /net/scratch/people/plgkongruencj/mcpartools on git:feature/logging ✖︎ [20:34:09]
→ cat /net/scratch/people/plgkongruencj/mcpartools/aa/run_20170902_203334/submit.log
Job ID: Submission time: 2017-09-02 20:33:54%   
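The suggested approach could be sketched roughly as below, with a placeholder echo standing in for sbatch and hypothetical file variables (OUT, ERR):

```shell
#!/bin/bash
# Sketch: capture sbatch stdout and stderr in separate temporary files,
# then decide what goes into the log based on whether a job id was found.
LOGFILE="$(mktemp)"              # stand-in for submit.log
OUT="$(mktemp)"                  # temporary file for sbatch stdout
ERR="$(mktemp)"                  # temporary file for sbatch stderr

# placeholder for: sbatch ... --parsable {script_path:s} > "$OUT" 2> "$ERR"
echo "12345;cluster" > "$OUT"
: > "$ERR"

JOB_ID="$(cut -d ';' -f 1 "$OUT")"
if [ -n "$JOB_ID" ]; then
    printf "Job ID: %s\n" "$JOB_ID" >> "$LOGFILE"
else
    cat "$ERR" >> "$LOGFILE"     # on failure, keep sbatch's error output
fi
rm -f "$OUT" "$ERR"
```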

@PanSzczerba
Contributor Author

Instead of parsing stdout and stderr together, I used separate files for them and then log both. This way, if some error occurs but the command itself succeeds, there will be no problem parsing the Job ID.

cat $ERR >> "$LOGFILE"
fi

rm $OUT
Collaborator

What if the script is killed or sbatch crashes?
Consider adding a trap like this one: https://stackoverflow.com/questions/687014/removing-created-temp-files-in-unexpected-bash-exit
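A minimal sketch of the trap pattern from the linked Stack Overflow answer; the EXIT trap removes the temporary files even if the script is interrupted or a command in between fails:

```shell
#!/bin/bash
# Register the cleanup BEFORE using the files, so any exit path
# (normal end, error, Ctrl-C) removes them.
OUT="$(mktemp)"
ERR="$(mktemp)"
trap 'rm -f "$OUT" "$ERR"' EXIT

echo "sbatch stdout would go here" > "$OUT"
echo "sbatch stderr would go here" > "$ERR"
```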

LOGFILE="$(cd $(dirname $0) && pwd)/submit.log"
echo -n "" > "$LOGFILE"

OUT=`mktemp`
Collaborator

Add a bit of comment about what is happening here. The code does a much better job now, but is starting to be much harder to read.

Something like: "We now create two temporary files, which will store the stderr and stdout of the sbatch command. Their content will be used for further analysis, i.e. extraction of the job id. The files will be removed on script exit (see the trap command)."

@grzanka grzanka merged commit 80f976c into DataMedSci:master Sep 8, 2017
@PanSzczerba PanSzczerba deleted the feature/logging branch September 8, 2017 20:53