fix bug with file check with slurm jobs #1331

Merged
merged 6 commits into from
Jan 6, 2023

shahzebsiddiqui commented Jan 5, 2023

Thanks to @wyphan, we found an issue where Slurm jobs cause the current working directory to change, so the stage directory is not preserved. As a result, file checks are not performed correctly, because relative paths are no longer resolved against the stage directory.

An initial prototype of the fix was tested to confirm that file checks work with a Slurm job.
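
To illustrate the failure mode, here is a minimal sketch; it is not buildtest's actual implementation. The helper name `exists_check` and its `stage_dir` parameter are hypothetical. The point is only that relative paths should be joined to the stage directory rather than resolved against whatever the current working directory happens to be once the job finishes.

```python
import os
import tempfile


def exists_check(paths, stage_dir):
    """Return True if every path exists, resolving relative paths against stage_dir.

    Hypothetical helper for illustration; not part of buildtest's API.
    """
    for path in paths:
        # Absolute paths are used as-is; relative ones are anchored to the stage directory.
        resolved = path if os.path.isabs(path) else os.path.join(stage_dir, path)
        if not os.path.exists(resolved):
            return False
    return True


# Demonstration with the directory names used in this PR's buildspec.
with tempfile.TemporaryDirectory() as stage, tempfile.TemporaryDirectory() as elsewhere:
    os.makedirs(os.path.join(stage, "cuda_vecadd", "test"))
    original_cwd = os.getcwd()
    os.chdir(elsewhere)  # simulate the working directory changing after the job
    try:
        # Naive check: relative paths resolve against the (wrong) current directory.
        naive = all(os.path.exists(p) for p in ["cuda_vecadd", "cuda_vecadd/test"])
        # Anchored check: relative paths resolve against the stage directory.
        anchored = exists_check(["cuda_vecadd", "cuda_vecadd/test"], stage)
    finally:
        os.chdir(original_cwd)
```

In this sketch the naive check misses the directories because it looks in the wrong place, while the anchored check finds them regardless of the process's working directory.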

Here is an example buildspec:

(buildtest)  ~/gitrepos/buildtest/ [debug_file_checks] cat examples/pm.yml
buildspecs:
  status_exists_pm_slurm:
   type: script
   executor: perlmutter.slurm.debug
   description: status check based for file and directory
   sbatch: ['-n 1', '-t 5', '-C cpu']
   run: |
     mkdir -p cuda_vecadd/test
     env | grep SLURM_*
   status:
     exists:
       - cuda_vecadd
       - cuda_vecadd/test

I was able to build and run this successfully:

(buildtest)  ~/gitrepos/buildtest/examples/ [debug_file_checks*] buildtest bd -b pm.yml
╭──────────────────────────────────────── buildtest summary ────────────────────────────────────────╮
│                                                                                                   │
│ User:               siddiq90                                                                      │
│ Hostname:           login20                                                                       │
│ Platform:           Linux                                                                         │
│ Current Time:       2023/01/05 13:44:33                                                           │
│ buildtest path:     /global/homes/s/siddiq90/gitrepos/buildtest/bin/buildtest                     │
│ buildtest version:  1.0                                                                           │
│ python path:        /global/u1/s/siddiq90/.local/share/virtualenvs/buildtest-WqshQcL1/bin/python3 │
│ python version:     3.9.7                                                                         │
│ Configuration File: /global/u1/s/siddiq90/gitrepos/buildtest-nersc/config.yml                     │
│ Test Directory:     /global/u1/s/siddiq90/gitrepos/buildtest/var/tests                            │
│ Report File:        /global/u1/s/siddiq90/gitrepos/buildtest/var/report.json                      │
│ Command:            /global/homes/s/siddiq90/gitrepos/buildtest/bin/buildtest bd -b pm.yml        │
│                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
───────────────────────────────────────────────────────────────────────────────────────────────  Discovering Buildspecs ────────────────────────────────────────────────────────────────────────────────────────────────
                   Discovered buildspecs
╔══════════════════════════════════════════════════════════╗
║ buildspec                                                ║
╟──────────────────────────────────────────────────────────╢
║ /global/u1/s/siddiq90/gitrepos/buildtest/examples/pm.yml ║
╚══════════════════════════════════════════════════════════╝


Total Discovered Buildspecs:  1
Total Excluded Buildspecs:  0
Detected Buildspecs after exclusion:  1
────────────────────────────────────────────────────────────────────────────────────────────────── Parsing Buildspecs ──────────────────────────────────────────────────────────────────────────────────────────────────
Valid Buildspecs: 1
Invalid Buildspecs: 0
/global/u1/s/siddiq90/gitrepos/buildtest/examples/pm.yml: VALID
Total builder objects created: 1
                                                                                        Builders by type=script
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ builder                         ┃ type   ┃ executor               ┃ compiler ┃ nodes ┃ procs ┃ description                               ┃ buildspecs                                               ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ status_exists_pm_slurm/db3e1760 │ script │ perlmutter.slurm.debug │ None     │ None  │ None  │ status check based for file and directory │ /global/u1/s/siddiq90/gitrepos/buildtest/examples/pm.yml │
└─────────────────────────────────┴────────┴────────────────────────┴──────────┴───────┴───────┴───────────────────────────────────────────┴──────────────────────────────────────────────────────────┘
                                                  Batch Job Builders
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ builder                         ┃ executor               ┃ buildspecs                                               ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ status_exists_pm_slurm/db3e1760 │ perlmutter.slurm.debug │ /global/u1/s/siddiq90/gitrepos/buildtest/examples/pm.yml │
└─────────────────────────────────┴────────────────────────┴──────────────────────────────────────────────────────────┘
──────────────────────────────────────────────────────────────────────────────────────────────────── Building Test ─────────────────────────────────────────────────────────────────────────────────────────────────────
status_exists_pm_slurm/db3e1760: Creating test directory: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760
status_exists_pm_slurm/db3e1760: Creating the stage directory: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage
status_exists_pm_slurm/db3e1760: Writing build script: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/status_exists_pm_slurm_build.sh
──────────────────────────────────────────────────────────────────────────────────────────────────── Running Tests ─────────────────────────────────────────────────────────────────────────────────────────────────────
Spawning 256 processes for processing builders
───────────────────────────────────────────────────────────────────────────────────────────────────── Iteration 1 ──────────────────────────────────────────────────────────────────────────────────────────────────────
status_exists_pm_slurm/db3e1760 does not have any dependencies adding test to queue
In this iteration we are going to run the following tests: [status_exists_pm_slurm/db3e1760]
status_exists_pm_slurm/db3e1760: Current Working Directory : /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage
status_exists_pm_slurm/db3e1760: Running Test via command: bash --norc --noprofile -eo pipefail status_exists_pm_slurm_build.sh
status_exists_pm_slurm/db3e1760: JobID 4344691 dispatched to scheduler
Polling Jobs in 30 seconds
status_exists_pm_slurm/db3e1760: Job 4344691 is complete!
status_exists_pm_slurm/db3e1760 workdir:  /global/u1/s/siddiq90/gitrepos/buildtest/examples
status_exists_pm_slurm/db3e1760 Changing directory to  /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage
status_exists_pm_slurm/db3e1760: Test completed in 30.175633 seconds
status_exists_pm_slurm/db3e1760: Test completed with returncode: 0
status_exists_pm_slurm/db3e1760: Writing output file -  /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/status_exists_pm_slurm.out
status_exists_pm_slurm/db3e1760: Writing error file - /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/status_exists_pm_slurm.err
status_exists_pm_slurm/db3e1760 working directory is /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage
status_exists_pm_slurm/db3e1760: Test all files:  ['cuda_vecadd', 'cuda_vecadd/test']  existences
status_exists_pm_slurm/db3e1760: file: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage/cuda_vecadd exists
status_exists_pm_slurm/db3e1760: file: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage/cuda_vecadd/test exists
status_exists_pm_slurm/db3e1760: Exist Check: True
                                        Completed Jobs
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ builder                         ┃ executor               ┃ jobid   ┃ jobstate  ┃ runtime   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ status_exists_pm_slurm/db3e1760 │ perlmutter.slurm.debug │ 4344691 │ COMPLETED │ 30.175633 │
└─────────────────────────────────┴────────────────────────┴─────────┴───────────┴───────────┘
                                                            Test Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ builder                         ┃ executor               ┃ status ┃ checks (ReturnCode, Regex, Runtime) ┃ returncode ┃ runtime   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ status_exists_pm_slurm/db3e1760 │ perlmutter.slurm.debug │ PASS   │ False False False                   │ 0          │ 30.175633 │
└─────────────────────────────────┴────────────────────────┴────────┴─────────────────────────────────────┴────────────┴───────────┘



Passed Tests: 1/1 Percentage: 100.000%
Failed Tests: 0/1 Percentage: 0.000%


Adding 1 test results to /global/u1/s/siddiq90/gitrepos/buildtest/var/report.json
Writing Logfile to: /global/u1/s/siddiq90/gitrepos/buildtest/var/logs/buildtest_3qswtgx5.log
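
The log above shows the working directory after job completion (`workdir`) differing from the stage directory, followed by an explicit change back to the stage directory before the file checks run. One way to do that safely is a small context manager that restores the previous directory afterwards. This is a sketch under that assumption; `working_directory` is a hypothetical name, not a buildtest function.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to the previous directory, even if the body raises.
        os.chdir(previous)


with tempfile.TemporaryDirectory() as stage:
    os.makedirs(os.path.join(stage, "cuda_vecadd", "test"))
    before = os.getcwd()
    with working_directory(stage):
        # Relative paths now resolve against the stage directory.
        found = os.path.isdir("cuda_vecadd/test")
    restored = os.getcwd() == before
```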

I can confirm the directory is created in the stage directory:

(buildtest)  ~/gitrepos/buildtest/ [debug_file_checks] ls -ld $(buildtest path status_exists_pm_slurm)/stage/cuda_vecadd/test
drwxrwx--- 2 siddiq90 siddiq90 512 Jan  5 13:44 /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/pm/status_exists_pm_slurm/db3e1760/stage/cuda_vecadd/test

wyphan commented Jan 5, 2023

Confirmed working with hpctoolkit_cuda test on perlmutter.slurm.debug executor. I think the PR just needs to be blackened.

codecov bot commented Jan 6, 2023

Codecov Report

Base: 80.21% // Head: 71.21% // Decreases project coverage by -9.01% ⚠️

Coverage data is based on head (f0386e9) compared to base (8ea1a4e).
Patch coverage: 25.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #1331      +/-   ##
==========================================
- Coverage   80.21%   71.21%   -9.01%     
==========================================
  Files          56       57       +1     
  Lines        6060     6106      +46     
  Branches     1122     1089      -33     
==========================================
- Hits         4861     4348     -513     
- Misses       1198     1756     +558     
- Partials        1        2       +1     
Impacted Files                     Coverage Δ
buildtest/system.py                54.50% <0.00%> (-24.29%) ⬇️
buildtest/buildsystem/checks.py    15.62% <15.62%> (ø)
buildtest/builders/base.py         44.13% <23.53%> (-12.84%) ⬇️
buildtest/buildsystem/parser.py    100.00% <100.00%> (ø)
buildtest/cli/config.py            86.76% <100.00%> (ø)
buildtest/scheduler/lsf.py         24.32% <0.00%> (-72.97%) ⬇️
buildtest/scheduler/slurm.py       28.95% <0.00%> (-67.11%) ⬇️
buildtest/cli/compilers.py         27.59% <0.00%> (-55.17%) ⬇️
buildtest/executors/slurm.py       21.25% <0.00%> (-51.25%) ⬇️
buildtest/executors/lsf.py         20.00% <0.00%> (-40.00%) ⬇️
... and 11 more
