Enhancement to scheduler logic and code refactoring. #1739

Merged
merged 10 commits into from
Apr 1, 2024

@shahzebsiddiqui shahzebsiddiqui commented Mar 28, 2024

This PR makes several enhancements and code refactorings to the scheduler classes.

Slurm job using scontrol show job

This PR uses scontrol show job instead of sacct to get the output and error files, since this information is available at job submission time. This addresses an issue where we can't rely on deriving the output paths from the working directory when a user specifies alternative paths, such as sbatch -o /tmp/job.out -e /tmp/job.err test.sh

Shown below is a sample build run with logging enabled, which shows that scontrol show job is run:

slurm_metadata/5ee906a4 does not have any dependencies adding test to queue
 Builders Eligible to Run
┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Builder                 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ slurm_metadata/5ee906a4 │
└─────────────────────────┘
                    DEBUG    Changing to directory /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage                                                                                                slurm.py:82
slurm_metadata/5ee906a4: Current Working Directory : /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage
slurm_metadata/5ee906a4: Running Test via command: bash slurm_metadata_build.sh
[03/28/24 13:30:48] DEBUG    Running Test via command: bash slurm_metadata_build.sh                                                                                                                                                                                base.py:378
slurm_metadata/5ee906a4: JobID 23611383 dispatched to scheduler
[03/28/24 13:30:49] DEBUG    Querying JobID: '23611383' by running: 'scontrol show job 23611383'                                                                                                                                                                  slurm.py:196
                    DEBUG    Output of scontrol show job 23611383:                                                                                                                                                                                                slurm.py:199
                             JobId=23611383 JobName=slurm_metadata
                                 UserId=siddiq90(92503) GroupId=siddiq90(92503) MCS_label=N/A
                                 Priority=69119 Nice=0 Account=nstaff QOS=debug
                                 JobState=PENDING Reason=Resources Dependency=(null)
                                 Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
                                 RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
                                 SubmitTime=2024-03-28T13:30:48 EligibleTime=2024-03-28T13:30:48
                                 AccrueTime=2024-03-28T13:30:48
                                 StartTime=Unknown EndTime=Unknown Deadline=N/A
                                 SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-03-28T13:30:48 Scheduler=Main
                                 Partition=regular_milan_ss11 AllocNode:Sid=login38:1876581
                                 ReqNodeList=(null) ExcNodeList=(null)
                                 NodeList=
                                 NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
                                 ReqTRES=cpu=1,mem=488002M,node=1,billing=1
                                 AllocTRES=(null)
                                 Socks/Node=* NtasksPerN🅱S:C=0:0:*:* CoreSpec=*
                                 MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
                                 Features=cpu DelayBoot=00:00:00
                                 OverSubscribe=NO Contiguous=0 Licenses=u1:1 Network=(null)
                                 Command=/global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage/slurm_metadata.sh
                                 WorkDir=/global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage
                                 StdErr=/global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage/slurm_metadata.err
                                 StdIn=/dev/null
                                 StdOut=/global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage/slurm_metadata.out
                                 Power=


                    DEBUG    Extracting StdOut file by applying regular expression: StdOut=(?P<stdout>.+)                                                                                                                                                         slurm.py:203
                    DEBUG    Extracting StdOut file by applying regular expression: StdErr=(?P<stderr>.+)                                                                                                                                                         slurm.py:213
                    DEBUG    Output File: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage/slurm_metadata.out                                                                                     slurm.py:221
                    DEBUG    Error File: /global/u1/s/siddiq90/gitrepos/buildtest/var/tests/perlmutter.slurm.debug/metadata/slurm_metadata/5ee906a4/stage/slurm_metadata.err                                                                                      slurm.py:222
                    DEBUG    slurm_metadata/5ee906a4: JobID 23611383 dispatched to scheduler                                                                                                                                                                      slurm.py:112
Polling Jobs in 5 seconds
[03/28/24 13:30:54] DEBUG    Querying JobID: '23611383' by running: 'sacct -j 23611383 -o State -n -X -P'                                                                                                                                                         slurm.py:137
                    DEBUG    JobID: '23611383' Job State: PENDING                                                                                                                                                                                                 slurm.py:142
                                       Pending and Suspended Jobs (1)
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                 ┃ executor               ┃ jobid    ┃ jobstate ┃ runtime ┃ elapsedtime ┃ pendtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ slurm_metadata/5ee906a4 │ perlmutter.slurm.debug │ 23611383 │ PENDING  │ 12.029  │ 0           │ 5.84     │
└─────────────────────────┴────────────────────────┴──────────┴──────────┴─────────┴─────────────┴──────────┘
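
The extraction step in the log above can be sketched roughly as follows. This is a minimal illustration, not the code in slurm.py: the two regular expressions are copied from the debug messages, while the function names `parse_output_files` and `query_job` are hypothetical and exist only for this example.

```python
import re
import subprocess

# Regular expressions copied from the debug log above.
STDOUT_RE = re.compile(r"StdOut=(?P<stdout>.+)")
STDERR_RE = re.compile(r"StdErr=(?P<stderr>.+)")


def parse_output_files(scontrol_output):
    """Return (stdout_path, stderr_path) found in `scontrol show job` text.

    Either value is None when the corresponding field is missing.
    """
    out_match = STDOUT_RE.search(scontrol_output)
    err_match = STDERR_RE.search(scontrol_output)
    stdout = out_match.group("stdout").strip() if out_match else None
    stderr = err_match.group("stderr").strip() if err_match else None
    return stdout, stderr


def query_job(jobid):
    """Hypothetical helper: run `scontrol show job <jobid>` and return its text."""
    result = subprocess.run(
        ["scontrol", "show", "job", str(jobid)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A shortened, made-up sample in the same KEY=VALUE layout as the log.
SAMPLE = """\
   WorkDir=/tmp/stage
   StdErr=/tmp/job.err
   StdIn=/dev/null
   StdOut=/tmp/job.out
"""

if __name__ == "__main__":
    print(parse_output_files(SAMPLE))
```

Because `.` does not match a newline, each pattern captures the rest of its own line only, so paths set with -o and -e are picked up regardless of the working directory.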

Fix bugs with Cobalt Scheduler

This PR fixes job submission issues with the Cobalt scheduler. After the code refactoring, we were able to submit a job:

(buildtest) ac.ssiddiqui@jlsebatch1:~/github/buildtest/tests/examples/jlse> buildtest -d build -b hostname.yml --pollinterval=5
[03/29/24 14:26:40] DEBUG    Starting System Compatibility Check                                                                                                                                system.py:44
                    INFO     Machine: x86_64                                                                                                                                                    system.py:61
                    INFO     Host: jlsebatch1                                                                                                                                                   system.py:62
                    INFO     User: ac.ssiddiqui                                                                                                                                                 system.py:63
                    INFO     Operating System: opensuse                                                                                                                                         system.py:64
                    INFO     System Kernel: Linux and Kernel Release: 5.14.21-150400.24.88-default                                                                                              system.py:65
                    INFO     Python Path: /home/ac.ssiddiqui/.pyenv/buildtest/bin/python3                                                                                                       system.py:68
                    INFO     Python Version: 3.8.12                                                                                                                                             system.py:69
                    INFO     BUILDTEST_ROOT: /home/ac.ssiddiqui/github/buildtest                                                                                                                system.py:70
                    INFO     Path to Buildtest: /home/ac.ssiddiqui/github/buildtest/bin/buildtest                                                                                               system.py:71
                    INFO     Detected module system: environment-modules                                                                                                                       system.py:111
                    INFO     Detected environment-modules with version: /usr/lib64/Modules/modulecmd.tcl                                                                                       system.py:112
                    DEBUG    We will check the following binaries ['sbatch', 'sacct', 'sacctmgr', 'sinfo', 'scancel'] for existence.                                                         detection.py:22
                    DEBUG    Cannot find sbatch command in $PATH                                                                                                                             detection.py:27
                    DEBUG    We will check the following binaries ['bsub', 'bqueues', 'bkill', 'bjobs'] for existence.                                                                       detection.py:22
                    DEBUG    Cannot find bsub command in $PATH                                                                                                                               detection.py:27
                    DEBUG    We will check the following binaries ['qsub', 'qstat', 'qdel', 'nodelist', 'showres', 'partlist'] for existence.                                                detection.py:22
                    DEBUG    qsub: /usr/bin/qsub                                                                                                                                             detection.py:30
                    DEBUG    qstat: /usr/bin/qstat                                                                                                                                           detection.py:30
                    DEBUG    qdel: /usr/bin/qdel                                                                                                                                             detection.py:30
                    DEBUG    nodelist: /usr/bin/nodelist                                                                                                                                     detection.py:30
                    DEBUG    showres: /usr/bin/showres                                                                                                                                       detection.py:30
                    DEBUG    partlist: /usr/bin/partlist                                                                                                                                     detection.py:30
[03/29/24 14:26:41] DEBUG    Get all Cobalt Queues by running qstat -Ql                                                                                                                     detection.py:210
                    DEBUG    Detected Cobalt Scheduler                                                                                                                                          system.py:88
                    DEBUG    We will check the following binaries ['qsub', 'qstat', 'qdel', 'qstart', 'qhold', 'qmgr'] for existence.                                                        detection.py:22
                    DEBUG    qsub: /usr/bin/qsub                                                                                                                                             detection.py:30
                    DEBUG    qstat: /usr/bin/qstat                                                                                                                                           detection.py:30
                    DEBUG    qdel: /usr/bin/qdel                                                                                                                                             detection.py:30
                    DEBUG    Cannot find qstart command in $PATH                                                                                                                             detection.py:27
                    DEBUG    We will check the following binaries ['qsub', 'qstat', 'qdel', 'qstart', 'qhold', 'qmgr'] for existence.                                                        detection.py:22
                    DEBUG    qsub: /usr/bin/qsub                                                                                                                                             detection.py:30
                    DEBUG    qstat: /usr/bin/qstat                                                                                                                                           detection.py:30
                    DEBUG    qdel: /usr/bin/qdel                                                                                                                                             detection.py:30
                    DEBUG    Cannot find qstart command in $PATH                                                                                                                             detection.py:27
                    INFO     Finished System Compatibility Check                                                                                                                                system.py:76
                    DEBUG    List of available systems: ['jlse'] found in configuration file                                                                                                   config.py:102
                    DEBUG    Checking hostname: jlsebatch1 in system: 'jlse' with hostnames: ['^jlsebatch\\d{1}$']                                                                             config.py:117
                    INFO     Found matching system: jlse based on hostname: jlsebatch1                                                                                                         config.py:124
                    DEBUG    Loading default settings schema: /home/ac.ssiddiqui/github/buildtest/buildtest/schemas/settings.schema.json                                                       config.py:143
                    DEBUG    Successfully loaded schema file: /home/ac.ssiddiqui/github/buildtest/buildtest/schemas/settings.schema.json                                                         utils.py:41
                    DEBUG    Validating configuration file with schema: /home/ac.ssiddiqui/github/buildtest/buildtest/schemas/settings.schema.json                                             config.py:146
                    DEBUG    Validation was successful                                                                                                                                         config.py:154
                    DEBUG    We will check the following binaries ['qsub', 'qstat', 'qdel', 'nodelist', 'showres', 'partlist'] for existence.                                                detection.py:22
                    DEBUG    qsub: /usr/bin/qsub                                                                                                                                             detection.py:30
                    DEBUG    qstat: /usr/bin/qstat                                                                                                                                           detection.py:30
                    DEBUG    qdel: /usr/bin/qdel                                                                                                                                             detection.py:30
                    DEBUG    nodelist: /usr/bin/nodelist                                                                                                                                     detection.py:30
                    DEBUG    showres: /usr/bin/showres                                                                                                                                       detection.py:30
                    DEBUG    partlist: /usr/bin/partlist                                                                                                                                     detection.py:30
                    DEBUG    Get all Cobalt Queues by running qstat -Ql                                                                                                                     detection.py:210
                    INFO     Processing buildtest configuration file: /home/ac.ssiddiqui/github/buildtest/tests/settings/jlse.yml                                                                main.py:149
                    DEBUG    Tests will be written in /home/ac.ssiddiqui/github/buildtest/var/tests                                                                                             build.py:792
                    DEBUG    Getting Executors from buildtest settings                                                                                                                           setup.py:89
╭─────────────────────────────────────────────── buildtest summary ───────────────────────────────────────────────╮
│                                                                                                                 │
│ User:               ac.ssiddiqui                                                                                │
│ Hostname:           jlsebatch1                                                                                  │
│ Platform:           Linux                                                                                       │
│ Current Time:       2024/03/29 14:26:41                                                                         │
│ buildtest path:     /home/ac.ssiddiqui/github/buildtest/bin/buildtest                                           │
│ buildtest version:  1.8                                                                                         │
│ python path:        /home/ac.ssiddiqui/.pyenv/buildtest/bin/python3                                             │
│ python version:     3.8.12                                                                                      │
│ Configuration File: /home/ac.ssiddiqui/github/buildtest/tests/settings/jlse.yml                                 │
│ Test Directory:     /home/ac.ssiddiqui/github/buildtest/var/tests                                               │
│ Report File:        /home/ac.ssiddiqui/github/buildtest/var/report.json                                         │
│ Command:            /home/ac.ssiddiqui/github/buildtest/bin/buildtest -d build -b hostname.yml --pollinterval=5 │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                    DEBUG    Discovering buildspecs based on tags=None, executor=None, buildspec=['hostname.yml'], excluded buildspec=None                                                      build.py:148
                    DEBUG    Buildspec: hostname.yml is a file                                                                                                                                  build.py:559
                    INFO     Based on input argument we discovered the following buildspecs: ['/home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml']                           build.py:571
                    DEBUG    buildtest discovered the following Buildspecs: ['/home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml']                                            build.py:227
─────────────────────────────────────────────────────────────────────────────────────────  Discovering Buildspecs ──────────────────────────────────────────────────────────────────────────────────────────
                         Discovered buildspecs
╔══════════════════════════════════════════════════════════════════════╗
║ buildspec                                                            ║
╟──────────────────────────────────────────────────────────────────────╢
║ /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml ║
╟──────────────────────────────────────────────────────────────────────╢
║ Total: 1                                                             ║
╚══════════════════════════════════════════════════════════════════════╝


Total Discovered Buildspecs:  1
Total Excluded Buildspecs:  0
Detected Buildspecs after exclusion:  1
──────────────────────────────────────────────────────────────────────────────────────────── Parsing Buildspecs ────────────────────────────────────────────────────────────────────────────────────────────
                    INFO     Validating /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml with schema:                                                                      parser.py:164
                             /home/ac.ssiddiqui/github/buildtest/buildtest/schemas/global.schema.json
                    INFO     Validating test - 'hostname_test' in recipe: /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml                                                 parser.py:176
                    INFO     Test: 'hostname_test' is using schema type: 'script'                                                                                                              parser.py:118
                    INFO     Validating /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml with schema:                                                                      parser.py:193
                             /home/ac.ssiddiqui/github/buildtest/buildtest/schemas/script.schema.json
                    DEBUG    Searching for builders for test: hostname_test by applying regular expression with available builders: ['jlse.local.bash', 'jlse.local.sh', 'jlse.local.csh',   builders.py:272
                             'jlse.local.python', 'jlse.cobalt.iris']
                    DEBUG    Found a match in buildspec with available executors via re.fullmatch(jlse.cobalt.iris,jlse.cobalt.iris)                                                         builders.py:280
                    DEBUG    Processing Buildspec File: /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml                                                                     base.py:144
                    DEBUG    Processing Test: hostname_test                                                                                                                                      base.py:145
                    DEBUG    Using shell bash                                                                                                                                                    base.py:181
                    DEBUG    Shebang used for test: #!/usr/bin/bash                                                                                                                              base.py:182
Valid Buildspecs: 1
Invalid Buildspecs: 0
/home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml: VALID
Total builder objects created: 1
                                                                              Builders by type=script
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ builder                ┃ type   ┃ executor         ┃ compiler ┃ nodes ┃ procs ┃ description               ┃ buildspecs                                                           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ hostname_test/b740b9de │ script │ jlse.cobalt.iris │ None     │ None  │ None  │ Run hostname as batch job │ /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml │
└────────────────────────┴────────┴──────────────────┴──────────┴───────┴───────┴───────────────────────────┴──────────────────────────────────────────────────────────────────────┘
                                                 Batch Job Builders
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ builder                ┃ executor         ┃ buildspecs                                                           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ /home/ac.ssiddiqui/github/buildtest/tests/examples/jlse/hostname.yml │
└────────────────────────┴──────────────────┴──────────────────────────────────────────────────────────────────────┘
────────────────────────────────────────────────────────────────────────────────────────────── Building Test ───────────────────────────────────────────────────────────────────────────────────────────────
                    DEBUG    Creating test directory: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de                                             base.py:527
                    DEBUG    Creating the stage directory: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage                                  base.py:536
hostname_test/b740b9de: Creating Test Directory: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de
                    INFO     Opening Test File for Writing: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/hostname_test.sh                base.py:658
                    DEBUG    Changing permission to 755 for script: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/hostname_test.sh        base.py:856
                    DEBUG    Writing build script: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/hostname_test_build.sh                   base.py:631
                    DEBUG    Changing permission to 755 for script: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/hostname_test_build.sh  base.py:856
                    DEBUG    Copying build script to: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/hostname_test_build.sh                      base.py:637
────────────────────────────────────────────────────────────────────────────────────────────── Running Tests ───────────────────────────────────────────────────────────────────────────────────────────────
Spawning 8 processes for processing builders
─────────────────────────────────────────────────────────────────────────────────────────────── Iteration 1 ────────────────────────────────────────────────────────────────────────────────────────────────
hostname_test/b740b9de does not have any dependencies adding test to queue
 Builders Eligible to Run
┏━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Builder                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━┩
│ hostname_test/b740b9de │
└────────────────────────┘
hostname_test/b740b9de: Current Working Directory : /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage
hostname_test/b740b9de: Running Test via command: bash hostname_test_build.sh
                    DEBUG    Running Test via command: bash hostname_test_build.sh                                                                                                               base.py:378
hostname_test/b740b9de: JobID: 763827 dispatched to scheduler
                    DEBUG    hostname_test/b740b9de: JobID: 763827 dispatched to scheduler                                                                                                      cobalt.py:93
                    DEBUG    Output file will be written to: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/763827.output                cobalt.py:106
                    DEBUG    Error file will be written to: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/763827.error                  cobalt.py:107
                    DEBUG    Executing command: qstat -lf 763827                                                                                                                               cobalt.py:101
[03/29/24 14:26:42] DEBUG    {                                                                                                                                                                 cobalt.py:112
                               "JobID": "763827",
                               "JobName": "hostname_test",
                               "User": "ac.ssiddiqui",
                               "WallTime": "00:10:00",
                               "QueuedTime": "00:00:00",
                               "RunTime": "N/A",
                               "TimeRemaining": "N/A",
                               "Nodes": "1",
                               "State": "queued",
                               "Location": "None",
                               "Mode": "script",
                               "Procs": "1",
                               "Preemptable": "False",
                               "User_Hold": "False",
                               "Admin_Hold": "False",
                               "Queue": "iris",
                               "StartTime": "N/A",
                               "Index": "None",
                               "SubmitTime": "Fri Mar 29 14:26:41 2024 +0000 (UTC)",
                               "Path":
                             "/home/ac.ssiddiqui/github/buildtest/bin:/home/ac.ssiddiqui/.pyenv/buildtest/bin:/home/ac.ssiddiqui/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/lpp/mmfs/bin:/ho
                             me/ac.ssiddiqui/.local/bin:/home/ac.ssiddiqui/bin",
                               "OutputDir": "/home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage",
                               "ErrorPath": "None",
                               "OutputPath": "None",
                               "Envs": "",
                               "Command": "/home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/hostname_test.sh",
                               "Args": "",
                               "Kernel": "default",
                               "KernelOptions": "None",
                               "ION_Kernel": "default",
                               "ION_KernelOptions": "None",
                               "Project": "None",
                               "Dependencies": "",
                               "S": "Q",
                               "Notify": "None",
                               "Score": "0.1",
                               "Maxtasktime": "None",
                               "attrs": "{}",
                               "dep_frac": "None",
                               "user_list": "ac.ssiddiqui",
                               "Geometry": "Any"
                             }
Polling Jobs in 5 seconds
[03/29/24 14:26:47] DEBUG    Getting Job State for '763827' by running: 'qstat -l --header State 763827'                                                                                        cobalt.py:63
                    DEBUG    Job ID: '763827' Job State: starting                                                                                                                               cobalt.py:76
                                   Pending and Suspended Jobs (1)
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                ┃ executor         ┃ jobid  ┃ jobstate ┃ runtime ┃ elapsedtime ┃ pendtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ 763827 │ starting │ 5.672   │ 0           │ 5.36     │
└────────────────────────┴──────────────────┴────────┴──────────┴─────────┴─────────────┴──────────┘
Polling Jobs in 5 seconds
[03/29/24 14:26:52] DEBUG    Getting Job State for '763827' by running: 'qstat -l --header State 763827'                                                                                        cobalt.py:63
                    DEBUG    Job ID: '763827' Job State: starting                                                                                                                               cobalt.py:76
                                   Pending and Suspended Jobs (1)
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                ┃ executor         ┃ jobid  ┃ jobstate ┃ runtime ┃ elapsedtime ┃ pendtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ 763827 │ starting │ 10.851  │ 0           │ 10.53    │
└────────────────────────┴──────────────────┴────────┴──────────┴─────────┴─────────────┴──────────┘
Polling Jobs in 5 seconds
[03/29/24 14:26:57] DEBUG    Getting Job State for '763827' by running: 'qstat -l --header State 763827'                                                                                        cobalt.py:63
                    DEBUG    Job ID: '763827' Job State: running                                                                                                                                cobalt.py:76
                                          Running Jobs (1)
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                ┃ executor         ┃ jobid  ┃ jobstate ┃ runtime ┃ elapsedtime ┃ pendtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ 763827 │ running  │ 16.03   │ 0.0         │ 10.53    │
└────────────────────────┴──────────────────┴────────┴──────────┴─────────┴─────────────┴──────────┘
Polling Jobs in 5 seconds
[03/29/24 14:27:02] DEBUG    Getting Job State for '763827' by running: 'qstat -l --header State 763827'                                                                                        cobalt.py:63
                    DEBUG    Job ID: '763827' Job State: running                                                                                                                                cobalt.py:76
                                          Running Jobs (1)
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                ┃ executor         ┃ jobid  ┃ jobstate ┃ runtime ┃ elapsedtime ┃ pendtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ 763827 │ running  │ 21.212  │ 5.18        │ 10.53    │
└────────────────────────┴──────────────────┴────────┴──────────┴─────────┴─────────────┴──────────┘
Polling Jobs in 5 seconds
[03/29/24 14:27:07] DEBUG    Getting Job State for '763827' by running: 'qstat -l --header State 763827'                                                                                        cobalt.py:63
                    DEBUG    Job ID: '763827' Job State: exiting                                                                                                                                cobalt.py:76
                    DEBUG    Sleeping 5 seconds and waiting for Cobalt Scheduler to write output and error file                                                                                cobalt.py:166
[03/29/24 14:27:12] DEBUG    Sleeping 5 seconds and waiting for Cobalt Scheduler to write output and error file                                                                                cobalt.py:166
[03/29/24 14:27:17] DEBUG    Cobalt Log File written to /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/763827.cobaltlog                  cobalt.py:176
                    DEBUG    Test: hostname_test got returncode: 0 from JobID: 763827                                                                                                          cobalt.py:186
                    DEBUG    Copying cobalt log file: /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/stage/763827.cobaltlog to                 cobalt.py:197
                             /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/763827.cobaltlog
hostname_test/b740b9de: Job 763827 is complete!
hostname_test/b740b9de: Test completed in 5.18 seconds with returncode: 0
hostname_test/b740b9de: Writing output file -  /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/763827.output
hostname_test/b740b9de: Writing error file - /home/ac.ssiddiqui/github/buildtest/var/tests/jlse.cobalt.iris/hostname/hostname_test/b740b9de/763827.error
                                         Completed Jobs (1)
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                ┃ executor         ┃ jobid  ┃ jobstate ┃ runtime ┃ elapsedtime ┃ pendtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ 763827 │ exiting  │ 5.18    │ 5.18        │ 10.53    │
└────────────────────────┴──────────────────┴────────┴──────────┴─────────┴─────────────┴──────────┘
                                Test Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ builder                ┃ executor         ┃ status ┃ returncode ┃ runtime ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ hostname_test/b740b9de │ jlse.cobalt.iris │ PASS   │ 0          │ 5.180   │
└────────────────────────┴──────────────────┴────────┴────────────┴─────────┘



Passed Tests: 1/1 Percentage: 100.000%
Failed Tests: 0/1 Percentage: 0.000%


                    DEBUG    Updating report file: /home/ac.ssiddiqui/github/buildtest/var/report.json                                                                                         build.py:1721
Adding 1 test results to report file: /home/ac.ssiddiqui/github/buildtest/var/report.json
Writing Logfile to /home/ac.ssiddiqui/github/buildtest/var/logs/buildtest_99oyef_a.log

…output/error file and exitcode that

can be used by subclasses
Add logic to extract the output and error files in Slurm using 'scontrol show job' at job submission time, which
makes the output and error file paths available as soon as the job is submitted.
Add method get_output_and_error_files to the base class Job; each subclass implements it to define how its scheduler
extracts the output and error files.
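
The commit message above can be illustrated with a minimal sketch. Only the method name get_output_and_error_files, the base class name Job, and the use of 'scontrol show job' come from this PR. The SlurmJob constructor, the helper parse_scontrol_output, and the attribute names are assumptions made for illustration, and the scheduler output is passed in as text instead of being captured from a real command. The sketch also assumes the paths contain no whitespace.

```python
import re
from abc import ABC, abstractmethod


def parse_scontrol_output(text):
    """Return (stdout_path, stderr_path) from 'scontrol show job' text.

    Hypothetical helper: scontrol prints KEY=VALUE pairs, including
    StdOut= and StdErr=. Either value is None if the key is absent.
    """
    out = re.search(r"^\s*StdOut=(\S+)", text, re.MULTILINE)
    err = re.search(r"^\s*StdErr=(\S+)", text, re.MULTILINE)
    return (out.group(1) if out else None, err.group(1) if err else None)


class Job(ABC):
    """Illustrative base class; not buildtest's actual implementation."""

    def __init__(self, jobID):
        self.jobid = jobID
        self._outfile = None
        self._errfile = None

    @abstractmethod
    def get_output_and_error_files(self):
        """Each scheduler subclass decides how to locate its files."""


class SlurmJob(Job):
    def __init__(self, jobID, scontrol_text):
        super().__init__(jobID)
        # Assumption: the caller already ran 'scontrol show job <jobid>'
        # and hands over the captured text.
        self._scontrol_text = scontrol_text

    def get_output_and_error_files(self):
        self._outfile, self._errfile = parse_scontrol_output(self._scontrol_text)
        return self._outfile, self._errfile
```

Because the paths are read from the scheduler's own job record, a job submitted with custom -o and -e locations resolves correctly without guessing from the working directory.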

codecov bot commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 15.09434%, with 45 lines in your changes missing coverage. Please review.

Project coverage is 33.88%. Comparing base (9c9b820) to head (decaced).
Report is 25 commits behind head on devel.

Files                          Patch %   Lines
buildtest/scheduler/slurm.py   4.35%     22 Missing ⚠️
buildtest/scheduler/lsf.py     13.33%    13 Missing ⚠️
buildtest/scheduler/job.py     40.00%    6 Missing ⚠️
buildtest/executors/slurm.py   0.00%     3 Missing ⚠️
buildtest/scheduler/pbs.py     50.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##            devel    #1739       +/-   ##
===========================================
- Coverage   80.85%   33.88%   -46.97%     
===========================================
  Files          57       58        +1     
  Lines        6453     6467       +14     
===========================================
- Hits         5217     2191     -3026     
- Misses       1236     4276     +3040     


shahzebsiddiqui and others added 8 commits March 28, 2024 13:31
…e job data

that is implemented by each subclass and used to retrieve the job record upon completion.
The method jobdata returns the job record, which is stored in the internal variable self._jobdata.

We renamed the methods in each subclass to retrieve_jobdata, which fetches the job records.
Finally, each executor class now invokes builder.job.jobdata() to get the job data instead of
relying on a return value from the method, which helps clean up the code.
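
The accessor pattern described in this commit message might look roughly like the sketch below. The names retrieve_jobdata, jobdata, and self._jobdata come from the message itself. The class names SchedulerJob and FakeSchedulerJob and the hard-coded record are hypothetical stand-ins, since the real subclasses query a scheduler.

```python
from abc import ABC, abstractmethod


class SchedulerJob(ABC):
    """Illustrative base class; not buildtest's actual implementation."""

    def __init__(self, jobID):
        self.jobid = jobID
        self._jobdata = None  # filled in by retrieve_jobdata()

    @abstractmethod
    def retrieve_jobdata(self):
        """Query the scheduler and store the job record in self._jobdata."""

    def jobdata(self):
        """Return the stored job record (None until it has been retrieved)."""
        return self._jobdata


class FakeSchedulerJob(SchedulerJob):
    """Hypothetical subclass that fabricates a record instead of calling a scheduler."""

    def retrieve_jobdata(self):
        # A real subclass would run a scheduler command here and parse its output.
        self._jobdata = {"JobID": self.jobid, "State": "exiting"}


# Executor-side usage: retrieve once, then read through the accessor.
job = FakeSchedulerJob("763827")
job.retrieve_jobdata()
record = job.jobdata()
```

Storing the record on the job object means the executor does not need to thread a return value through its own methods; it reads the record from the job whenever it is needed.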
@shahzebsiddiqui shahzebsiddiqui merged commit 5f5a33a into devel Apr 1, 2024
36 of 38 checks passed
@shahzebsiddiqui shahzebsiddiqui deleted the scheduler_class_improvement branch April 1, 2024 17:00