Add slurm runner #799

Merged
merged 82 commits into from
Apr 2, 2020
Changes from 12 commits
Commits
3434eda
Add initial slurm runner
JayjeetAtGithub Mar 17, 2020
c2b68ab
Fix pep8 issue
JayjeetAtGithub Mar 20, 2020
02054cb
Remove host slurm runner
JayjeetAtGithub Mar 21, 2020
523f845
Fix engine-conf test
JayjeetAtGithub Mar 21, 2020
6e47ac0
Add slurm test
JayjeetAtGithub Mar 21, 2020
4003e4a
Fix slurm tests
JayjeetAtGithub Mar 21, 2020
18c323c
Add entry to run_tests.sh
JayjeetAtGithub Mar 21, 2020
651e5d5
Modify info statement
JayjeetAtGithub Mar 21, 2020
bad354f
Fix process clearing step
JayjeetAtGithub Mar 21, 2020
59045e9
Pin pyhcl to 0.4.0
JayjeetAtGithub Mar 21, 2020
a22b29e
Fix utils test
JayjeetAtGithub Mar 21, 2020
a8b9094
Add slurm runner unit tests
JayjeetAtGithub Mar 22, 2020
e9ffd15
Remove slurm integration tests
JayjeetAtGithub Mar 23, 2020
adbcd42
Extract exec_cmd into utils
JayjeetAtGithub Mar 23, 2020
1fbd1c1
Fixes
JayjeetAtGithub Mar 23, 2020
4618be5
Add PopperConfigParser wrapper
JayjeetAtGithub Mar 23, 2020
1913342
Fix unit tests
JayjeetAtGithub Mar 23, 2020
1b74f3d
Extract create container logic into a staticmethod
JayjeetAtGithub Mar 24, 2020
380873a
fixes
JayjeetAtGithub Mar 24, 2020
6e61aa3
Fix pep8 issues
JayjeetAtGithub Mar 24, 2020
04e4a07
refactoring
JayjeetAtGithub Mar 24, 2020
d294112
Check for empty config file
JayjeetAtGithub Mar 24, 2020
1737992
Refactor popper config parsing
JayjeetAtGithub Mar 24, 2020
6ca3921
Add unit test for PopperConfig
JayjeetAtGithub Mar 24, 2020
d6cefaf
Add support for options in srun
JayjeetAtGithub Mar 25, 2020
d36ad8c
Changes
JayjeetAtGithub Mar 25, 2020
353a3b3
Changes
JayjeetAtGithub Mar 25, 2020
fc37f68
Changes
JayjeetAtGithub Mar 25, 2020
bf65ada
Add test
JayjeetAtGithub Mar 25, 2020
5919d20
Change config logic
JayjeetAtGithub Mar 25, 2020
d80537d
Refactoring
JayjeetAtGithub Mar 25, 2020
a2bb142
PEP8 fixes
JayjeetAtGithub Mar 25, 2020
41b2f02
pep8 error fixes
JayjeetAtGithub Mar 25, 2020
7f65a91
Fix unit tests
JayjeetAtGithub Mar 25, 2020
8054ae8
pep8 fixes
JayjeetAtGithub Mar 25, 2020
8259680
Changes in cli
JayjeetAtGithub Mar 25, 2020
af89d6f
fix comment
JayjeetAtGithub Mar 26, 2020
acead5b
extract engine config generation
JayjeetAtGithub Mar 26, 2020
3f6f8d5
change implmentation of slurm docker runner
JayjeetAtGithub Mar 26, 2020
87fe954
Add options support for sbatch
JayjeetAtGithub Mar 26, 2020
87b2721
Fix submit batch job fn
JayjeetAtGithub Mar 26, 2020
334ec0e
Move docker container anyway
JayjeetAtGithub Mar 27, 2020
6a9a438
refactor config
JayjeetAtGithub Mar 27, 2020
c2f03f9
Experimental
JayjeetAtGithub Mar 27, 2020
1d12839
FixeS
JayjeetAtGithub Mar 27, 2020
3fb9a2d
Fixes
JayjeetAtGithub Mar 27, 2020
d80f5b8
Fixes
JayjeetAtGithub Mar 27, 2020
9cf9929
Fixes
JayjeetAtGithub Mar 27, 2020
91b3d21
Add support for list based option for engine options
JayjeetAtGithub Mar 27, 2020
be01cab
Fixes
JayjeetAtGithub Mar 27, 2020
8d7a473
Change rule to generate job_name
JayjeetAtGithub Mar 27, 2020
58945d4
Changes to using temp dir for job script
JayjeetAtGithub Mar 27, 2020
5dcdaee
Fix output and err directory
JayjeetAtGithub Mar 27, 2020
72362c5
Change script generation dir
JayjeetAtGithub Mar 27, 2020
464dff5
Change script generation dir
JayjeetAtGithub Mar 27, 2020
67d238b
Remove inheritance from runner_host.DockerRunner
JayjeetAtGithub Mar 27, 2020
ee39dee
Decouple docker runner from host docke runner
JayjeetAtGithub Mar 27, 2020
ed6d17d
pep8 issues fixed
JayjeetAtGithub Mar 27, 2020
ac8059b
Fix comment
JayjeetAtGithub Mar 27, 2020
a4af337
Get streaming output
JayjeetAtGithub Mar 27, 2020
848e508
Optimize flag generation
JayjeetAtGithub Mar 29, 2020
cf6c2ec
optimize docker cmd generation
JayjeetAtGithub Mar 29, 2020
bf63f17
host runner fix
JayjeetAtGithub Mar 29, 2020
4574e38
Return ecode from sattach
JayjeetAtGithub Mar 30, 2020
b563916
Fixes
JayjeetAtGithub Mar 30, 2020
cf9a84c
Specify default engine options and resman options to dict
JayjeetAtGithub Mar 30, 2020
aa42254
Enable streaming of output
JayjeetAtGithub Mar 30, 2020
d0ed519
Change stream to log
JayjeetAtGithub Mar 31, 2020
3a60dec
dont prepare output if logging is asked
JayjeetAtGithub Mar 31, 2020
32a54d5
Change log to logging
JayjeetAtGithub Mar 31, 2020
b908a9c
Remove dotmap import
JayjeetAtGithub Mar 31, 2020
e6cd26a
Fix test_config
JayjeetAtGithub Apr 1, 2020
467bd98
Add sh dependency to setup
JayjeetAtGithub Apr 1, 2020
8b75f3c
fix test_parser and test_runner
JayjeetAtGithub Apr 1, 2020
707cef8
Fix slurm runner test
JayjeetAtGithub Apr 1, 2020
178e5e9
Add slurm docker runner tests
JayjeetAtGithub Apr 1, 2020
e708f46
Fix config test
JayjeetAtGithub Apr 1, 2020
9ac3575
Bug Fix
JayjeetAtGithub Apr 1, 2020
90f208f
Fix slurm runner test
JayjeetAtGithub Apr 1, 2020
ef318bd
Add exec_cmd test
JayjeetAtGithub Apr 1, 2020
4ec90a5
PEP8 fixeS
JayjeetAtGithub Apr 1, 2020
18829ed
Add error streaming
JayjeetAtGithub Apr 1, 2020
2 changes: 2 additions & 0 deletions ci/run_tests.sh
@@ -36,3 +36,5 @@ echo "###################################"
 ci/test/offline
 echo "###################################"
 ci/test/engine-conf
+echo "###################################"
+ci/test/slurm
54 changes: 27 additions & 27 deletions ci/test/engine-conf
@@ -17,58 +17,58 @@ action "run" {
 }
 EOF

-# config file called settings.py in the project root.
-cat <<EOF > settings.py
-ENGINE = {
-    "name": "dont use this name",
-    "image": "abc/xyz",
-    "hostname": "xYz.local"
-}
+# config file called settings.yml in the project root.
+cat <<EOF > settings.yml
+engine:
+  name: docker
+  options:
+    image: abc/xyz
+    hostname: xYz.local
 EOF

-popper run --wfile main.workflow --conf settings.py > output
+popper run --wfile main.workflow --conf settings.yml > output
 grep -Fxq "xYz.local" output

 popper run --wfile main.workflow > output
 grep -vFxq "xYz.local" output

 # config file with different name in the project root.
-cat <<EOF > myconf.py
-ENGINE = {
-    "name": "dont use this name",
-    "image": "abc/xyz",
-    "hostname": "xYz.local"
-}
+cat <<EOF > myconf.yml
+engine:
+  name: docker
+  options:
+    image: abc/xyz
+    hostname: xYz.local
 EOF

-popper run --wfile main.workflow --conf myconf.py > output
+popper run --wfile main.workflow --conf myconf.yml > output
 grep -Fxq "xYz.local" output

 popper run --wfile main.workflow > output
 grep -vFxq "xYz.local" output

 # config file in different directory than project root.
 mkdir -p /tmp/myengineconf/
-cat <<EOF > /tmp/myengineconf/mysettings.py
-ENGINE = {
-    "name": "dont use this name",
-    "image": "abc/xyz",
-    "hostname": "xYz.local"
-}
+cat <<EOF > /tmp/myengineconf/mysettings.yml
+engine:
+  name: docker
+  options:
+    image: abc/xyz
+    hostname: xYz.local
 EOF

-popper run --wfile main.workflow --conf /tmp/myengineconf/mysettings.py > output
+popper run --wfile main.workflow --conf /tmp/myengineconf/mysettings.yml > output
 grep -Fxq "xYz.local" output

 popper run --wfile main.workflow > output
 grep -vFxq "xYz.local" output

 cat <<EOF > settings
-ENGINE = {
-    "name": "dont use this name",
-    "image": "abc/xyz",
-    "hostname": "xYz.local"
-}
+engine:
+  name: docker
+  options:
+    image: abc/xyz
+    hostname: xYz.local
 EOF

 popper run --wfile main.workflow --conf settings && exit 1
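The assertions in ci/test/engine-conf rely on `grep -Fxq` (succeeds if some line equals the string exactly) and `grep -vFxq` (succeeds if some line does not equal it). As a reading aid, the sketch below mimics both checks in Python. The function names and the sample output lines are hypothetical and are not taken from the PR.

```python
def has_exact_line(lines, needle):
    """Mimic `grep -Fxq needle`: true if some line equals needle exactly."""
    return any(line == needle for line in lines)


def has_other_line(lines, needle):
    """Mimic `grep -vFxq needle`: true if some line differs from needle."""
    return any(line != needle for line in lines)


# Hypothetical captured output, for illustration only.
with_conf = ["some log line", "xYz.local"]
without_conf = ["some log line", "a1b2c3d4"]

print(has_exact_line(with_conf, "xYz.local"))     # True
print(has_other_line(without_conf, "xYz.local"))  # True
print(has_other_line(with_conf, "xYz.local"))     # True
```

The last call shows that the inverted check also succeeds when the hostname is present, as long as any other line exists. If the intent is to assert absence, negating the positive check (`! grep -Fxq ...`) may be closer to what is wanted.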
44 changes: 44 additions & 0 deletions ci/test/slurm
@@ -0,0 +1,44 @@
+#!/bin/bash
+set -ex
+# shellcheck source=./ci/test/common
+source ./ci/test/common
+init_test_repo
+cd "$test_repo_path"
+
+cat <<EOF > wf.yml
+steps:
+- id: one
+  uses: docker://alpine
+  args: echo step one
+
+- id: two
+  uses: sh
+  runs: echo step two
+EOF
+
+cat <<EOF > settings.yml
+engine:
+  name: docker
+
+resource_manager:
+  name: slurm
+EOF
+
+cat <<EOF > test.sh
+#!/bin/bash
+set -ex
+
+export LANG=en_US.utf8
+
+popper run -f wf.yml -c settings.yml > output
+grep -q "step one" output
+grep -q "step two" output
+echo "TEST SLURM PASSED"
+EOF
+
+chmod +x test.sh
+
+# run the tests in the cluster
+docker pull popperized/docker-slurm-cluster:latest
+docker run -h ernie --workdir /src -v /var/run/docker.sock:/var/run/docker.sock -v $PWD:/src popperized/docker-slurm-cluster:latest /src/test.sh

2 changes: 1 addition & 1 deletion cli/popper/parser.py
@@ -623,7 +623,7 @@ def complete_graph(self):
                 if not self.visited.get(tuple(next_set), None):
                     step['next'] = next_set
                     for nsa in next_set:
-                        self.steps[nsa]['needs'] = id
+                        self.steps[nsa]['needs'] = [id]
                     self.visited[tuple(curr_set)] = True

         # Finally, generate the root.
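The parser change wraps the parent id in a list before storing it under `needs`. One plausible reason, assuming downstream code iterates over `needs`, is that iterating over a bare string yields its characters rather than a single id. A minimal sketch with a hypothetical helper and hypothetical step dicts:

```python
def parents_of(step):
    """Return the ids found by iterating over a step's 'needs' entry."""
    return [parent for parent in step.get('needs', [])]


# Hypothetical steps: one stores the parent as a string, one as a list.
step_with_string = {'needs': 'build'}
step_with_list = {'needs': ['build']}

print(parents_of(step_with_string))  # ['b', 'u', 'i', 'l', 'd']
print(parents_of(step_with_list))    # ['build']
```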
26 changes: 16 additions & 10 deletions cli/popper/runner.py
@@ -36,24 +36,29 @@ def __init__(self, config_file=None, workspace_dir=os.getcwd(),
         # read options from config file
         config_from_file = pu.load_config_file(config_file)

-        if hasattr(config_from_file, 'ENGINE'):
-            self.config.engine_options = config_from_file.ENGINE
-        if hasattr(config_from_file, 'RESOURCE_MANAGER'):
-            self.config.resman_options = config_from_file.RESOURCE_MANAGER
-
-        if not self.config.resman:
-            self.config.resman = 'host'
+        # lets consider for now, name property will be present.
+        if config_from_file.get('engine', None):
+            self.config.engine_name = config_from_file['engine']['name']
+            self.config.engine_options = config_from_file['engine'].get(
+                'options', None)
+        if config_from_file.get('resource_manager', None):
+            self.config.resman_name = config_from_file['resource_manager']['name']
+            self.config.resman_options = config_from_file['resource_manager'].get(
+                'options', None)
+
+        if not self.config.resman_name:
+            self.config.resman_name = 'host'

         # dynamically load resource manager
-        resman_mod_name = f'popper.runner_{self.config.resman}'
+        resman_mod_name = f'popper.runner_{self.config.resman_name}'
         resman_spec = importlib.util.find_spec(resman_mod_name)
         if not resman_spec:
-            raise ValueError(f'Invalid resource manager: {self.config.resman}')
+            raise ValueError(
+                f'Invalid resource manager: {self.config.resman_name}')
         self.resman_mod = importlib.import_module(resman_mod_name)

         log.debug(f'WorkflowRunner config:\n{pu.prettystr(self.config)}')

-
     def __enter__(self):
         return self

@@ -202,6 +207,7 @@ def step_runner(self, engine_name, step):

 class StepRunner(object):
     """Base class for step runners, assumed to be singletons."""
+
     def __init__(self, config):
         self.config = config
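The runner hunk does two things: it reads `engine` and `resource_manager` names and options from the parsed YAML dict, defaulting the resource manager to `host`, and it loads the matching runner module by name. The following is a rough standalone sketch of that pattern, not the PR's code. `resolve_names` and `load_backend` are hypothetical names, and the standard-library prefix `json.` stands in for `popper.runner_` so the example runs without Popper installed.

```python
import importlib
import importlib.util


def resolve_names(config_from_file):
    """Extract engine and resource manager settings from a parsed config."""
    engine = config_from_file.get('engine') or {}
    resman = config_from_file.get('resource_manager') or {}
    return {
        'engine_name': engine.get('name'),
        'engine_options': engine.get('options'),
        # 'host' mirrors the fallback used in the diff above.
        'resman_name': resman.get('name') or 'host',
        'resman_options': resman.get('options'),
    }


def load_backend(prefix, name):
    """Import prefix + name, or raise ValueError if no such module exists."""
    mod_name = f'{prefix}{name}'
    if importlib.util.find_spec(mod_name) is None:
        raise ValueError(f'Invalid resource manager: {name}')
    return importlib.import_module(mod_name)


settings = {'engine': {'name': 'docker'}, 'resource_manager': {'name': 'slurm'}}
print(resolve_names(settings)['resman_name'])    # slurm
print(resolve_names({})['resman_name'])          # host
print(load_backend('json.', 'decoder').__name__)  # json.decoder
```

Using `.get('name')` instead of indexing avoids a `KeyError` when a section is present but has no `name`, which the comment in the diff flags as an open assumption.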

1 change: 0 additions & 1 deletion cli/popper/runner_host.py
@@ -10,7 +10,6 @@
 from popper.runner import StepRunner as StepRunner


-
 class HostRunner(StepRunner):
     """Run an step on the Host Machine."""

127 changes: 127 additions & 0 deletions cli/popper/runner_slurm.py
@@ -0,0 +1,127 @@
+import os
+from subprocess import PIPE, Popen, STDOUT, SubprocessError
+
+import docker
+
+from popper import utils as pu
+from popper import scm
+from popper.cli import log as log
+from popper.runner import StepRunner as StepRunner
+from popper.runner_host import DockerRunner as HostDockerRunner
+from popper.runner_host import HostRunner
+
+
+class SlurmRunner(StepRunner):
+
+    spawned_processes = set()
+
+    def __init__(self, config):
+        super(SlurmRunner, self).__init__(config)
+
+    def __exit__(self, exc_type, exc, traceback):
+        SlurmRunner.spawned_processes = set()
+
+    def exec_srun_cmd(self, cmd):
+        try:
+            cmd.insert(0, 'srun')
+            with Popen(cmd, stdout=PIPE, stderr=STDOUT,
+                       universal_newlines=True, preexec_fn=os.setsid,
+                       cwd=self.config.workspace_dir) as p:
+                SlurmRunner.spawned_processes.add(p)
+
+                log.debug('Reading process output')
+
+                for line in iter(p.stdout.readline, ''):
+                    line_decoded = pu.decode(line)
+                    log.step_info(line_decoded[:-1])
+
+                p.wait()
+                ecode = p.poll()
+                SlurmRunner.spawned_processes.remove(p)
+
+            log.debug(f'Code returned by process: {ecode}')
+
+        except SubprocessError as ex:
+            ecode = ex.returncode
+            log.step_info(f"Command '{cmd[0]}' failed with: {ex}")
+        except Exception as ex:
+            ecode = 1
+            log.step_info(f"Command raised non-SubprocessError error: {ex}")
+
+        return ecode
+
+    def stop_srun_cmd(self):
+        for p in SlurmRunner.spawned_processes:
+            log.info(f'Stopping proces {p.pid}')
+            p.kill()
+
+
+class DockerRunner(SlurmRunner, HostDockerRunner):
+    """Runs steps in docker."""
+
+    def __init__(self, config):
+        super(DockerRunner, self).__init__(config)
+
+    def run(self, step):
+        """Execute the given step in docker."""
+        cid = pu.sanitized_name(step['name'], self.config.wid)
+
+        build, img, dockerfile = HostDockerRunner.get_build_info(
+            step, self.config.workspace_dir, self.config.workspace_sha)
+
+        container = HostDockerRunner.find_container(cid)
+
+        if container and not self.config.reuse and not self.config.dry_run:
+            container.remove(force=True)
+
+        # build or pull
+        if build:
+            HostDockerRunner.docker_build(step, img, dockerfile,
+                                          self.config.dry_run)
+        elif not self.config.skip_pull and not step.get('skip_pull', False):
+            HostDockerRunner.docker_pull(step, img, self.config.dry_run)
+
+        msg = f'{img} {step.get("runs", "")} {step.get("args", "")}'
+        log.info(f'[{step["name"]}] docker create {msg}')
+
+        if not self.config.dry_run:
+            engine_config = {
+                "image": img,
+                "command": step.get('args', None),
+                "name": cid,
+                "volumes": [
+                    f'{self.config.workspace_dir}:/workspace',
+                    '/var/run/docker.sock:/var/run/docker.sock'
+                ],
+                "working_dir": '/workspace',
+                "environment": SlurmRunner.prepare_environment(step),
+                "entrypoint": step.get('runs', None),
+                "detach": True
+            }
+
+            if self.config.engine_options:
+                HostDockerRunner.update_engine_config(
+                    engine_config, self.config.engine_options)
+            log.debug(f'Engine configuration: {pu.prettystr(engine_config)}\n')
+
+            container = HostDockerRunner.d.containers.create(**engine_config)
+
+        log.info(f'[{step["name"]}] srun docker start')
+
+        if self.config.dry_run:
+            return 0
+
+        HostDockerRunner.spawned_containers.append(container)
+        ecode = self.start_container(cid)
+        return ecode
+
+    def start_container(self, cid):
+        docker_cmd = f"docker start --attach {cid}"
+        ecode = self.exec_srun_cmd(docker_cmd.split(" "))
+        return ecode
+
+    def stop_running_tasks(self):
+        for c in HostDockerRunner.spawned_containers:
+            log.info(f'Stopping container {c.name}')
+            c.stop()
+        self.stop_srun_cmd()
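`exec_srun_cmd` prefixes the command with `srun`, streams the combined stdout and stderr line by line, and returns the exit code. Below is a minimal standalone sketch of that streaming pattern, with two deliberate differences that are assumptions of this sketch rather than part of the PR. It builds a new argument list instead of inserting into the caller's list. It also catches `OSError`, because in the standard library a missing executable surfaces as `OSError`, and the base class `subprocess.SubprocessError` does not itself define `returncode` (`CalledProcessError` does). The `launcher` and `on_line` parameters are hypothetical.

```python
import subprocess
import sys


def run_and_stream(cmd, launcher=(), on_line=print):
    """Run launcher + cmd, forward each output line, return the exit code."""
    full_cmd = [*launcher, *cmd]  # new list, so the caller's cmd is untouched
    try:
        with subprocess.Popen(full_cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True) as proc:
            for line in proc.stdout:
                on_line(line.rstrip('\n'))
            ecode = proc.wait()
    except OSError as ex:
        # Raised when the executable cannot be started (e.g. not found).
        on_line(f"Command '{full_cmd[0]}' could not be started: {ex}")
        ecode = 1
    return ecode


if __name__ == '__main__':
    # An empty launcher replaces `srun` so this runs without Slurm.
    code = run_and_stream([sys.executable, '-c', "print('step one')"])
    print(f'exit code: {code}')
```

On a Slurm node the call would presumably look like `run_and_stream(['docker', 'start', '--attach', cid], launcher=('srun',))`.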
26 changes: 5 additions & 21 deletions cli/popper/utils.py
@@ -147,22 +147,6 @@ def write_file(path, content=''):
     f.close()


-def module_from_file(module_name, file_path):
-    """Import a file as a module.
-
-    Args:
-      module_name(str): The name of the module.
-      file_path(str): The path to the file to be imported.
-
-    Returns:
-      module
-    """
-    spec = importlib.util.spec_from_file_location(module_name, file_path)
-    module = importlib.util.module_from_spec(spec)
-    spec.loader.exec_module(module)
-    return module
-
-
 def load_config_file(config_file):
     """Validate and parse the engine configuration file.

@@ -175,13 +159,13 @@ def load_config_file(config_file):
     if not os.path.exists(config_file):
         log.fail(f'File {config_file} was not found.')

-    if not config_file.endswith('.py'):
-        log.fail('Configuration file must be a python source file.')
+    if not config_file.endswith('.yml'):
+        log.fail('Configuration file must be a YAML file.')

-    module_name = os.path.basename(config_file)[:-3]
-    module = module_from_file(module_name, config_file)
+    with open(config_file, 'r') as cf:
+        data = yaml.load(cf, Loader=yaml.Loader)

-    return module
+    return data


 def assert_executable_exists(command):
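`load_config_file` now rejects any path that does not end in `.yml`, which is what makes the extensionless `settings` case in ci/test/engine-conf fail as expected. For illustration only, here is a sketch of a slightly more permissive check that would also accept `.yaml`. The helper name is hypothetical and nothing in the PR suggests this variant was adopted.

```python
import os


def has_yaml_extension(path):
    """Return True for paths ending in .yml or .yaml (case-insensitive)."""
    return os.path.splitext(path)[1].lower() in ('.yml', '.yaml')


print(has_yaml_extension('settings.yml'))                     # True
print(has_yaml_extension('/tmp/myengineconf/mysettings.yaml'))  # True
print(has_yaml_extension('settings'))                         # False
```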
3 changes: 2 additions & 1 deletion cli/setup.py
@@ -15,11 +15,12 @@
include_package_data=True,
install_requires=[
'dotmap',
'testfixtures',
'python-vagrant',
'GitPython',
'spython',
'click',
'pyhcl',
'pyhcl==0.4.0',
'pyyaml',
'docker'
],