slurm tweaks #802

Merged
merged 66 commits into from
Apr 15, 2020
Changes from 49 commits
Commits
191c8f9
Shows stacktrace only when --debug is given
ivotron Mar 18, 2020
da03369
WIP kubernetes runner
ivotron Mar 18, 2020
a666826
Minor tweaks to resource manager dynamic loader
ivotron Mar 18, 2020
cb61940
wip
ivotron Mar 30, 2020
dae7fed
wip - volume creation and build/push
ivotron Mar 30, 2020
42700ca
Renames install-deps script
ivotron Apr 1, 2020
b0c1c00
Adds tests for kubernetes
ivotron Apr 1, 2020
7b106ec
tweaks to config.py module and tests
ivotron Apr 6, 2020
bed6d1f
changes to runner
ivotron Apr 6, 2020
da44088
Removes dotmap
ivotron Apr 6, 2020
4935bf4
Changes to runner_host and tests
ivotron Apr 6, 2020
9aa3ba0
WIP
ivotron Apr 6, 2020
da5a240
wip
ivotron Apr 8, 2020
48bf3c7
wip
ivotron Apr 8, 2020
7ecef02
wippy
ivotron Apr 8, 2020
5ab47e3
wip tweak slurm runner
ivotron Apr 8, 2020
e802abf
Fixes passing arguments up the class hierarchy
ivotron Apr 8, 2020
68cc78b
brings back setup_base_cache tests
ivotron Apr 8, 2020
66d2a13
revert change
ivotron Apr 8, 2020
1e6ce6c
change _spawned_containers list to set
JayjeetAtGithub Apr 8, 2020
dd66f42
Fix wrong name
JayjeetAtGithub Apr 8, 2020
3327e0b
Bugfix
JayjeetAtGithub Apr 8, 2020
54a2e99
Bugfix
JayjeetAtGithub Apr 8, 2020
22e5038
Bugfix
JayjeetAtGithub Apr 9, 2020
3b2ef89
Bugfix
JayjeetAtGithub Apr 9, 2020
fda1b54
Bugfix
JayjeetAtGithub Apr 9, 2020
6335179
Fixes
JayjeetAtGithub Apr 9, 2020
7e70eaf
Fixes
JayjeetAtGithub Apr 9, 2020
739e1fa
Completes test_runner_host and fixes
JayjeetAtGithub Apr 9, 2020
56a62c6
Fix permissions on install_scripts.sh
JayjeetAtGithub Apr 9, 2020
34cf97a
Add psutil dependency to setup.py
JayjeetAtGithub Apr 9, 2020
1e9bf16
Change repository name to lower case
JayjeetAtGithub Apr 9, 2020
5d00e5b
Fix config parsing
JayjeetAtGithub Apr 10, 2020
039f04e
Add pu.decode back temporarily
JayjeetAtGithub Apr 10, 2020
8c41dc5
Fix stop_running_tasks in DockerRunner
JayjeetAtGithub Apr 10, 2020
bb453cb
Complete tests
JayjeetAtGithub Apr 10, 2020
0780272
Add test_run() to SlurmDockerRunnerTests
JayjeetAtGithub Apr 10, 2020
90ff39c
Fix test_run()
JayjeetAtGithub Apr 10, 2020
22c7408
Seperate test dependencies
JayjeetAtGithub Apr 11, 2020
d2b654f
Complete test_runner_slurm.py
JayjeetAtGithub Apr 11, 2020
021ab04
Removes TODO
JayjeetAtGithub Apr 11, 2020
2ba4564
Change to using rstrip()
JayjeetAtGithub Apr 11, 2020
f50f9da
Check thread status
JayjeetAtGithub Apr 11, 2020
eaec5a2
Fix tests
JayjeetAtGithub Apr 11, 2020
6368021
Add tail process status check
JayjeetAtGithub Apr 11, 2020
61d573c
Add delay after thread start and use default decode
JayjeetAtGithub Apr 11, 2020
ace8d4d
PEP8 issue fixes
JayjeetAtGithub Apr 11, 2020
91bf0e3
Fix test
JayjeetAtGithub Apr 11, 2020
afed13b
Indentation fixes
JayjeetAtGithub Apr 12, 2020
2b8f184
Test calls in MockPopen
JayjeetAtGithub Apr 14, 2020
fbdd346
Add precedence information in cmd_run docstring
JayjeetAtGithub Apr 14, 2020
93a083f
Change info statements
JayjeetAtGithub Apr 14, 2020
4ea812c
Changes in host runner
JayjeetAtGithub Apr 14, 2020
cae4ac4
Tests updated
JayjeetAtGithub Apr 14, 2020
e2ef512
Fix the failing test_submit_job_failure
JayjeetAtGithub Apr 14, 2020
5eb8a95
test_runner_slurm updates
JayjeetAtGithub Apr 14, 2020
d2b434b
Use context managers in test_runner_host.py
JayjeetAtGithub Apr 14, 2020
bfd842b
Use context managers in test_runner_slurm
JayjeetAtGithub Apr 14, 2020
61e5de3
Add instructions for running unittests locally
JayjeetAtGithub Apr 15, 2020
6ed6a8e
Check logs from exec_cmd
JayjeetAtGithub Apr 15, 2020
efff84b
Add section on resource manager in docs
JayjeetAtGithub Apr 15, 2020
d938530
Change levels in CONTRIBUTING.md
JayjeetAtGithub Apr 15, 2020
cfdbe90
Fix the description in cmd_run.py
JayjeetAtGithub Apr 15, 2020
093fa46
Show container args in docker create step info
JayjeetAtGithub Apr 15, 2020
f5f974e
PEP8 issues addressed
JayjeetAtGithub Apr 15, 2020
aac79fa
Remove psutil dependency and fix resource leak
JayjeetAtGithub Apr 15, 2020
2 changes: 1 addition & 1 deletion .travis.yml
@@ -13,7 +13,7 @@ before_install:
- ci/scripts/install_scripts.sh
- pip install coverage
install:
-  - pip install cli/
+  - pip install cli/[dev]
script:
- coverage run -m unittest discover --start-directory cli/test
- ci/run_tests.sh
4 changes: 3 additions & 1 deletion README.md
@@ -30,7 +30,8 @@ popper run -f wf.yml
```

Keep reading down to find [installation instructions](#installation).
-For more information on the YAML syntax, see [here][cnwf].
+The full example above can be found [here][minimalpython]. For more
+information on the YAML syntax, see [here][cnwf].

The high-level goals of this project are to provide:

@@ -114,3 +115,4 @@ us](mailto:ivo@cs.ucsc.edu).
[cn]: https://cloudblogs.microsoft.com/opensource/2018/04/23/5-reasons-you-should-be-doing-container-native-development/
[compose]: https://docs.docker.com/compose/
[podman]: https://podman.io
+[minimalpython]: https://github.com/popperized/popper-examples/tree/master/workflows/minimal-python
25 changes: 14 additions & 11 deletions cli/popper/commands/cmd_run.py
@@ -1,8 +1,10 @@
import click
import os
+import traceback

from popper import log as logging
from popper.cli import log, pass_context
+from popper.config import PopperConfig
from popper.parser import Workflow
from popper.runner import WorkflowRunner

@@ -146,14 +148,15 @@ def cli(ctx, step, wfile, debug, dry_run, log_file, quiet, reuse,
substitutions=substitution, allow_loose=allow_loose,
include_step_dependencies=with_dependencies)

-    # instantiate the runner
-    runner = WorkflowRunner(
-        engine,
-        resource_manager,
-        config_file=conf,
-        dry_run=dry_run,
-        reuse=reuse,
-        skip_pull=skip_pull,
-        skip_clone=skip_clone,
-        workspace_dir=workspace)
-    runner.run(wf)
+    config = PopperConfig(engine_name=engine, resman_name=resource_manager,
+                          config_file=conf, reuse=reuse, dry_run=dry_run,
+                          skip_pull=skip_pull, skip_clone=skip_clone,
+                          workspace_dir=workspace)
+
+    runner = WorkflowRunner(config)
+
+    try:
+        runner.run(wf)
+    except Exception as e:
+        log.debug(traceback.format_exc())
+        log.fail(e)
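
The `try`/`except` added in cmd_run.py matches the first commit in the list ("Shows stacktrace only when --debug is given"): the full traceback goes to the debug level and only the short message is reported as a failure. A minimal sketch of that pattern follows. It uses the standard `logging` module instead of popper's own `log` object, and `run_guarded` and the logger name are hypothetical names chosen for illustration, not part of the PR.

```python
import logging
import traceback

# Stand-in for popper's logger; the name is an assumption for this sketch.
log = logging.getLogger("popper-sketch")


def run_guarded(run, debug=False):
    """Call run(); on failure emit the traceback at DEBUG and the message at ERROR.

    Returns True on success and False on failure (popper's log.fail is
    not reproduced here).
    """
    log.setLevel(logging.DEBUG if debug else logging.INFO)
    try:
        run()
        return True
    except Exception as e:
        # The traceback is only visible when the logger level is DEBUG.
        log.debug(traceback.format_exc())
        log.error(str(e))
        return False
```

With `debug=False` the user sees only the error message; with `debug=True` the same failure also prints the stack trace.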
141 changes: 91 additions & 50 deletions cli/popper/config.py
@@ -1,59 +1,100 @@
import os
import yaml

from hashlib import shake_256
import popper.scm as scm

from hashlib import shake_256
from popper.cli import log as log

import popper.scm as scm
import popper.utils as pu


class PopperConfig(object):
def __init__(self, **kwargs):
def __init__(self, engine_name=None, resman_name=None, config_file=None,
workspace_dir=os.getcwd(), reuse=False,
dry_run=False, quiet=False, skip_pull=False,
skip_clone=False):

self.workspace_dir = os.path.realpath(workspace_dir)
self.reuse = reuse
self.dry_run = dry_run
self.quiet = quiet
self.skip_pull = skip_pull
self.skip_clone = skip_clone
self.repo = scm.new_repo()
self.workspace_dir = os.path.realpath(kwargs['workspace_dir'])
self.wid = shake_256(self.workspace_dir.encode('utf-8')).hexdigest(4)
self.workspace_sha = scm.get_sha(self.repo)
self.config_file = kwargs['config_file']
self.dry_run = kwargs['dry_run']
self.skip_clone = kwargs['skip_clone']
self.skip_pull = kwargs['skip_pull']
self.quiet = kwargs['quiet']
self.reuse = kwargs['reuse']
self.engine_name = kwargs.get('engine', None)
self.resman_name = kwargs.get('resource_manager', None)
self.engine_options = kwargs['engine_options']
self.resman_options = kwargs['resman_options']
self.config_from_file = pu.load_config_file(self.config_file)

def parse(self):
self.validate()
self.normalize()

def validate(self):
if self.config_from_file.get('engine', None):
if not self.config_from_file['engine'].get('name', None):
log.fail(
'engine config must have the name property.')

if self.config_from_file.get('resource_manager', None):
if not self.config_from_file['resource_manager'].get('name', None):
log.fail(
'resource_manager config must have the name property.')

def normalize(self):
if not self.engine_name:
if self.config_from_file.get('engine', None):
self.engine_name = self.config_from_file['engine']['name']
self.engine_options = self.config_from_file['engine'].get(
'options', dict())
else:
self.engine_name = 'docker'

if not self.resman_name:
if self.config_from_file.get('resource_manager', None):
self.resman_name = self.config_from_file['resource_manager']['name']
self.resman_options = self.config_from_file['resource_manager'].get(
'options', dict())
else:
self.resman_name = 'host'

wid = shake_256(self.workspace_dir.encode('utf-8')).hexdigest(4)
self.wid = wid

from_file = self._load_config_from_file(config_file, engine_name,
resman_name)

self.engine_name = from_file['engine_name']
self.resman_name = from_file['resman_name']
self.engine_opts = from_file['engine_opts']
self.resman_opts = from_file['resman_opts']

def _load_config_from_file(self, config_file, engine_name, resman_name):
from_file = PopperConfig.__load_config_file(config_file)
loaded_conf = {}

eng_section = from_file.get('engine', None)
eng_from_file = from_file.get('engine', {}).get('name')
if from_file and eng_section and not eng_from_file:
log.fail('No engine name given.')

resman_section = from_file.get('resource_manager', None)
resman_from_file = from_file.get('resource_manager', {}).get('name')
if from_file and resman_section and not resman_from_file:
log.fail('No resource manager name given.')

# set name in precedence order (or assign default values)
if engine_name:
loaded_conf['engine_name'] = engine_name
elif eng_from_file:
loaded_conf['engine_name'] = eng_from_file
else:
loaded_conf['engine_name'] = 'docker'

if resman_name:
loaded_conf['resman_name'] = resman_name
elif resman_from_file:
loaded_conf['resman_name'] = resman_from_file
else:
loaded_conf['resman_name'] = 'host'

engine_opts = from_file.get('engine', {}).get('options', {})
resman_opts = from_file.get('resource_manager', {}).get('options', {})
loaded_conf['engine_opts'] = engine_opts
loaded_conf['resman_opts'] = resman_opts

return loaded_conf

@staticmethod
def __load_config_file(config_file):
"""Validate and parse the engine configuration file.

Args:
config_file(str): Path to the file to be parsed.

Returns:
dict: Engine configuration.
"""
if isinstance(config_file, dict):
return config_file

if not config_file:
return dict()

if not os.path.exists(config_file):
log.fail(f'File {config_file} was not found.')

if not config_file.endswith('.yml'):
log.fail('Configuration file must be a YAML file.')

with open(config_file, 'r') as cf:
data = yaml.load(cf, Loader=yaml.Loader)

if not data:
log.fail('Configuration file is empty.')

return data
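
The new `_load_config_from_file` resolves the engine and resource manager names in a fixed precedence order: an explicit argument wins, then the `name` in the configuration file, then a default (`docker` and `host`). The sketch below restates that logic on plain dictionaries so it can be run in isolation. The helper names `resolve_name` and `resolve` are hypothetical, and raising `ValueError` stands in for `log.fail`, whose exact behavior is not shown in this diff.

```python
def resolve_name(cli_value, file_section, default):
    """Pick a name by precedence: explicit argument, config file, default."""
    if cli_value:
        return cli_value
    from_file = (file_section or {}).get("name")
    if from_file:
        return from_file
    return default


def resolve(config, engine_name=None, resman_name=None):
    """Return the four resolved settings for an already-parsed config dict."""
    # A section that is present but lacks a name is rejected, as in the diff.
    for key in ("engine", "resource_manager"):
        if config.get(key) and not config[key].get("name"):
            raise ValueError(f"'{key}' section must have a name")
    return {
        "engine_name": resolve_name(engine_name, config.get("engine"), "docker"),
        "resman_name": resolve_name(
            resman_name, config.get("resource_manager"), "host"),
        "engine_opts": config.get("engine", {}).get("options", {}),
        "resman_opts": config.get("resource_manager", {}).get("options", {}),
    }
```

For example, `resolve({"engine": {"name": "podman"}}, engine_name="docker")` picks `docker`, because the explicit argument overrides the file, while the resource manager falls back to `host`.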
43 changes: 41 additions & 2 deletions cli/popper/parser.py
@@ -3,13 +3,13 @@
import re
import hcl
import os
import threading
import yaml

from copy import deepcopy
from builtins import str, dict
from popper.cli import log as log

import popper.scm as scm
import popper.utils as pu


@@ -24,6 +24,45 @@
"next"]


class threadsafe_iter_3:
    """Takes an iterator/generator and makes it thread-safe by serializing
    calls to the `next` method of the given iterator/generator."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return self.it.__next__()


def threadsafe_generator(f):
    """A decorator that takes a generator function and makes it thread-safe.

    Args:
      f(function): Generator function

    Returns:
      function: A wrapper that returns a thread-safe iterator.
    """
    def g(*args, **kwargs):
        """

        Args:
          *args(tuple): Variable-length positional arguments.
          **kwargs(dict): Variable-length keyword arguments.

        Returns:
          threadsafe_iter_3: A thread-safe iterator over the generator.
        """
        return threadsafe_iter_3(f(*args, **kwargs))
    return g


class Workflow(object):
"""Represents an immutable workflow."""

@@ -98,7 +137,7 @@ def format_command(params):
return params.split(" ")
return params

-    @pu.threadsafe_generator
+    @threadsafe_generator
def get_stages(self):
"""Generator of stages. A stage is a list of steps that can be
executed in parallel.
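
The `threadsafe_generator` decorator moved into parser.py wraps a generator so that several threads can pull from it without overlapping calls to `next`. The sketch below shows the same idea in a self-contained form, with a small driver that drains one generator from several threads. `ThreadSafeIter`, `count_up` and `drain_concurrently` are hypothetical names for illustration; only the locking pattern is taken from the diff.

```python
import threading


class ThreadSafeIter:
    """Serialize calls to next() on the wrapped iterator with a lock."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)


def threadsafe_generator(f):
    """Decorator: calling the generator function yields a thread-safe iterator."""
    def g(*args, **kwargs):
        return ThreadSafeIter(f(*args, **kwargs))
    return g


@threadsafe_generator
def count_up(n):
    # Hypothetical generator standing in for Workflow.get_stages().
    for i in range(n):
        yield i


def drain_concurrently(gen, workers=4):
    """Consume one shared iterator from several threads; return all items seen."""
    seen = []
    seen_lock = threading.Lock()

    def worker():
        for item in gen:
            with seen_lock:
                seen.append(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Each item is delivered to exactly one thread, so the combined results contain every value once, in no guaranteed order. Without the lock, two threads calling `next` on a plain generator at the same time can raise `ValueError: generator already executing`.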