Allow use of external pipelines with Control.py #390

Merged: 26 commits, Jan 29, 2018
062f8e5  made Control.py more flexible, moved report memory to ini file (antoniojbt, Jan 4, 2018)
8ae8797  Update Control.py (antoniojbt, Jan 4, 2018)
267ed99  Update Control.py (antoniojbt, Jan 5, 2018)
0f0c9e2  tests for control.py (antoniojbt, Jan 9, 2018)
986fee1  Update pipeline.ini (antoniojbt, Jan 9, 2018)
96b7756  updated cluster.py for pbspro, already in other branch though (antoniojbt, Jan 9, 2018)
4d02c02  Merge branch 'AJBT-pipeline-control' of https://github.com/CGATOxford… (antoniojbt, Jan 9, 2018)
1ef89ca  updates/testing (antoniojbt, Jan 9, 2018)
ef798d1  control.py changes (antoniojbt, Jan 9, 2018)
50173b2  added function to search and import external pipeline, untested (antoniojbt, Jan 17, 2018)
bf20e75  docstrings for cgatflow external pipeline import function (antoniojbt, Jan 17, 2018)
74895c8  updates/testing (antoniojbt, Jan 17, 2018)
a6bbede  Merge branch 'AJBT-pipeline-control' of https://github.com/CGATOxford… (antoniojbt, Jan 17, 2018)
d711895  added options for calling external pipelines to cgatflow (antoniojbt, Jan 17, 2018)
1aee135  updates/testing (antoniojbt, Jan 17, 2018)
db7f8a5  updates/testing (antoniojbt, Jan 17, 2018)
a3f1497  returned cgatflow to original code, no changes, easier to call extern… (antoniojbt, Jan 22, 2018)
21a4172  updates/testing (antoniojbt, Jan 22, 2018)
cc75357  control.py (antoniojbt, Jan 23, 2018)
a41c6fc  control.py for external pipelines (antoniojbt, Jan 23, 2018)
01fd59e  changes to cluster.py for pbspro (antoniojbt, Jan 23, 2018)
8e91f82  Merge branch 'master' into AJBT-pipeline-control (antoniojbt, Jan 24, 2018)
0e760fe  reverting Control.py (antoniojbt, Jan 29, 2018)
fee4712  Merge branch 'AJBT-pipeline-control' of https://github.com/CGATOxford… (antoniojbt, Jan 29, 2018)
3658199  Revert Control.py to master version (sebastian-luna-valero, Jan 29, 2018)
5633731  Merge branch 'AJBT-pipeline-control' of github.com:CGATOxford/CGATPip… (sebastian-luna-valero, Jan 29, 2018)
64 changes: 30 additions & 34 deletions CGATPipelines/Pipeline/Cluster.py
@@ -166,51 +166,47 @@ def setupDrmaaJobTemplate(drmaa_session, options, job_name, job_memory):

# PBSPro only takes the first 15 characters of the job name and throws an
# uninformative error if it is longer.
# mem is the maximum amount of RAM used by the job; mem_free doesn't seem to be available.
# For qsub, job requirements would be passed as e.g.
#PBS -lselect=N:ncpus=X:mem=Ygb
#PBS -lwalltime=HH:00:00
# 'select=1' determines the number of nodes. Should go in a config file.
# mem is per node and is the maximum memory
# Site dependent but in general setting '#PBS -l select=NN:ncpus=NN:mem=NN{gb|mb}'
# is sufficient for parallel jobs (OpenMP, MPI).
# Also architecture dependent; jobs can hang if the requested resource doesn't exist.
# TO DO: Kill if long waiting time?
nodes = 1 # TO DO: hard-coded for now, as definitions of threads,
# nodes, etc. differ between programmes

# Set up basic requirements for job submission:
# if process has multiple threads, use a parallel environment:
# TO DO: error in fastqc build_report, var referenced before assignment.
# For now adding to workaround:
if 'job_threads' in options:
job_threads = options["job_threads"]
else:
job_threads = 1

spec = ["-N %s" % job_name[0:15],
"-l mem=%s" % job_memory]
"-l select=%s:ncpus=%s:mem=%s" % (nodes, job_threads, job_memory)]

# Walltime is left for the user to specify, as it is difficult to set
# dynamically and depends on the site/admin configuration of default values.
# This likely means setting it for the longest job, with the trade-off of
# longer waits for resources to become available for other jobs.
if options["cluster_options"]:
if "mem" not in options["cluster_options"]:
spec.append("%(cluster_options)s")
elif "mem" in options["cluster_options"]:
conds = ('mem' in options["cluster_options"],
'ncpus' in options["cluster_options"],
'select' in options["cluster_options"]
)
if any(conds):
spec = ["-N %s" % job_name[0:15]]
spec.append("%(cluster_options)s")

# if process has multiple threads, use a parallel environment:
# TO DO: error in fastqc build_report, var referenced before assignment.
# For now adding to workaround:
if 'job_threads' in options:
job_threads = options["job_threads"]
else:
job_threads = 1

multithread = 'job_threads' in options and options['job_threads'] > 1
if multithread:
# TO DO: 'select=1' determines the number of nodes. Should go in a config file.
# mem is per node and is the maximum memory
# Site dependent but in general setting '#PBS -l select=NN:ncpus=NN:mem=NN{gb|mb}'
# is sufficient for parallel jobs (OpenMP, MPI).
# Also architecture dependent; jobs can hang if the requested resource doesn't exist.
# TO DO: Kill if long waiting time?
spec = ["-N %s" % job_name[0:15],
"-l select=1:ncpus=%s:mem=%s" % (job_threads, job_memory)]

if options["cluster_options"]:
if "mem" not in options["cluster_options"]:
spec.append("%(cluster_options)s")

elif "mem" in options["cluster_options"]:
raise ValueError('''mem resource specified twice, check ~/.cgat config file,
ini files, command line options, etc.
''')
else:
spec.append("%(cluster_options)s")

if "cluster_pe_queue" in options and multithread:
spec.append(
"-q %(cluster_pe_queue)s")
spec.append("-q %(cluster_pe_queue)s")
elif options['cluster_queue'] != "NONE":
spec.append("-q %(cluster_queue)s")
# TO DO: sort out in Parameters.py to allow none values for configparser:
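The PBSPro branch of `setupDrmaaJobTemplate` above can be summarised as a small helper. The sketch below is a hypothetical standalone version of that logic (the function name and signature are mine, not the PR's): truncate the job name to PBSPro's 15-character limit, request `select:ncpus:mem` in one `-l` resource string, and refuse a second `mem` specification coming in through `cluster_options`.

```python
def build_pbspro_spec(job_name, job_memory, job_threads=1, cluster_options=""):
    """Hypothetical standalone sketch of the PBSPro spec logic in Cluster.py."""
    nodes = 1  # hard-coded in the PR; threads vs nodes semantics vary by programme
    # PBSPro only reads the first 15 characters of a job name and throws an
    # uninformative error if it is longer:
    spec = ["-N %s" % job_name[:15],
            "-l select=%s:ncpus=%s:mem=%s" % (nodes, job_threads, job_memory)]
    if cluster_options:
        if "mem" in cluster_options:
            # mem must only be specified once, whether it comes from ~/.cgat,
            # ini files or the command line
            raise ValueError("mem resource specified twice, check config files, "
                             "ini files and command line options")
        spec.append(cluster_options)
    return " ".join(spec)
```

For example, `build_pbspro_spec("mapping_pipeline_job", "4gb", job_threads=8)` yields `-N mapping_pipelin -l select=1:ncpus=8:mem=4gb`, while passing `cluster_options="-l mem=2gb"` raises the duplicate-`mem` error.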
4 changes: 3 additions & 1 deletion CGATPipelines/Pipeline/__init__.py
@@ -273,7 +273,9 @@ def run_report(clean=True,

# warning: memory gets multiplied by threads, so do not set it
# too high
job_memory = "1G"
job_memory = PARAMS["report_memory"]
#"1G" # a hard-coded value causes problems on external HPCs

job_threads = PARAMS["report_threads"]

# use a fake X display in order to avoid windows popping up
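The change above makes the report's memory request configurable instead of hard-coding `"1G"`. A minimal sketch of the idea with stdlib `configparser`, assuming a `[report]` section whose keys surface as `PARAMS["report_memory"]` and `PARAMS["report_threads"]` (the section name is inferred from those parameter names, not shown in the diff):

```python
import configparser

# Values a user would set in pipeline.ini rather than in the code.
ini_text = """
[report]
threads=10
memory=1G
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# run_report() can now pick up site-appropriate resources:
job_memory = config.get("report", "memory")     # "1G"
job_threads = config.getint("report", "threads")  # 10
```

On an HPC where 1G per thread is too little (or too much), only the ini file needs editing.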
12 changes: 8 additions & 4 deletions CGATPipelines/configuration/pipeline.ini
@@ -6,10 +6,10 @@
########################################################
########################################################
# The project name to appear in the report
projectname=CGATProject
projectname=to-set

# The copyright statement to appear in the report
copyright=CGAT (2010-2014)
copyright=

# The short X.Y version to appear in the report
version=0.1
@@ -37,7 +37,8 @@ scratchdir=/tmp
web_dir=../web

# location of indexed genome
genome_dir=/ifs/mirror/genomes/plain
#genome_dir=/ifs/mirror/genomes/plain
genome_dir=to-set

# The genome to use (UCSC convention)
genome=hg19
@@ -75,7 +76,8 @@ port=3306
[cluster]

# queue to use
queue=all.q
#queue=all.q
queue=

# priority of jobs on cluster
priority=-10
@@ -89,6 +91,8 @@ priority=-10
# number of threads to use to build the documentation
threads=10

# memory to use to build the documentation
memory=1G

# directory for html documentation
html=report/html

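The new `memory=1G` option reaches the code as `PARAMS["report_memory"]` because CGAT pipelines flatten ini sections into `section_option` keys. A minimal sketch of that flattening, assuming plain `configparser` semantics (the real loader lives in Parameters.py and also coerces types):

```python
import configparser


def flatten_params(ini_text):
    """Hypothetical sketch: [section] option=value becomes
    params["section_option"], as in CGAT's PARAMS dictionary."""
    cp = configparser.ConfigParser()
    cp.read_string(ini_text)
    params = {}
    for section in cp.sections():
        for option in cp.options(section):
            params["%s_%s" % (section, option)] = cp.get(section, option)
    return params
```

With the ini fragment above, `flatten_params("[report]\nthreads=10\nmemory=1G\n")["report_memory"]` gives `"1G"`, which is exactly the key `run_report()` now reads.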