Better snakemake cluster submission support #129

Merged: 38 commits, Sep 7, 2023
Changes from 35 commits
Commits
22fb1e6 add resources or localrules for each rule (AroneyS, Aug 29, 2023)
925260c add multiplier for resources with retry (AroneyS, Aug 30, 2023)
fc9f55c add threads to match mem_mb (AroneyS, Aug 30, 2023)
4e95ad1 run test_integration tests in permanent dir (AroneyS, Aug 30, 2023)
7dd653e add logs to each rule (AroneyS, Aug 30, 2023)
62c0e19 reduce refinery times (AroneyS, Aug 30, 2023)
8ef6bd1 add rerun triggers to run_workflow (AroneyS, Aug 30, 2023)
704e652 add rerun_triggers to run_workflow in main (AroneyS, Aug 30, 2023)
d7d6756 fix rerun_triggers default (AroneyS, Aug 30, 2023)
69fec3b add --snakemake-profile and --cluster-retries arguments (AroneyS, Aug 30, 2023)
a0a3d72 fix logs for "script" rules (AroneyS, Aug 30, 2023)
076c2fd add \n to logf.write cmds (AroneyS, Aug 30, 2023)
6b0e5a2 missed some (AroneyS, Aug 30, 2023)
2d88935 fix profile check when running snakemake (AroneyS, Aug 30, 2023)
d4ba3f4 account for occasional long-term semibin2 runs (AroneyS, Aug 30, 2023)
e240d0a fix tests (AroneyS, Aug 30, 2023)
80d4efe fix programs which log to stdout (AroneyS, Aug 30, 2023)
ac61051 improve portability (AroneyS, Aug 30, 2023)
eb18a06 echo CheckM2 database to log (AroneyS, Aug 30, 2023)
08c8758 add keep-going to snakemake cmd (AroneyS, Aug 31, 2023)
c7df052 remove groups (AroneyS, Aug 31, 2023)
58817fb add resources to qc.smk (AroneyS, Aug 31, 2023)
242667a test queue submission across assembly+recovery (AroneyS, Aug 31, 2023)
ab5747e fix test (AroneyS, Aug 31, 2023)
5123ade consolidate integration tests (AroneyS, Aug 31, 2023)
0cb1c9f fix test name (AroneyS, Aug 31, 2023)
8f35643 remove tmpdir from integration tests (AroneyS, Aug 31, 2023)
919a96c add log files to qc rules (AroneyS, Sep 1, 2023)
a1ad666 remove extra quotes (AroneyS, Sep 1, 2023)
233f219 add log files to assembly (AroneyS, Sep 1, 2023)
180c1ac fix some logging errors (AroneyS, Sep 1, 2023)
fcbf579 fix another print (AroneyS, Sep 1, 2023)
823e0b6 add refinery logging when skipping (AroneyS, Sep 3, 2023)
252755f move refinery logging to log also for refine_dastool (AroneyS, Sep 4, 2023)
66c0127 re-add logging that otherwise missed (AroneyS, Sep 4, 2023)
c11b501 add example snakemake cluster to docs (AroneyS, Sep 5, 2023)
99e6bec rename cluster submission section (AroneyS, Sep 5, 2023)
0a2d7e0 add memory increase/cap information to RAM control section (AroneyS, Sep 5, 2023)
24 changes: 22 additions & 2 deletions aviary/aviary.py
@@ -195,6 +195,23 @@ def main():
default=" "
)

+ base_group.add_argument(
+ '--snakemake-profile',
+ help='Snakemake profile (see https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles)\n'
+ 'Create profile as `~/.config/snakemake/[CLUSTER_PROFILE]/config.yaml`. \n'
Collaborator

Unclear whether you mean the user or aviary does the creating. I think an example yml in the doco would go a long way here.

Collaborator Author

I added some stuff under Advanced Usage. Is that what you wanted?

+ 'Can be used to submit rules as jobs to cluster engine (see https://snakemake.readthedocs.io/en/stable/executing/cluster.html), \n'
+ 'requires cluster, cluster-status, jobs, cluster-cancel. ',
+ dest='snakemake_profile',
+ default=""
+ )
+
+ base_group.add_argument(
+ '--cluster-retries',
+ help='Number of times to retry a failed job when using cluster submission (see `--snakemake-profile`). ',
+ dest='cluster_retries',
+ default=0
+ )

base_group.add_argument(
'--dry-run', '--dry_run', '--dryrun',
help='Perform snakemake dry run, tests workflow order and conda environments',
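
The `--snakemake-profile` help text lists four settings a cluster profile needs: cluster, cluster-status, jobs, cluster-cancel. As a rough illustration of that requirement, the sketch below checks an already-parsed profile mapping for those keys. The helper `missing_profile_keys` and the placeholder values are hypothetical and not part of aviary; the real submit, status and cancel commands depend on the cluster scheduler.

```python
# Hypothetical helper, not part of aviary: checks a parsed profile mapping
# for the keys named in the --snakemake-profile help text.
REQUIRED_PROFILE_KEYS = ("cluster", "cluster-status", "jobs", "cluster-cancel")


def missing_profile_keys(profile):
    """Return the required keys that are absent from the profile, in listed order."""
    return [key for key in REQUIRED_PROFILE_KEYS if key not in profile]


# Placeholder values only; real commands depend on the scheduler in use.
example_profile = {
    "cluster": "<submit command>",
    "cluster-status": "<status script>",
    "jobs": 100,
}

print(missing_profile_keys(example_profile))  # ['cluster-cancel']
```

A check like this could run before the workflow starts, so that an incomplete profile is reported up front instead of failing at the first submitted job.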
@@ -250,7 +267,7 @@ def main():
'--rerun-triggers', '--rerun_triggers',
help='Specify which kinds of modifications will trigger rules to rerun',\
dest='rerun_triggers',
- default="mtime",
+ default=["mtime"],
nargs="*",
choices=["mtime","params","input","software-env","code"]
)
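
The change of default from a string to a list matters because of `nargs="*"`: values given on the command line are collected into a list, but an unused default is returned as-is. The standalone sketch below reproduces only this one argument to show the difference. `build_parser` is an illustrative name, and that this was the motivation for the fix is an assumption.

```python
import argparse

TRIGGERS = ["mtime", "params", "input", "software-env", "code"]


def build_parser(default):
    """Build a parser holding only the --rerun-triggers argument from the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--rerun-triggers", "--rerun_triggers",
        dest="rerun_triggers",
        default=default,
        nargs="*",
        choices=TRIGGERS,
    )
    return parser


# Option absent: argparse hands back the default unchanged.
old = build_parser("mtime").parse_args([]).rerun_triggers
new = build_parser(["mtime"]).parse_args([]).rerun_triggers
# Option present: values are always collected into a list.
given = build_parser(["mtime"]).parse_args(
    ["--rerun-triggers", "mtime", "code"]
).rerun_triggers

print(type(old).__name__, type(new).__name__, given)  # str list ['mtime', 'code']
```

With the string default, downstream code that iterates or joins the value would behave differently depending on whether the flag was passed; the list default keeps both paths the same type.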
@@ -1182,7 +1199,10 @@ def main():
dryrun=args.dryrun,
clean=args.clean,
conda_frontend=args.conda_frontend,
- snakemake_args=args.cmds)
+ snakemake_args=args.cmds,
+ rerun_triggers=args.rerun_triggers,
+ profile=args.snakemake_profile,
+ cluster_retries=args.cluster_retries)
else:
process_batch(args, prefix)

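
This diff does not show how `run_workflow` uses the three new keyword arguments. As a minimal sketch, they could be turned into extra snakemake command-line arguments as below. The function `build_snakemake_args` is hypothetical, and it assumes a Snakemake version that accepts `--profile`, `--retries`, `--keep-going` and `--rerun-triggers`. Whether aviary uses exactly these flags is an assumption. Note that `--cluster-retries` is declared without `type=`, so a user-supplied value arrives as a string while the default is the integer 0; `str()` handles both.

```python
def build_snakemake_args(profile="", cluster_retries=0, rerun_triggers=("mtime",)):
    """Assemble extra snakemake CLI arguments (illustrative only, not aviary's code)."""
    args = ["--rerun-triggers", *rerun_triggers]
    if profile:  # empty string is the argparse default, meaning "no profile"
        args += [
            "--profile", profile,
            "--retries", str(cluster_retries),  # may be int (default) or str (from CLI)
            "--keep-going",
        ]
    return args


print(" ".join(build_snakemake_args("slurm", "3", ["mtime", "code"])))
# --rerun-triggers mtime code --profile slurm --retries 3 --keep-going
```

Here retries and keep-going are only added when a profile is set, matching the help text's statement that `--cluster-retries` applies to cluster submission.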
38 changes: 26 additions & 12 deletions aviary/modules/annotation/annotation.smk
@@ -1,3 +1,4 @@
+ localrules: download_databases, download_eggnog_db, download_gtdb, download_checkm2, annotate

onstart:
import os
@@ -133,15 +134,21 @@ rule checkm2:
mag_extension = config['mag_extension'],
checkm2_db_path = config["checkm2_db_folder"]
threads:
- config["max_threads"]
+ min(config["max_threads"], 16)
+ resources:
+ mem_mb = lambda wildcards, attempt: min(int(config["max_memory"])*1024, 128*1024*attempt),
+ runtime = lambda wildcards, attempt: 8*60*attempt,
+ log:
+ 'logs/checkm2.log'
benchmark:
'benchmarks/checkm2.benchmark.txt'
conda:
"../../envs/checkm2.yaml"
shell:
'export CHECKM2DB={params.checkm2_db_path}/uniref100.KO.1.dmnd; '
- 'echo "Using CheckM2 database $CHECKM2DB"; '
+ 'echo "Using CheckM2 database $CHECKM2DB" > {log}; '
'checkm2 predict -i {input.mag_folder}/ -x {params.mag_extension} -o {output.checkm2_folder} -t {threads} --force'
+ '>> {log} 2>&1 '

rule eggnog:
input:
@@ -151,13 +158,15 @@ rule eggnog:
mag_extension = config['mag_extension'],
eggnog_db = config['eggnog_folder'],
tmpdir = config["tmpdir"]
- resources:
- mem_mb=int(config["max_memory"])*512
- group: 'annotation'
output:
done = 'data/eggnog/done'
threads:
- config['max_threads']
+ min(config["max_threads"], 64)
+ resources:
+ mem_mb = lambda wildcards, attempt: min(int(config["max_memory"])*1024, 512*1024*attempt),
+ runtime = lambda wildcards, attempt: 24*60*attempt,
+ log:
+ 'logs/eggnog.log'
benchmark:
'benchmarks/eggnog.benchmark.txt'
conda:
@@ -167,31 +176,36 @@ rule eggnog:
'mkdir -p data/eggnog/; '
'find {input.mag_folder}/*.{params.mag_extension} | parallel -j1 \'emapper.py --data_dir {params.eggnog_db} '
'--dmnd_db {params.eggnog_db}/*dmnd --cpu {threads} -m diamond --itype genome --genepred prodigal -i {{}} '
- '--output_dir data/eggnog/ --temp_dir {params.tmpdir} -o {{/.}} || echo "Genome already annotated"\'; '
+ '--output_dir data/eggnog/ --temp_dir {params.tmpdir} -o {{/.}} || echo "Genome already annotated"\' '
+ '> {log} 2>&1; '
'touch data/eggnog/done; '

rule gtdbtk:
input:
mag_folder = config['mag_directory']
- group: 'annotation'
output:
done = "data/gtdbtk/done"
params:
gtdbtk_folder = config['gtdbtk_folder'],
pplacer_threads = config["pplacer_threads"],
extension = config['mag_extension']
- resources:
- mem_mb=int(config["max_memory"])*1024
conda:
"../../envs/gtdbtk.yaml"
threads:
- config["max_threads"]
+ min(config["max_threads"], 32)
+ resources:
+ mem_mb = lambda wildcards, attempt: min(int(config["max_memory"])*1024, 256*1024*attempt),
+ runtime = lambda wildcards, attempt: 12*60*attempt,
+ log:
+ 'logs/gtdbtk.log'
benchmark:
'benchmarks/gtdbtk.benchmark.txt'
shell:
"export GTDBTK_DATA_PATH={params.gtdbtk_folder} && "
"gtdbtk classify_wf --skip_ani_screen --cpus {threads} --pplacer_cpus {params.pplacer_threads} --extension {params.extension} "
- "--genome_dir {input.mag_folder} --out_dir data/gtdbtk && touch data/gtdbtk/done"
+ "--genome_dir {input.mag_folder} --out_dir data/gtdbtk "
+ "> {log} 2>&1 "
+ "&& touch data/gtdbtk/done"

rule annotate:
input:
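
The new `resources` lambdas in this file share one pattern: the memory request grows linearly with the retry `attempt` and is capped at the configured maximum, and the runtime grows linearly without a cap. The sketch below restates that arithmetic as plain functions, using the checkm2 base values from the diff (128 GB, 8 hours). The function names and the example `max_memory` of 250 are illustrative assumptions, as is reading `max_memory` as gigabytes (inferred from the `*1024` conversion to `mem_mb`).

```python
def mem_mb(max_memory_gb, base_gb, attempt):
    """Memory request in MB: base scaled by attempt, capped at the configured maximum."""
    return min(int(max_memory_gb) * 1024, base_gb * 1024 * attempt)


def runtime_min(base_hours, attempt):
    """Runtime request in minutes: base scaled by attempt, with no cap."""
    return base_hours * 60 * attempt


# checkm2 values from the diff, with an assumed max_memory of 250.
for attempt in (1, 2, 3):
    print(attempt, mem_mb(250, 128, attempt), runtime_min(8, attempt))
# 1 131072 480
# 2 256000 960
# 3 256000 1440
```

From the second attempt onward the memory request sits at the cap, so further retries only extend the runtime.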