Better snakemake cluster submission support #129

Merged Sep 7, 2023 (38 commits)

Commits
22fb1e6
add resources or localrules for each rule
AroneyS Aug 29, 2023
925260c
add multiplier for resources with retry
AroneyS Aug 30, 2023
fc9f55c
add threads to match mem_mb
AroneyS Aug 30, 2023
4e95ad1
run test_integration tests in permanent dir
AroneyS Aug 30, 2023
7dd653e
add logs to each rule
AroneyS Aug 30, 2023
62c0e19
reduce refinery times
AroneyS Aug 30, 2023
8ef6bd1
add rerun triggers to run_workflow
AroneyS Aug 30, 2023
704e652
add rerun_triggers to run_workflow in main
AroneyS Aug 30, 2023
d7d6756
fix rerun_triggers default
AroneyS Aug 30, 2023
69fec3b
add --snakemake-profile and --cluster-retries arguments
AroneyS Aug 30, 2023
a0a3d72
fix logs for "script" rules
AroneyS Aug 30, 2023
076c2fd
add \n to logf.write cmds
AroneyS Aug 30, 2023
6b0e5a2
missed some
AroneyS Aug 30, 2023
2d88935
fix profile check when running snakemake
AroneyS Aug 30, 2023
d4ba3f4
account for occasional long-term semibin2 runs
AroneyS Aug 30, 2023
e240d0a
fix tests
AroneyS Aug 30, 2023
80d4efe
fix programs which log to stdout
AroneyS Aug 30, 2023
ac61051
improve portability
AroneyS Aug 30, 2023
eb18a06
echo CheckM2 database to log
AroneyS Aug 30, 2023
08c8758
add keep-going to snakemake cmd
AroneyS Aug 31, 2023
c7df052
remove groups
AroneyS Aug 31, 2023
58817fb
add resources to qc.smk
AroneyS Aug 31, 2023
242667a
test queue submission across assembly+recovery
AroneyS Aug 31, 2023
ab5747e
fix test
AroneyS Aug 31, 2023
5123ade
consolidate integration tests
AroneyS Aug 31, 2023
0cb1c9f
fix test name
AroneyS Aug 31, 2023
8f35643
remove tmpdir from integration tests
AroneyS Aug 31, 2023
919a96c
add log files to qc rules
AroneyS Sep 1, 2023
a1ad666
remove extra quotes
AroneyS Sep 1, 2023
233f219
add log files to assembly
AroneyS Sep 1, 2023
180c1ac
fix some logging errors
AroneyS Sep 1, 2023
fcbf579
fix another print
AroneyS Sep 1, 2023
823e0b6
add refinery logging when skipping
AroneyS Sep 3, 2023
252755f
move refinery logging to log also for refine_dastool
AroneyS Sep 4, 2023
66c0127
re-add logging that otherwise missed
AroneyS Sep 4, 2023
c11b501
add example snakemake cluster to docs
AroneyS Sep 5, 2023
99e6bec
rename cluster submission section
AroneyS Sep 5, 2023
0a2d7e0
add memory increase/cap information to RAM control section
AroneyS Sep 5, 2023
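
Note on "add multiplier for resources with retry": that change is not among the hunks shown below, so the following is only a sketch of how such scaling is commonly written. Snakemake lets a resource be a callable that receives `wildcards` and `attempt`. The helper name `scaled_mem_mb`, the `fraction` argument, the linear scaling, and the cap at the configured maximum are assumptions for illustration, not code from this PR. It also assumes `max_memory` is given in gigabytes.

```python
def scaled_mem_mb(max_memory_gb, fraction, attempt):
    """Return a memory request in MB that grows with each retry.

    max_memory_gb: configured maximum memory, assumed to be in gigabytes.
    fraction: share of the maximum requested on the first attempt (e.g. 0.5).
    attempt: 1-based retry counter, as Snakemake passes to resource callables.
    """
    cap_mb = int(max_memory_gb) * 1024
    base_mb = int(cap_mb * fraction)
    # Scale linearly with the attempt number, but never exceed the cap.
    return min(base_mb * attempt, cap_mb)


# Hypothetical wiring inside a rule:
#   resources:
#       mem_mb=lambda wildcards, attempt: scaled_mem_mb(config["max_memory"], 0.5, attempt),

if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(attempt, scaled_mem_mb(64, 0.5, attempt))
```

With a 64 GB maximum and a starting fraction of one half, the request doubles on the second attempt and then stays at the cap.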
10 changes: 8 additions & 2 deletions aviary/modules/annotation/annotation.smk
@@ -1,3 +1,4 @@
+localrules: download_databases, download_eggnog_db, download_gtdb, download_checkm2, annotate
 
 onstart:
     import os
@@ -134,6 +135,9 @@ rule checkm2:
         checkm2_db_path = config["checkm2_db_folder"]
     threads:
         config["max_threads"]
+    resources:
+        mem_mb=int(config["max_memory"])*128,
+        runtime = "8h",
     benchmark:
         'benchmarks/checkm2.benchmark.txt'
     conda:
@@ -152,7 +156,8 @@ rule eggnog:
         eggnog_db = config['eggnog_folder'],
         tmpdir = config["tmpdir"]
     resources:
-        mem_mb=int(config["max_memory"])*512
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "24h",
     group: 'annotation'
     output:
         done = 'data/eggnog/done'
@@ -181,7 +186,8 @@ rule gtdbtk:
         pplacer_threads = config["pplacer_threads"],
         extension = config['mag_extension']
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "12h",
     conda:
         "../../envs/gtdbtk.yaml"
     threads:
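
Note on the memory factors: in the hunks above, `mem_mb` is `int(config["max_memory"])` multiplied by 128, 512, or 1024. If `max_memory` is given in gigabytes (an assumption here; the diff does not state the unit), those factors correspond to one eighth, one half, and all of the configured memory, expressed in megabytes. The sketch below only reproduces that arithmetic; the helper names and the value 250 are hypothetical.

```python
MB_PER_GB = 1024


def mem_mb(max_memory, factor):
    """Mirror the expression used in the rules: int(config["max_memory"]) * factor."""
    return int(max_memory) * factor


def fraction_of_max(factor):
    """Share of the configured memory a factor represents, if max_memory is in GB."""
    return factor / MB_PER_GB


# Hypothetical configuration; the int() cast in the rules suggests the value
# may arrive as a string.
config = {"max_memory": "250"}

requests = {
    rule: mem_mb(config["max_memory"], factor)
    for rule, factor in [("checkm2", 128), ("eggnog", 512), ("gtdbtk", 1024)]
}

if __name__ == "__main__":
    for rule, mb in requests.items():
        print(rule, mb)
```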
56 changes: 41 additions & 15 deletions aviary/modules/assembly/assembly.smk
@@ -1,3 +1,5 @@
+localrules: get_umapped_reads_ref, no_ref_filter, get_high_cov_contigs, skip_long_assembly, short_only, move_spades_assembly, pool_reads, skip_unicycler, skip_unicycler_with_qc, complete_assembly, complete_assembly_with_qc, reset_to_spades_assembly, remove_final_contigs, combine_assemblies, combine_long_only
+
 ruleorder: filter_illumina_assembly > short_only
 # ruleorder: fastqc > fastqc_long
 ruleorder: skip_unicycler_with_qc > skip_unicycler > combine_assemblies > combine_long_only
@@ -58,6 +60,9 @@ rule map_reads_ref:
         mapper = "map-ont" if config["long_read_type"] in ["ont", "ont-hq"] else "map-pb"
     threads:
         config["max_threads"]
+    resources:
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "12h",
     shell:
         "minimap2 -ax {params.mapper} --split-prefix=tmp -t {threads} {input.reference_filter} {input.fastq} | samtools view -@ {threads} -b > {output} && samtools index {output}"

@@ -89,6 +94,9 @@ rule get_reads_list_ref:
         temp("data/long_reads.fastq.gz")
     threads:
         config['max_threads']
+    resources:
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "12h",
     conda:
         "envs/seqtk.yaml"
     benchmark:
@@ -121,7 +129,8 @@ rule flye_assembly:
     params:
         long_read_type = config["long_read_type"]
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "48h",
     conda:
         "envs/flye.yaml"
     benchmark:
@@ -148,7 +157,8 @@ rule polish_metagenome_flye:
         illumina = False,
         coassemble = config["coassemble"]
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "24h",
     group: 'assembly'
     output:
         fasta = "data/assembly.pol.rac.fasta"
@@ -173,6 +183,9 @@ rule filter_illumina_ref:
         "../../envs/minimap2.yaml"
     threads:
         config["max_threads"]
+    resources:
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "8h",
     benchmark:
         "benchmarks/filter_illumina_ref.benchmark.txt"
     script:
@@ -194,7 +207,8 @@ rule generate_pilon_sort:
     threads:
         config["max_threads"]
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "24h",
     conda:
         "envs/pilon.yaml"
     benchmark:
@@ -212,8 +226,9 @@ rule polish_meta_pilon:
     output:
         fasta = "data/assembly.pol.pil.fasta"
     resources:
-        mem_mb=int(config["max_memory"])*512
-    # threads: # Threads no longer supported for pilon
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "24h",
+    threads: 1 # Threads no longer supported for pilon
     # config["max_threads"]
     params:
         pilon_memory = int(config["max_memory"])*512
@@ -237,7 +252,8 @@ rule polish_meta_racon_ill:
         fasta = "data/assembly.pol.fin.fasta",
         paf = temp("data/polishing/alignment.racon_ill.0.paf")
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "24h",
     threads:
         config["max_threads"]
     conda:
@@ -379,6 +395,9 @@ rule filter_illumina_assembly:
         "../../envs/minimap2.yaml"
     threads:
         config["max_threads"]
+    resources:
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "24h",
     benchmark:
         "benchmarks/filter_illumina_assembly.benchmark.txt"
     script:
@@ -428,7 +447,8 @@ rule spades_assembly:
     threads:
         config["max_threads"]
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "96h",
     params:
         max_memory = config["max_memory"],
         long_read_type = config["long_read_type"],
@@ -488,7 +508,8 @@ rule assemble_short_reads:
         tmpdir = config["tmpdir"],
         final_assembly = True
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "96h",
     conda:
         "envs/spades.yaml"
     benchmark:
@@ -519,7 +540,8 @@ rule spades_assembly_coverage:
         bam = temp("data/short_vs_mega.bam"),
         bai = temp("data/short_vs_mega.bam.bai")
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "24h",
     params:
         tmpdir = config["tmpdir"]
     conda:
@@ -545,6 +567,9 @@ rule metabat_binning_short:
         "../binning/envs/metabat2.yaml"
     threads:
         config["max_threads"]
+    resources:
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "24h",
     benchmark:
         "benchmarks/metabat_binning_short.benchmark.txt"
     shell:
@@ -565,7 +590,8 @@ rule map_long_mega:
         bam = temp("data/long_vs_mega.bam"),
         bai = temp("data/long_vs_mega.bam.bai")
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "24h",
     threads:
         config["max_threads"]
     conda:
@@ -608,6 +634,9 @@ rule get_read_pools:
         "envs/mfqe.yaml"
     threads:
         config['max_threads']
+    resources:
+        mem_mb=int(config["max_memory"])*512,
+        runtime = "12h",
     benchmark:
         "benchmarks/get_read_pools.benchmark.txt"
     script:
@@ -624,7 +653,8 @@ rule assemble_pools:
         config["max_threads"]
     group: 'assembly'
     resources:
-        mem_mb=int(config["max_memory"])*1024
+        mem_mb=int(config["max_memory"])*1024,
+        runtime = "96h",
     output:
         fasta = "data/unicycler_combined.fa"
     conda:
@@ -644,8 +674,6 @@ rule combine_assemblies:
     output:
         output_fasta = "data/final_contigs.fasta",
     priority: 1
-    threads:
-        config["max_threads"]
     script:
         "scripts/combine_assemblies.py"

@@ -660,8 +688,6 @@ rule combine_long_only:
         output_fasta = "data/final_contigs.fasta",
         # long_bam = "data/final_long.sort.bam"
     priority: 1
-    threads:
-        config["max_threads"]
     script:
         "scripts/combine_assemblies.py"

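
Note on "add resources or localrules for each rule": the hunks above follow that pattern. Rules kept off the cluster are named in `localrules`, and the remaining rules gain `mem_mb` and `runtime`. A small check of that invariant might look like the sketch below. The function name, the dict-based description of rules, and `example_unlisted_rule` are hypothetical; this is neither Snakemake API nor code from this PR.

```python
REQUIRED_RESOURCES = ("mem_mb", "runtime")


def rules_missing_resources(rules, localrules):
    """Return names of rules that are neither local nor fully resourced.

    rules: mapping of rule name -> dict of declared resources (may be empty).
    localrules: set of rule names that run on the submitting host.
    """
    missing = []
    for name, resources in rules.items():
        if name in localrules:
            continue  # local rules are not submitted, so no request is needed
        if any(key not in resources for key in REQUIRED_RESOURCES):
            missing.append(name)
    return sorted(missing)


# Hand-written sample; a real check would collect this from the workflow.
rules = {
    "spades_assembly": {"mem_mb": 64 * 1024, "runtime": "96h"},
    "combine_assemblies": {},
    "example_unlisted_rule": {"mem_mb": 1024},  # hypothetical: lacks runtime
}
localrules = {"combine_assemblies", "combine_long_only"}

if __name__ == "__main__":
    print(rules_missing_resources(rules, localrules))
```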