Add DAS_Tool binning refinement #291

Merged on Jun 1, 2022 (66 commits)
Changes from 60 commits
Commits
f28652b
Start work on DAS_Tool
jfy133 Mar 24, 2022
7379e64
Update DASTool move test to hybrid
jfy133 Apr 2, 2022
e454bb0
Update DAS_Tool to allow missing output files
jfy133 Apr 4, 2022
5477e45
Finalise DAS_Tool - fixing version reporting and add documentation
jfy133 Apr 7, 2022
dc2dd14
Add missing output directory to OUTPUT docs
jfy133 Apr 7, 2022
dc26f8d
prettier
jfy133 Apr 7, 2022
c8bb385
Update README
jfy133 Apr 7, 2022
78b1ed3
Apply suggestions from code review
jfy133 Apr 8, 2022
e7266eb
Attempt at allowing user to define which binning files go into downst…
jfy133 Apr 8, 2022
31ddfca
Adds improved output for bin_refinement to include unbins
jfy133 Apr 8, 2022
fc66b5c
Prettier
jfy133 Apr 8, 2022
fe14e53
Separate bin refinement CI test
jfy133 Apr 11, 2022
532a0b2
Fix non-binrefinement channel output and add error message
jfy133 Apr 12, 2022
e14c3f0
Fix skip-metabat causing undefined out channel and add DASTool as bin…
jfy133 Apr 14, 2022
2c081df
Add error in case of invalid bin refinement parameter combination
skrakau Apr 14, 2022
693f7c5
Update lib/WorkflowMag.groovy
jfy133 Apr 14, 2022
bba156a
Move postbinning check to other checks change publishing option
jfy133 Apr 14, 2022
ff32e18
Attempt to pre-rename input bins to DAS_Tool to ensure these have cor…
jfy133 Apr 14, 2022
9136428
Bash fix
jfy133 Apr 14, 2022
dc92886
Add error handling in BUSCO_PLOT
skrakau Apr 19, 2022
378f139
Fix output filenames in BUSCO_PLOT
skrakau Apr 19, 2022
e07521c
Fix BUSCO issue
jfy133 Apr 28, 2022
98a29a2
Undo .set{} replacements in `mag.nf` to do in another PR
jfy133 Apr 29, 2022
0fce6b3
Revert formatting
jfy133 Apr 29, 2022
97763c5
Finalise bin_summary update
jfy133 May 4, 2022
698d753
Remove debugging
jfy133 May 4, 2022
30348f1
Apply suggestions from code review
jfy133 May 4, 2022
fc5f48e
Revert "Add missing channel assignment"
jfy133 May 4, 2022
49a23da
Revert "Remove duplicate channel assignment"
jfy133 May 4, 2022
6d780ee
Revert "Merge branch 'dev' into dastool"
jfy133 May 4, 2022
a6f1c2a
Account for when no binning refinement executed for bin_summary
jfy133 May 4, 2022
e59f544
Update documentation fix publishing for DASTOOL modules
jfy133 May 5, 2022
be51cb6
Apply suggestions from code review
jfy133 May 5, 2022
1e32d87
Replace .set operation with variable assignment again
jfy133 May 9, 2022
eb728e1
Remove unnecessary binner info in meta after RENAME_PREDASTOOL
jfy133 May 9, 2022
22cf70e
Collapse binning and refinement output directories: note this will re…
jfy133 May 9, 2022
1a872df
Add module publishing dirs for BUSCO_SUMMARY|QUAST_BINS|QUAST_BINS_SU…
skrakau May 17, 2022
e9e81ce
Clean up modules.config
skrakau May 17, 2022
d1b7d12
Fix how-to-run comment in test_binrefinement.config
skrakau May 17, 2022
b29c2b7
Simplify code snippet to remove binner meta info
skrakau May 17, 2022
b44ff20
Adjust binning_refinement.nf for separate plotting for binners
skrakau May 17, 2022
0cdb5dd
Adjust output description of DAS Tool to new structure
skrakau May 17, 2022
02a582a
Update BUSCO description in output.md for new binning and refinment m…
skrakau May 17, 2022
b40ebc9
Merge remote-tracking branch 'upstream/dev' into dastool
skrakau May 19, 2022
f8f04c4
Apply suggestions from code review
skrakau May 19, 2022
5e7556c
Update output.md
skrakau May 19, 2022
57fffae
Fix binning refinement output folder in usage.md
skrakau May 19, 2022
6a40c60
Fix link in output.md
skrakau May 19, 2022
f011a44
Extend documentation with respect to --postbinning_input
skrakau May 19, 2022
c7675ac
Fix undefined channels and restructure code
skrakau May 17, 2022
3f25a8e
Fix wrong amount of left-padding spaces
skrakau May 19, 2022
a1f659d
Merge branch 'dev' into dastool
skrakau May 19, 2022
ed20591
Fix published output for DAS TOOL
skrakau May 20, 2022
501c154
Fix MaxBin2 extension expected for DASTOOL_FASTATOCONTIG2BIN_MAXBIN2
skrakau May 21, 2022
e100186
Fix: separate BINNING_REFINEMENT refined_bins output from unbins
skrakau May 21, 2022
d469e61
Use DASTool as meta.binner info
skrakau May 21, 2022
f48b83a
Run MAG_DEPTHS on each meta.binner, adjust output filenames and group…
skrakau May 23, 2022
0415bce
Change output filenames bin_depths_summary_refined.tsv -> bin_refined…
skrakau May 23, 2022
baa1aa6
Update CHANGELOG.md
skrakau May 25, 2022
199bc15
Add DAS Tool to overview figure
skrakau May 29, 2022
b5097d1
Update docs/usage.md
skrakau May 31, 2022
f5d73ce
Merge branch 'dev' into dastool
skrakau May 31, 2022
2a98e60
Update docs/usage.md
skrakau Jun 1, 2022
b515528
Tweak docs and output structure for depths files for clarity
jfy133 Jun 1, 2022
57d6782
Merge branch 'dastool' of github.com:jfy133/nf-core-mag into dastool
jfy133 Jun 1, 2022
a433d97
Remove re-enumeration of bin files for DAS Tool
skrakau Jun 1, 2022
10 changes: 9 additions & 1 deletion .github/workflows/ci.yml
@@ -55,7 +55,15 @@ jobs:
matrix:
# Run remaining test profiles with minimum nextflow version
profile:
[test_host_rm, test_hybrid, test_hybrid_host_rm, test_busco_auto, test_ancient_dna, test_adapterremoval]
[
test_host_rm,
test_hybrid,
test_hybrid_host_rm,
test_busco_auto,
test_ancient_dna,
test_adapterremoval,
test_binrefinement,
]
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -10,7 +10,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#263](https://github.com/nf-core/mag/pull/263) - Restructure binning subworkflow in preparation for aDNA workflow and extended binning
- [#247](https://github.com/nf-core/mag/pull/247) - Add ancient DNA subworkflow
- [#263](https://github.com/nf-core/mag/pull/263) - Add MaxBin2 as second contig binning tool
- [#284](https://github.com/nf-core/mag/pull/285) - Add AdapterRemoval2 as an alternative read trimmer
- [#285](https://github.com/nf-core/mag/pull/285) - Add AdapterRemoval2 as an alternative read trimmer
- [#291](https://github.com/nf-core/mag/pull/291) - Add DAS Tool for bin refinement

### `Changed`

8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -10,6 +10,10 @@

## Pipeline tools

- [AdapterRemoval2](https://doi.org/10.1186/s13104-016-1900-2)

> Schubert, M., Lindgreen, S., and Orlando, L. 2016. "AdapterRemoval v2: Rapid Adapter Trimming, Identification, and Read Merging." BMC Research Notes 9 (February): 88. doi: 10.1186/s13104-016-1900-2

- [BCFtools](https://doi.org/10.1093/gigascience/giab008)

> Danecek, Petr, et al. "Twelve years of SAMtools and BCFtools." Gigascience 10.2 (2021): giab008. doi: 10.1093/gigascience/giab008
@@ -30,6 +34,10 @@

> Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome research, 26(12), 1721-1729. doi: 10.1101/gr.210641.116.

- [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1)

> Sieber, C. M. K., et al. 2018. "Recovery of Genomes from Metagenomes via a Dereplication, Aggregation and Scoring Strategy." Nature Microbiology 3 (7): 836-43. doi: 10.1038/s41564-018-0171-1

- [FastP](https://doi.org/10.1093/bioinformatics/bty560)

> Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. doi: 10.1093/bioinformatics/bty560.
1 change: 1 addition & 0 deletions README.md
@@ -38,6 +38,7 @@ The pipeline then:
- (optionally) performs ancient DNA assembly validation using [PyDamage](https://github.com/maxibor/pydamage) and contig consensus sequence recalling with [Freebayes](https://github.com/freebayes/freebayes) and [BCFtools](http://samtools.github.io/bcftools/bcftools.html)
- predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal)
- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/) and/or with [MaxBin2](https://sourceforge.net/projects/maxbin2/), and checks the quality of the genome bins using [Busco](https://busco.ezlab.org/)
- optionally refines bins with [DAS Tool](https://github.com/cmks/DAS_Tool)
- assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT)

Furthermore, the pipeline creates various reports in the results directory specified, including a [MultiQC](https://multiqc.info/) report summarizing some of the findings and software versions.
10 changes: 5 additions & 5 deletions bin/get_mag_depths.py
@@ -17,7 +17,9 @@ def parse_args(args=None):
parser.add_argument('-d', '--depths' , required=True , metavar='FILE' , help="(Compressed) TSV file containing contig depths for each sample: contigName, contigLen, totalAvgDepth, sample1_avgDepth, sample1_var [, sample2_avgDepth, sample2_var, ...].")
parser.add_argument('-a', '--assembler' , required=True , type=str , help="Assembler name.")
parser.add_argument('-i', '--id' , required=True , type=str , help="Sample or group id.")
parser.add_argument('-m', '--binner' , required=True , type=str , help="Binning method.")
return parser.parse_args(args)
# Processing contig depths for each binner again, i.e. not the most efficient way, but ok

def main(args=None):
args = parse_args(args)
@@ -43,10 +45,8 @@ def main(args=None):

# Initialize output files
n_samples = len(sample_names)
binners = set([ os.path.basename(file).split("-")[1] for file in args.bins ])
for binner in binners:
with open(args.assembler + "-" + binner + "-" + args.id + "-binDepths.tsv", 'w') as outfile:
print("bin", '\t'.join(sample_names), sep='\t', file=outfile)
with open(args.assembler + "-" + args.binner + "-" + args.id + "-binDepths.tsv", 'w') as outfile:
print("bin", '\t'.join(sample_names), sep='\t', file=outfile)

# for each bin, access contig depths and compute mean bin depth (for all samples)
for file in args.bins:
@@ -66,7 +66,7 @@ def main(args=None):
all_depths[sample].append(contig_depths[sample])

binname = os.path.basename(file)
with open(args.assembler + "-" + binname.split("-")[1] + "-" + args.id + "-binDepths.tsv", 'a') as outfile:
with open(args.assembler + "-" + args.binner + "-" + args.id + "-binDepths.tsv", 'a') as outfile:
print(binname, '\t'.join(str(statistics.median(sample_depths)) for sample_depths in all_depths), sep='\t', file=outfile)
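
The change above stops deriving the binner from each bin's file name and takes it from an explicit `--binner` argument instead. A minimal Python sketch of the difference follows. The helper names and the example file name are hypothetical; the `...Refined` naming is only assumed from the publishing pattern used elsewhere in this PR.

```python
import os
import statistics


def binner_from_filename(path):
    """Old behaviour: take the second '-'-separated field of the file name."""
    return os.path.basename(path).split("-")[1]


def bin_depths_filename(assembler, binner, sample_id):
    """New behaviour: the binner is passed in explicitly (as with --binner)."""
    return f"{assembler}-{binner}-{sample_id}-binDepths.tsv"


def median_bin_depths(depths_per_sample):
    """One median per sample, taken over the contigs of a single bin."""
    return [statistics.median(depths) for depths in depths_per_sample]


# Hypothetical refined bin: the file name still carries the original binner.
example = "results/MEGAHIT-MetaBAT2Refined-sample1.1.fa"
print(binner_from_filename(example))
print(bin_depths_filename("MEGAHIT", "DASTool", "sample1"))
print(median_bin_depths([[1.0, 3.0, 2.0], [4.0, 6.0]]))
```

With the explicit argument, every bin passed to one invocation is written to the same table, however the individual files are named.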


10 changes: 10 additions & 0 deletions conf/base.config
@@ -159,4 +159,14 @@ process {
cpus = { check_max (8 * task.attempt, 'cpus' ) }
memory = { check_max (20.GB * task.attempt, 'memory' ) }
}

withName: MAXBIN2 {
// often fails when insufficient information, so we allow it to gracefully fail without failing the pipeline
errorStrategy = { task.exitStatus in [ 1, 255 ] ? 'ignore' : 'retry' }
}

withName: DASTOOL_DASTOOL {
// if SCGs not found, bins cannot be assigned and DAS_tool will die with exit status 1
errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : task.exitStatus == 1 ? 'ignore' : 'finish' }
}
}
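
The two `errorStrategy` closures added above are nested ternaries that map a task's exit status to an action. As a rough illustration only, the same decision logic can be written out in Python; the function names are hypothetical and this is not how Nextflow evaluates the closures internally.

```python
# Exit statuses that the DASTOOL_DASTOOL rule above lists for a retry.
DASTOOL_RETRY_CODES = {143, 137, 104, 134, 139}


def dastool_error_strategy(exit_status):
    """Mirror of the DASTOOL_DASTOOL ternary: retry, else ignore on 1, else finish."""
    if exit_status in DASTOOL_RETRY_CODES:
        return "retry"
    if exit_status == 1:
        # Per the config comment: DAS Tool exits with 1 when no SCGs are found.
        return "ignore"
    return "finish"


def maxbin2_error_strategy(exit_status):
    """Mirror of the MAXBIN2 ternary: ignore on 1 or 255, otherwise retry."""
    return "ignore" if exit_status in (1, 255) else "retry"


for code in (137, 1, 2):
    print(code, dastool_error_strategy(code), maxbin2_error_strategy(code))
```

Note the asymmetry: an unlisted exit status ends the run gracefully (`finish`) for DAS Tool, but triggers a retry for MaxBin2.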
54 changes: 49 additions & 5 deletions conf/modules.config
@@ -201,14 +201,23 @@ process {
]
}

withName: 'MAG_DEPTHS_PLOT|MAG_DEPTHS_SUMMARY' {
withName: 'MAG_DEPTHS_PLOT|MAG_DEPTHS_SUMMARY|MAG_DEPTHS_PLOT_REFINED' {
publishDir = [
path: { "${params.outdir}/GenomeBinning/depths" },
mode: params.publish_dir_mode,
pattern: "*.{png,tsv}"
]
}

withName: 'MAG_DEPTHS_SUMMARY_REFINED' {
ext.prefix = "bin_refined_depths_summary"
publishDir = [
path: { "${params.outdir}/GenomeBinning/depths" },
mode: params.publish_dir_mode,
pattern: "*.{tsv}"
]
}

withName: 'BIN_SUMMARY' {
publishDir = [
path: { "${params.outdir}/GenomeBinning" },
@@ -227,9 +236,9 @@ process {

withName: 'BUSCO|BUSCO_PLOT' {
publishDir = [
path: { "${params.outdir}/GenomeBinning/QC/BUSCO" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
path: { "${params.outdir}/GenomeBinning/QC/BUSCO" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

@@ -420,7 +429,6 @@ process {
ext.prefix = { "${meta.assembler}-MaxBin2-${meta.id}" }
// if no gene found, will crash so allow ignore so rest of pipeline
// completes but without MaxBin2 results
errorStrategy = { task.exitStatus in [ 1, 255 ] ? 'ignore' : 'retry' }
}

withName: SPLIT_FASTA {
@@ -443,6 +451,42 @@ process {
]
}

withName: DASTOOL_FASTATOCONTIG2BIN_METABAT2 {
ext.prefix = { "${meta.assembler}-MetaBAT2-${meta.id}" }
}

withName: DASTOOL_FASTATOCONTIG2BIN_MAXBIN2 {
ext.prefix = { "${meta.assembler}-MaxBin2-${meta.id}" }
}

withName: DASTOOL_DASTOOL {
publishDir = [
[
path: { "${params.outdir}/GenomeBinning/DASTool" },
mode: params.publish_dir_mode,
pattern: '*.{tsv,log,eval,seqlength}'
],
]
ext.prefix = { "${meta.assembler}-DASTool-${meta.id}" }
ext.args = "--write_bins --write_unbinned --write_bin_evals --score_threshold ${params.refine_bins_dastool_threshold}"
}
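
The `ext.args` string above combines three fixed flags with the user-supplied score threshold. A small Python sketch of that interpolation, with a hypothetical helper name; the flags are copied from the config and nothing here calls DAS Tool itself:

```python
def dastool_args(score_threshold):
    """Build the argument string as the config's string interpolation would."""
    fixed_flags = ["--write_bins", "--write_unbinned", "--write_bin_evals"]
    return " ".join(fixed_flags + ["--score_threshold", str(score_threshold)])


# The test profiles in this PR set refine_bins_dastool_threshold = 0.
print(dastool_args(0))
```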

withName: RENAME_POSTDASTOOL {
publishDir = [
[
path: { "${params.outdir}/GenomeBinning/DASTool/unbinned" },
mode: params.publish_dir_mode,
pattern: '*-DASToolUnbinned-*.fa'
],
[
path: { "${params.outdir}/GenomeBinning/DASTool/bins" },
mode: params.publish_dir_mode,
// pattern needs to be updated in case of new binning methods
pattern: '*-{MetaBAT2,MaxBin2}Refined-*.fa'
]
]
}
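
The two `publishDir` patterns for RENAME_POSTDASTOOL route unbinned contigs and refined bins to separate folders. The sketch below approximates that routing in Python. It is an assumption-laden illustration: Python's `fnmatch` has no `{a,b}` alternation, so the brace pattern is expanded by hand, the file names are hypothetical, and Nextflow's own glob matching may differ in edge cases.

```python
from fnmatch import fnmatchcase

UNBINNED_PATTERN = "*-DASToolUnbinned-*.fa"
# '*-{MetaBAT2,MaxBin2}Refined-*.fa' expanded manually, one pattern per binner.
BIN_PATTERNS = ["*-MetaBAT2Refined-*.fa", "*-MaxBin2Refined-*.fa"]


def publish_subdir(filename):
    """Return the output subfolder a file would land in, or None if unmatched."""
    if fnmatchcase(filename, UNBINNED_PATTERN):
        return "GenomeBinning/DASTool/unbinned"
    if any(fnmatchcase(filename, pattern) for pattern in BIN_PATTERNS):
        return "GenomeBinning/DASTool/bins"
    return None


# Hypothetical file names, including one from a binner the pattern does not list.
for name in (
    "MEGAHIT-MetaBAT2Refined-sample1.1.fa",
    "MEGAHIT-DASToolUnbinned-sample1.fa",
    "MEGAHIT-OtherBinnerRefined-sample1.1.fa",
):
    print(name, "->", publish_subdir(name))
```

The last case shows why the config comment asks for the pattern to be updated when a new binning method is added: its refined bins would otherwise not be published.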

withName: CUSTOM_DUMPSOFTWAREVERSIONS {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
16 changes: 8 additions & 8 deletions conf/test.config
@@ -20,12 +20,12 @@ params {
max_time = '6.h'

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv'
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
skip_krona = true
min_length_unbinned_contigs = 1
max_unbinned_contigs = 2
busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz"
gtdb = false
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv'
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
skip_krona = true
min_length_unbinned_contigs = 1
max_unbinned_contigs = 2
busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz"
gtdb = false
}
2 changes: 2 additions & 0 deletions conf/test_ancient_dna.config
@@ -33,4 +33,6 @@ params {
skip_spades = false
skip_spadeshybrid = true
bcftools_view_variant_quality = 0
refine_bins_dastool = true
refine_bins_dastool_threshold = 0
}
34 changes: 34 additions & 0 deletions conf/test_binrefinement.config
@@ -0,0 +1,34 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/mag -profile test_binrefinement,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv'
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
skip_krona = true
min_length_unbinned_contigs = 1
max_unbinned_contigs = 2
busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz"
gtdb = false
refine_bins_dastool = true
refine_bins_dastool_threshold = 0
postbinning_input = 'both'
}
Binary file modified docs/images/mag_workflow.png