Merge pull request #411 from nf-core/dev

1.1.2 release
nf-core · Oct 27, 2023 · 3d4eda2
2 parents baede5b + 73ec485
commit 3d4eda2

Showing 14 changed files with 79 additions and 27 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,26 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## v1.1.2 - Augmented Akita Patch [2023-10-27]
+
+### `Added`
+
+- [#408](https://github.com/nf-core/taxprofiler/pull/408) Added preprint citation information to README and manifest (added by @jfy133)
+
+### `Fixed`
+
+- [#405](https://github.com/nf-core/taxprofiler/pull/405) Fix database to tool mismatching in KAIJU2KRONA input (❤️ to @MajoroMask for reporting, fix by @jfy133)
+- [#406](https://github.com/nf-core/taxprofiler/pull/406) Fix overwriting of bracken-derived kraken2 outputs when the database name is shared between Bracken/Kraken2. (❤️ to @MajoroMask for reporting, fix by @jfy133)
+- [#409](https://github.com/nf-core/taxprofiler/pull/409) Fix a NullPointerException error occurring occasionally in older versions of MEGAN's rma2info (❤️ to @MajoroMask for reporting, fix by @jfy133)
+
+### `Dependencies`
+
+| Tool           | Previous version | New version |
+| -------------- | ---------------- | ----------- |
+| megan/rma2info | 6.21.7           | 6.24.20     |
+
+### `Deprecated`
+
 ## v1.1.1 - Augmented Akita Patch [2023-10-11]
 
 ### `Added`

diff --git a/README.md b/README.md
@@ -11,6 +11,8 @@
 
 [![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23taxprofiler-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/taxprofiler)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
 
+[![Cite Preprint](https://img.shields.io/badge/Cite%20Us!-Cite%20Preprint-orange)](https://doi.org/10.1101/2023.10.20.563221)
+
 ## Introduction
 
 **nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun short- and long-read metagenomic data. It allows for in-parallel taxonomic identification of reads or taxonomic abundance estimation with multiple classification and profiling tools against multiple databases, and produces standardised output tables for facilitating results comparison between different tools and databases.
@@ -142,7 +144,11 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
-If you use nf-core/taxprofiler for your analysis, please cite it using the following doi: [10.5281/zenodo.7728364](https://doi.org/10.5281/zenodo.7728364)
+If you use nf-core/taxprofiler for your analysis, please cite it using the following doi: [10.1101/2023.10.20.563221](https://doi.org/10.1101/2023.10.20.563221).
+
+> Stamouli, S., Beber, M. E., Normark, T., Christensen II, T. A., Andersson-Li, L., Borry, M., Jamy, M., nf-core community, & Fellows Yates, J. A. (2023). nf-core/taxprofiler: Highly parallelised and flexible pipeline for metagenomic taxonomic classification and profiling. In bioRxiv (p. 2023.10.20.563221). https://doi.org/10.1101/2023.10.20.563221
+
+For the latest version of the code, cite the Zenodo doi: [10.5281/zenodo.7728364](https://doi.org/10.5281/zenodo.7728364)
 
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 

diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
@@ -1,7 +1,7 @@
 report_comment: >
-  This report has been generated by the <a href="https://github.com/nf-core/taxprofiler/releases/tag/1.1.1" target="_blank">nf-core/taxprofiler</a>
+  This report has been generated by the <a href="https://github.com/nf-core/taxprofiler/releases/tag/1.1.2" target="_blank">nf-core/taxprofiler</a>
   analysis pipeline. For information about how to interpret these results, please see the
-  <a href="https://nf-co.re/taxprofiler/1.1.1/docs/output" target="_blank">documentation</a>.
+  <a href="https://nf-co.re/taxprofiler/1.1.2/docs/output" target="_blank">documentation</a>.
 
 report_section_order:
   "nf-core-taxprofiler-methods-description":

diff --git a/conf/modules.config b/conf/modules.config
@@ -485,7 +485,7 @@ process {
     }
 
     withName: KRAKENTOOLS_COMBINEKREPORTS_KRAKEN {
-        ext.prefix = { "kraken2_${meta.id}_combined_reports" }
+        ext.prefix = { "kraken2_${meta.db_name}_combined_reports" }
         publishDir = [
             path: { "${params.outdir}/kraken2/" },
             mode: params.publish_dir_mode,

diff --git a/docs/images/taxprofiler_logo.svg b/docs/images/taxprofiler_logo.svg
diff --git a/docs/output.md b/docs/output.md
@@ -360,6 +360,7 @@ The main taxonomic profiling file from Bracken is the `*.tsv` file. This provide
 
 - `kraken2/`
   - `<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `krakentools`)
+    - If you have also run Bracken, the original Kraken report (i.e., _before_ read re-assignment) will be included in this directory with `-bracken` suffixed to your Bracken database name. For example: `kraken2-<mydatabase>-bracken.tsv`. However, in most cases you want to use the actual Bracken file (i.e., `bracken_<mydatabase>.tsv`).
   - `<db_name>/`
     - `<sample_id>_<db_name>.classified.fastq.gz`: FASTQ file containing all reads that had a hit against a reference in the database for a given sample
     - `<sample_id>_<db_name>.unclassified.fastq.gz`: FASTQ file containing all reads that did not have a hit in the database for a given sample
@@ -582,6 +583,7 @@ The resulting HTML files can be loaded into your web browser for exploration. Ea
   - `<tool>_<database>*.{tsv,csv,arrow,parquet,biom}`: Standardised taxon table containing multiple samples. The standard format is the `tsv`.
     - The first column describes the taxonomy ID and the rest of the columns describe the read counts for each sample.
     - Note that the file naming scheme will apply regardless of whether `TAXPASTA_MERGE` (multiple sample run) or `TAXPASTA_STANDARDISE` (single sample run) are executed.
+    - If you have also run Bracken, the initial Kraken report (i.e., _before_ read re-assignment) will be included in this directory with `-bracken` suffixed to your Bracken database name. For example: `kraken2-<mydatabase>-bracken.tsv`. However, in most cases you want to use the actual Bracken file (i.e., `bracken_<mydatabase>.tsv`).
 
   </details>
 

diff --git a/modules.json b/modules.json
@@ -163,7 +163,7 @@
                     },
                     "megan/rma2info": {
                         "branch": "master",
-                        "git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+                        "git_sha": "dbce8951ff9a39ad08d87e563636bbcc6ef34032",
                         "installed_by": ["modules"]
                     },
                     "metaphlan/mergemetaphlantables": {

diff --git a/modules/nf-core/megan/rma2info/environment.yml b/modules/nf-core/megan/rma2info/environment.yml
diff --git a/modules/nf-core/megan/rma2info/main.nf b/modules/nf-core/megan/rma2info/main.nf
diff --git a/modules/nf-core/megan/rma2info/meta.yml b/modules/nf-core/megan/rma2info/meta.yml
diff --git a/nextflow.config b/nextflow.config
@@ -368,8 +368,8 @@ manifest {
     description     = """Taxonomic classification and profiling of shotgun short- and long-read metagenomic data"""
     mainScript      = 'main.nf'
     nextflowVersion = '!>=23.04.0'
-    version         = '1.1.1'
-    doi             = '10.5281/zenodo.7728364'
+    version         = '1.1.2'
+    doi             = '10.1101/2023.10.20.563221'
 }
 
 // Load modules.config for DSL2 module specific options

diff --git a/subworkflows/local/profiling.nf b/subworkflows/local/profiling.nf
@@ -172,9 +172,13 @@ workflow PROFILING {
         ch_raw_classifications = ch_raw_classifications.mix( KRAKEN2_KRAKEN2.out.classified_reads_assignment )
         ch_raw_profiles        = ch_raw_profiles.mix(
             KRAKEN2_KRAKEN2.out.report
-                // Set the tool to be strictly 'kraken2' instead of potentially 'bracken' for downstream use.
-                // Will remain distinct from 'pure' Kraken2 results due to distinct database names in file names.
-                .map { meta, report -> [meta + [tool: 'kraken2'], report]}
+                // Rename tool in the meta for the for-bracken files to disambiguate from only-kraken2 results in downstream steps.
+                // Note: may need to rename back to just 'bracken' in those downstream steps depending on context.
+                .map {
+                    meta, report ->
+                        def new_tool = meta.tool == 'bracken' ? 'kraken2-bracken' : meta.tool
+                    [meta + [tool: new_tool], report]
+                }
         )
 
     }

diff --git a/subworkflows/local/standardisation_profiles.nf b/subworkflows/local/standardisation_profiles.nf
@@ -52,12 +52,19 @@ workflow STANDARDISATION_PROFILES {
                             .map {
                                     meta, profile ->
                                         def meta_new = [:]
-                                        meta_new.id = meta.db_name
                                         meta_new.tool = meta.tool == 'malt' ? 'megan6' : meta.tool
+                                        meta_new.db_name = meta.db_name
                                         [meta_new, profile]
                             }
                             .groupTuple ()
-                            .map { [ it[0], it[1].flatten() ] }
+                            .map {
+                                meta, profiles ->
+                                    meta = meta + [
+                                        tool: meta.tool == 'kraken2-bracken' ? 'kraken2' : meta.tool, // replace to get the right output-format description
+                                        id: meta.tool == 'kraken2-bracken' ? "${meta.db_name}-bracken" : "${meta.db_name}" // append a suffix to disambiguate when the Kraken2 step of Bracken and a standalone Kraken2 run share a database name
+                                    ]
+                                [ meta, profiles.flatten() ]
+                            }
 
     ch_taxpasta_tax_dir = params.taxpasta_taxonomy_dir ? Channel.fromPath(params.taxpasta_taxonomy_dir, checkIfExists: true).collect() : []
 
@@ -85,7 +92,7 @@ workflow STANDARDISATION_PROFILES {
             centrifuge: it[0]['tool'] == 'centrifuge'
             ganon: it[0]['tool'] == 'ganon'
             kmcp: it [0]['tool'] == 'kmcp'
-            kraken2: it[0]['tool'] == 'kraken2'
+            kraken2: it[0]['tool'] == 'kraken2' || it[0]['tool'] == 'kraken2-bracken'
             metaphlan: it[0]['tool'] == 'metaphlan'
             motus: it[0]['tool'] == 'motus'
             unknown: true
@@ -158,11 +165,15 @@ workflow STANDARDISATION_PROFILES {
     // Have to sort by size to ensure first file actually has hits otherwise
     // the script fails
     ch_profiles_for_kraken2 = ch_input_profiles.kraken2
-                                .map { [it[0]['db_name'], it[1]] }
-                                .groupTuple(sort: {-it.size()} )
                                 .map {
-                                    [[id:it[0]], it[1]]
+                                    meta, profiles ->
+                                        def new_meta = [:]
+                                        new_meta.tool = meta.tool == 'kraken2-bracken' ? 'kraken2' : meta.tool // replace to get the right output-format description
+                                        new_meta.id = meta.tool // keep the incoming tool name (possibly 'kraken2-bracken') as the id
+                                        new_meta.db_name = meta.tool == 'kraken2-bracken' ? "${meta.db_name}-bracken" : "${meta.db_name}" // append a suffix to disambiguate when the Kraken2 step of Bracken and a standalone Kraken2 run share a database name
+                                    [ new_meta, profiles ]
                                 }
+                                .groupTuple(sort: {-it.size()})
 
     KRAKENTOOLS_COMBINEKREPORTS_KRAKEN ( ch_profiles_for_kraken2 )
     ch_multiqc_files = ch_multiqc_files.mix( KRAKENTOOLS_COMBINEKREPORTS_KRAKEN.out.txt )

diff --git a/subworkflows/local/visualization_krona.nf b/subworkflows/local/visualization_krona.nf
@@ -27,7 +27,7 @@ workflow VISUALIZATION_KRONA {
     ch_input_profiles = profiles
         .branch {
             centrifuge: it[0]['tool'] == 'centrifuge'
-            kraken2: it[0]['tool'] == 'kraken2'
+            kraken2: it[0]['tool'] == 'kraken2' || it[0]['tool'] == 'kraken2-bracken'
             unknown: true
         }
     ch_input_classifications = classifications
@@ -41,7 +41,11 @@ workflow VISUALIZATION_KRONA {
         Convert Kraken2 formatted reports into Krona text files
     */
     ch_kraken_reports = ch_input_profiles.kraken2
-        .mix( ch_input_profiles.centrifuge )
+            .map {
+                meta, report ->
+                [meta + [tool: meta.tool == 'bracken' ? 'kraken2-bracken' : meta.tool], report]
+            }
+            .mix( ch_input_profiles.centrifuge )
     KRAKENTOOLS_KREPORT2KRONA ( ch_kraken_reports )
     ch_krona_text = ch_krona_text.mix( KRAKENTOOLS_KREPORT2KRONA.out.txt )
     ch_versions = ch_versions.mix( KRAKENTOOLS_KREPORT2KRONA.out.versions.first() )
@@ -50,8 +54,8 @@ workflow VISUALIZATION_KRONA {
         Combine Kaiju profiles with their databases
     */
     ch_input_for_kaiju2krona = ch_input_classifications.kaiju
-        .map{ [it[0]['db_name'], it[0], it[1]] }
-        .combine( databases.map{ [it[0]['db_name'], it[1]] }, by: 0 )
+        .map{ meta, profiles -> [[meta['tool'], meta['db_name']], meta, profiles] }
+        .combine( databases.map{ meta, db -> [[meta['tool'], meta['db_name']], db] }, by: 0 )
         .multiMap{
             it ->
                 profiles: [it[1], it[2]]