Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nanoq to the pipeline #508

Merged
merged 13 commits into from
Jul 26, 2024

Conversation

LilyAnderssonLee
Copy link
Contributor

@LilyAnderssonLee LilyAnderssonLee commented Jul 11, 2024

Add nanoq for nanopore reads filtering and nanopore reads summary statistics.
Fix #415

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/taxprofiler branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Copy link

github-actions bot commented Jul 11, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 97c4c2b

+| ✅ 265 tests passed       |+
!| ❗   2 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-07-26 11:49:02

@LilyAnderssonLee LilyAnderssonLee requested a review from a team July 11, 2024 11:45
@LilyAnderssonLee
Copy link
Contributor Author

LilyAnderssonLee commented Jul 11, 2024

The summary report from nanoq generated by : nextflow run main.nf -profile test,singularity --outdir results/ --longread_filter_tool 'nanoq'

ERR3201952_ERR3201952_filtered.stats:

Nanoq Read Summary
====================

Number of reads:      2971
Number of bases:      6036813
N50 read length:      2205
Longest read:         11925 
Shortest read:        1000
Mean read length:     2031
Median read length:   1741 
Mean read quality:    8.57 
Median read quality:  8.37


Read length thresholds (bp)

> 200       2971              100.0%
> 500       2971              100.0%
> 1000      2966              99.8%
> 2000      1155              38.9%
> 5000      55                01.9%
> 10000     1                 00.0%
> 30000     0                 00.0%
> 50000     0                 00.0%
> 100000    0                 00.0%
> 1000000   0                 00.0%


Read quality thresholds (Q)

> 5   2971          100.0%
> 7   2971          100.0%
> 10  259           08.7%
> 12  70            02.4%
> 15  1             00.0%
> 20  0             00.0%
> 25  0             00.0%
> 30  0             00.0%

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! That looks like a perfect log file for MultiQC!

For the newly designed 1.2, should we include pychopper as an alterantive adapterermoval tool? Particuarly given as it's an official nanopore tool.

In addition to comments below, also missing:

  1. Output docs descriptions for nanoq
  2. Missing auto-method-description update in the local/utils_nfcore_taxprofiler<....> thingy
  3. Missing listing of tool on main README in the various points
  4. Missing metromap update (let me know if yo uwant me to add dthat)

nextflow_schema.json Outdated Show resolved Hide resolved
@@ -270,6 +284,13 @@
"description": "Specify the number of high-quality bases in the library to be retained",
"fa_icon": "fas fa-bullseye",
"help_text": "Removes the worst reads until only the specified value of bases remain, useful for very large read sets. If the input read set is less than the specified value, this setting will have no effect. _Modified from [Filtlong documentation](https://github.com/rrwick/Filtlong)_\n\n> Modifies tool parameter(s):\n> - Filtlong: `--keep_percent`"
},
"longread_qc_qualityfilter_minquality": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this only applies to nanoq, maybe we should indicate this... I guess we could change the name of the parmaeter but this would require a major bump e.g. 1.1 -> 2 (rather than 1.2)

Alternatively for the help descriptions we prepend for this and the other tools in this section if it's a tool-specific parameter

So here we could have maybe: Nanoq only: specify minimum average read quality filter, and above, longread_qc_qualityfilter_keeppercent could be Filtlong only: specify the percent of high-quality bases to be retained?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

subworkflows/local/longread_adaterremoval.nf Outdated Show resolved Hide resolved
subworkflows/local/longread_filtering.nf Outdated Show resolved Hide resolved
subworkflows/local/longread_preprocessing.nf Outdated Show resolved Hide resolved
@@ -17,31 +17,22 @@ workflow LONGREAD_PREPROCESSING {
ch_multiqc_files = Channel.empty()

if ( !params.longread_qc_skipadaptertrim && params.longread_qc_skipqualityfilter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing MultiQC mixing from the adapterrermoval subworkflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

LilyAnderssonLee and others added 3 commits July 12, 2024 09:05
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@LilyAnderssonLee
Copy link
Contributor Author

For the newly designed 1.2, should we include pychopper as an alterantive adapterermoval tool? Particuarly given as it's an official nanopore tool.

As Sofia mentioned in this issue, Pychopper was designed for cDNA reads. But I guess the principles for identifying and removing adapter sequences should be the same for both cDNA and DNA reads.

Another thing is Pychopper was archived in 2022, and its last release was in 2020.

If you think it's proper, we could add it to 1.2 version.

@LilyAnderssonLee
Copy link
Contributor Author

LilyAnderssonLee commented Jul 12, 2024

@jfy133 It would be great if you could update metromap. I guess metromap could be updated in the end after adding all preprocessing tools for long reads.

@jfy133
Copy link
Member

jfy133 commented Jul 18, 2024

For the newly designed 1.2, should we include pychopper as an alterantive adapterermoval tool? Particuarly given as it's an official nanopore tool.

As Sofia mentioned in this issue, Pychopper was designed for cDNA reads. But I guess the principles for identifying and removing adapter sequences should be the same for both cDNA and DNA reads.

Another thing is Pychopper was archived in 2022, and its last release was in 2020.

If you think it's proper, we could add it to 1.2 version.

Ok if it's old and porechop_Abi is more recent we can stick with that

@jfy133 It would be great if you could update metromap. I guess metromap could be updated in the end after adding all preprocessing tools for long reads.

Yes lets update it right at the end, so I don't have to move everythign around every time :)

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now :D

"default": "filtlong",
"enum": ["filtlong", "nanoq"],
"fa_icon": "fas fa-hammer",
"description": "Specify which tool to use for long reads filtering"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give any guidance in teh long-form help text which to use when?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I also set the default as nanoq

@LilyAnderssonLee LilyAnderssonLee merged commit e1b367f into nf-core:bouncy-basenji Jul 26, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants