Document the relationship between ont-guppy-duplex-pipeline and duplex_tools #18

Open
tbooth opened this issue Aug 26, 2022 · 3 comments

Comments

@tbooth

tbooth commented Aug 26, 2022

The current advice from ONT regarding how to perform duplex basecalling is here:

https://community.nanoporetech.com/posts/guppy-v6-0-0-release (dated 6th December 2021 - login required to view)

It makes no mention of duplex-tools, but says to pip install ont-guppy-duplex-pipeline and then run the script from that package, guppy_duplex, on the original fast5 files.

As far as I can see, this script is a rather clunky wrapper that calls guppy in simplex mode, then performs the equivalent of duplex_tools pairs_from_summary, and then runs guppy_basecaller_duplex to get the final result. The code for the pairing step is in ont_guppy_duplex_pipeline/channel_neighbours.py. It looks related to your duplex_tools/pairs_from_summary.py, but the logic is not quite the same.
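For readers who have not looked at either code base, the general idea of pairing reads from a sequencing summary can be sketched as below. This is a simplified illustration only, not the logic of channel_neighbours.py or pairs_from_summary.py. The function name, the dictionary keys and both thresholds (`max_gap_s`, `min_len_ratio`) are hypothetical values chosen for the example.

```python
from collections import defaultdict


def candidate_pairs(reads, max_gap_s=5.0, min_len_ratio=0.9):
    """Return (first_id, second_id) tuples for reads that look like a pair.

    `reads` is an iterable of dicts with the keys read_id, channel,
    start_time, duration and length. The key names and the two
    thresholds are assumptions made for this sketch.
    """
    # Group reads by the channel (pore) they were sequenced on.
    by_channel = defaultdict(list)
    for read in reads:
        by_channel[read["channel"]].append(read)

    pairs = []
    for channel_reads in by_channel.values():
        # Only consecutive reads on the same channel are considered.
        channel_reads.sort(key=lambda r: r["start_time"])
        for first, second in zip(channel_reads, channel_reads[1:]):
            gap = second["start_time"] - (first["start_time"] + first["duration"])
            shorter, longer = sorted((first["length"], second["length"]))
            if longer == 0:
                continue
            # Keep the pair if the second read follows closely and the
            # two reads have similar lengths.
            if 0 <= gap <= max_gap_s and shorter / longer >= min_len_ratio:
                pairs.append((first["read_id"], second["read_id"]))
    return pairs
```

Pairs found this way are only candidates. An alignment-based check or actual duplex basecalling is still needed to weed out false positives.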

My main interest just now is to get a good but quick assessment of the approximate number of duplex reads in each dataset, for QC purposes, so duplex-tools seems the more useful approach. But to save others from having to peer through source code like I've been doing, could you please add some info to the README.md explaining the relationship between these two ONT-developed packages?

Cheers!

@tbooth
Author

tbooth commented Aug 26, 2022

Sorry, my mistake - I see ont-guppy-duplex-pipeline does also incorporate an alignment-based filtering step, but it does not yield the same results as this package. I get about twice the number of candidate duplex pairs. I guess I'll need to actually basecall these to see how many are false positives.

@cjw85
Member

cjw85 commented Aug 26, 2022

Hi @tbooth,

The scripts in the current version of Guppy were taken from an earlier version of this repository, hence the similarities. Guppy needs updating. IIRC the major difference is the compute performance. @ollenordesjo can comment on the output differences.

@onordesjo

Hi @tbooth, sorry, have been on vacation, just seeing this now.

The option that most affects the output is the min_qscore filter (https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/pairs_from_summary.py#L320). Without re-running the same data through both ont-guppy-duplex-pipeline and duplex_tools and checking carefully, that would be my first guess as to why the number of candidate duplex pairs is different.

Depending on your requirements, you may want to set this threshold lower than the default (I would suggest including the best ~85% of reads or something similar, wherever that threshold may be for your dataset). We had some discussions about setting this threshold more adaptively, but decided that a constant threshold would keep it more reproducible on a per-read level.
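One way to find where that ~85% cut-off falls for a given dataset is sketched below. It assumes the per-read mean qscore is available in a tab-separated sequencing summary under a column named `mean_qscore_template`. The column name and the helper names are assumptions for this example and not part of duplex_tools.

```python
import csv


def read_qscores(lines, column="mean_qscore_template"):
    """Extract per-read qscores from tab-separated summary lines.

    `lines` can be an open file or any iterable of text lines.
    The default column name is an assumption about the summary format.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [float(row[column]) for row in reader]


def qscore_threshold(qscores, keep_fraction=0.85):
    """Return the lowest qscore among the best `keep_fraction` of reads."""
    if not qscores:
        raise ValueError("no qscores given")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(qscores, reverse=True)
    # Number of reads to keep, with at least one read retained.
    n_keep = max(1, int(round(keep_fraction * len(ranked))))
    return ranked[n_keep - 1]
```

With a real file this could be used as `qscore_threshold(read_qscores(open("sequencing_summary.txt")))`, where the file name is a placeholder. The resulting value is then a candidate for the min_qscore setting, applied as a constant for that dataset.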
