-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add uchime2_denovo to close #92 #100
base: dev
Are you sure you want to change the base?
Conversation
Hi @colinbrislawn, |
Cool! How does this look for the CLI? (CLI docs for the existing function) qiime vsearch uchime-denovo \
--p-method 'uchime2' \
--p-mindiffs 99 # ignored when running uchime2 and uchime3
--p-mindiv 0.8 # ignored when running uchime2 and uchime3
--p-minh 0.99 # ignored when running uchime2 and uchime3
... What should we do if someone passes settings that are not used by uchime2 and uchime3? |
Let's add support for |
Is The methods are tested with 100% coverage, but the results are not, as I'm not sure how they should work with non-trivial examples. If we want to build a test that shows the difference between these methods, here's commentary on vsearch's implementation: @nbokulich, if you have the time and interest, I would appreciate your review! |
q2_vsearch/tests/test_chimera.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on adding tests for these new algorithm versions (beyond testing the command string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a trivial test that shows both working.
I don't have an example in which these methods differ.... Would you like me to try and find one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that there is a test to which we pass "uchime3" as the method; however we technically can't be sure that this method is being implemented by the underlying software without differentiating behavior.
I understand if it's too difficult to contrive input data that shows different expected behavior for the different algorithm methods, but if it is reasonably easy to do so it would be best.
@@ -404,12 +406,17 @@ | |||
'abundances).'), | |||
}, | |||
parameter_descriptions={ | |||
'method': ('Denovo chimera detection based on uchime (Edgar 2011), ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colinvwood While trying to keep this short and sweet, I've added a little more detail.
How does this look?
I think we should cite Rob's papers too. Let me work on adding those two papers to the .bib file...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want me to add these citations to the .bib and link them up, or are we good to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're referring to each of the citations for each of the algorithm iterations, I think that yes we should do that. If you're referring to "Rob's papers" I'm unsure which ones those are exactly.
Hey @colinbrislawn, not sure if you're waiting on any more input from me but everything looks good to me except for the outstanding tests discussion. |
I'm working to build some good tests, starting from PR2 from the test data. Note: if I knew these algorithms better, I may be able to make mock tests from first principles. But I don't! I may need to put this on hold while I work on other projects. # shorten names
vsearch --fastx_uniques PR2-18S-rRNA-V4.derep.fsa --sizein --sizeout --relabel derep_ --fastaout PR2_short.fsa
# run all 3 methods
vsearch --uchime_denovo PR2_short.fsa --chimeras chi_v1.fasta --uchimeout nonchi_v1.uc &
vsearch --uchime2_denovo PR2_short.fsa --chimeras chi_v2.fasta --uchimeout nonchi_v2.uc &
vsearch --uchime3_denovo PR2_short.fsa --chimeras chi_v3.fasta --uchimeout nonchi_v3.uc &
# inspect for differences
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v2.uc
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v3.uc
git diff --no-index --word-diff -U0 nonchi_v2.uc nonchi_v3.uc
grep 'derep_33;size=86\s' nonchi_v1.uc | head -n 1
grep 'derep_33;size=86\s' nonchi_v2.uc | head -n 1
grep 'derep_33;size=86\s' nonchi_v3.uc | head -n 1 Possible reads from PR2 to use. Note the short names!
Testing
This higher default --abskew threshold should lead to fewer called chimeras, which is what I see! |
Would this work as a minimal test? >uchime
0.0239 derep_33;size=86 derep_2;size=485 derep_5;size=315 derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 00 3 0 0 0.6 N
>uchime2
0.0239 derep_33;size=86 derep_2;size=485 derep_5;size=315 derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 00 3 0 0 0.6 Y
>ucmime3
0.0000 derep_33;size=86 * * * * * * * * 0 0 0 0 0 0 * N 1 and 2 report the same score, but different final decision Let me check if this works with only three reads! |
WIP to close #92
Add internal functions and tests to use 'uchime' 'uchime2' or 'uchime3'
Open questions:
--abskew
for some or all of these?