Issue 48: Describe pyani plot graphical output. #305
Codecov Report
@@           Coverage Diff           @@
##           master     #305   +/-   ##
=======================================
  Coverage   76.12%   76.12%
=======================================
  Files          52       52
  Lines        3380     3380
=======================================
  Hits         2573     2573
  Misses        807      807
I think that "interpreting output" belongs under "Basic Use" in the ToC, rather than at the same level as "installation" and "requirements" - please can we link it from basic_use.rst instead?
ANIb/ANIblastall/TETRA methods will be available in v0.3, so I don't think we should note a restriction there. (ll.51-55)
I'll not merge this until you've decided whether TETRA is symmetrical or not and changed the text. (l.55)
We should avoid double newlines (I think they're formatted the same, regardless…) (l.61)
Is the description on l.65 correct? It sounds like alignment length uses the length of the reference genome, as phrased.
The alignment length/similarity errors should be described for ANIb/ANIblastall methods (they don't crop up for TETRA).
l.75 is missing an "as" or "because", I think.
I took l.65 from the docstring in
Yes, that's not entirely clear, is it? Especially out of the immediate context.
I think this certainly needs clarification in the documentation, but a note in the comment at that point in the code might also be useful.
I've made those changes. Are there actually any situations where coverage should be symmetrical? Unless I've misunderstood them (and the notes in the docstrings are wrong, in some cases), none of these methods are. In which case the explanation I lifted from one of your issue comments about how coverage 'can be' asymmetrical might need to be changed.
Two come to mind:
It should be trivial, I think, to generate a completely symmetrical coverage output by renaming a single input file multiple times, and pretending they were different genomes.
ANIb/ANIblastall/fastANI are not symmetrical, in general. Nor are they necessarily stable to circularly-permuted sequences, due to the sequence fragmentation step. TETRA is described here: https://doi.org/10.1111/j.1462-2920.2004.00624.x - having reminded myself with a quick skim, I think the pairwise score is calculated by:
which sounds symmetrical to me. What are your thoughts?
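The symmetry argument can be illustrated with a minimal sketch. It assumes the final TETRA step is a Pearson correlation between two per-genome tetranucleotide profiles; the published method correlates Z-scores derived from expected tetranucleotide frequencies, whereas this sketch uses raw counts purely to show that swapping the two genomes leaves the score unchanged. The function names and toy sequences are hypothetical and are not taken from pyani.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Return overlapping tetranucleotide counts as a 256-element vector.

    Simplification: the published TETRA method uses Z-scores, not raw counts.
    """
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    return [counts[t] for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy sequences, not real genomes.
genome_a = "ATGCGATACGCTTGAGGCTAACGTTAGC" * 5
genome_b = "TTGACCGATGCATGCCGTAAGCTAGGCT" * 5

# The score does not depend on which genome is treated as the query.
score_ab = pearson(tetra_profile(genome_a), tetra_profile(genome_b))
score_ba = pearson(tetra_profile(genome_b), tetra_profile(genome_a))
```

Because each profile is computed from one genome alone and the correlation treats its two arguments interchangeably, there is no query/subject distinction for an asymmetry to arise from.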
The mathematician in me tries to stop me from claiming that something is always true, if it is not. A common method of (dis)proof is by counterexample. The counterexamples above, contrived or coincidental as they might be, demonstrate that coverage can be symmetrical. If you feel that we need to state a stronger expectation of asymmetry, how about "will usually be asymmetrical"?
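A small numeric sketch of why coverage "will usually be asymmetrical" but can be symmetrical. It assumes coverage is the aligned length divided by the length of the genome in question, and, as a simplification, that both genomes share the same aligned length. The function name and all numbers are hypothetical.

```python
def coverage(aligned_bases, genome_length):
    """Fraction of a genome covered by the alignment (assumed definition)."""
    return aligned_bases / genome_length


# Hypothetical pair of genomes with different lengths and one shared alignment.
aligned = 900
len_a, len_b = 1000, 3000

cov_a = coverage(aligned, len_a)  # coverage from genome A's point of view
cov_b = coverage(aligned, len_b)  # coverage from genome B's point of view
is_symmetric = cov_a == cov_b     # False: the denominators differ

# Counterexample: the same file supplied under two names aligns end to end,
# so both directions share the same numerator and denominator.
len_copy = 1000
cov_copy_1 = coverage(len_copy, len_copy)
cov_copy_2 = coverage(len_copy, len_copy)
copy_is_symmetric = cov_copy_1 == cov_copy_2  # True
```

Under these assumptions, symmetry only appears when the two genome lengths (and aligned lengths) coincide, which is why the renamed-file case is symmetrical while typical comparisons are not.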
I think there's an opportunity for an IJSEM paper discussing how algorithm choice affects measurement, along the lines of https://doi.org/10.1099/ijsem.0.004124
I think it wasn't clear to me from the comment that you were only talking about coverage being asymmetrical. You're correct that coverage, alignment length, etc. don't apply for TETRA. It is kind of a proto-MinHash distance measurement, rather than an alignment.
*asymmetrical?
The description for similarity errors in ANIm should perhaps be modified. Currently, this does not use what NUCmer/MUMmer themselves identify as
- add notes on distribution/scatter plots
- clarify interpretations of heatmaps
- correct method descriptions
I've checked the docs here and made modifications where necessary (e.g. alignment length in the BLAST methods does not subtract mismatches; adding summaries of the scatterplot/distribution output). NOTE: we will need to modify the descriptions of some measures when we correct ANIm calculations according to #340
Adds descriptions of pyani plot graphical output (based on responses to Issues #48 and #303, as well as how things are calculated in the code).
Not ready to be merged, but ready for discussion of what else should be included / if anything needs to be removed.
Fixes #48.
Fixes #303.
Type of change
Action Checklist
- … pyani repository under your own account (please allow write access for repository maintainers)
- … (CONTRIBUTING.md)
- … pytest -v (non-passing code will not be merged)
- … origin/master
- … flake8 and black before submission
- … Pull requests section in the pyani repository