ANIm should not be symmetric #373

donovan-h-parks · 2022-01-17T16:35:49Z

Summary:

PyANI treats ANIm as symmetric, but the order of genomes impacts this calculation.

Description:

I have two test genomes. The ANIm results produced by PyANI change depending on the order these genomes are processed. That is, the results depend on which genome is considered the "reference" and which the "query" when running nucmer.

Reproducible Steps:

1. Process the attached genomes with PyANI using average_nucleotide_identity.py -m ANIm --workers 1.
2. Swap the names of the genomes (i.e. genome1.fna becomes genome2.fna and genome2.fna becomes genome1.fna).
3. Run PyANI again as in step 1, but changing the output directory.
4. Compare the results and observe that the ANIm values are now different.

As a concrete result, the number of similarity errors changes from 512 to 561 depending on the order of the genomes.

Expected Output:

test_genomes.tar.gz

The ANI-percentage_identity.tab, ANIm_similarity_errors.tab, and ANIm_alignment_lengths.tab files are non-symmetric matrices that reflect which genome was the reference and which genome was the query.
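One way to check an output matrix for this kind of asymmetry is sketched below. It assumes the .tab files are tab-separated square matrices with a header row and a label in the first column; the helper names `read_matrix` and `asymmetric_pairs` are hypothetical and not part of pyani. The example matrix is illustrative only: it uses the two similarity error counts reported above, and which cell holds which value is an assumption.

```python
import csv
import io


def read_matrix(handle):
    """Read a tab-separated square matrix with a header row and row labels."""
    rows = list(csv.reader(handle, delimiter="\t"))
    labels = rows[0][1:]  # the first header cell is the (possibly empty) corner
    values = {
        row[0]: dict(zip(labels, map(float, row[1:])))
        for row in rows[1:]
    }
    return labels, values


def asymmetric_pairs(labels, values, tol=0.0):
    """Return (a, b, a_vs_b, b_vs_a) for each pair whose two cells differ by more than tol."""
    found = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if abs(values[a][b] - values[b][a]) > tol:
                found.append((a, b, values[a][b], values[b][a]))
    return found


# Illustrative matrix only: cell placement of 512 and 561 is assumed
example = "\tgenome1\tgenome2\ngenome1\t0\t512\ngenome2\t561\t0\n"
labels, values = read_matrix(io.StringIO(example))
print(asymmetric_pairs(labels, values))
```

An empty list would mean the matrix is symmetric within the given tolerance.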

pyani Version:

v0.2.11

Operating System:

Ubuntu

widdowquinn · 2022-01-17T16:58:53Z

Thanks @dparks1134 - apologies for the bug, and we'll deal with this as a priority.

Quick notes

- although it stems from the same misunderstanding of/assumptions about mummer's operation, this is not the same issue as Alignment coverage >1.0 #340 and will probably not be fixed by the proposed changes to resolve that issue.
- the resolution currently appears to be to run mummer for the reciprocal pairwise comparisons.
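As a minimal sketch of what "reciprocal pairwise comparisons" means for job scheduling: a symmetric method needs one comparison per unordered pair, while an asymmetric one needs one per ordered (reference, query) pair. The function names here are hypothetical and do not reflect how pyani builds its job list.

```python
from itertools import combinations, permutations


def one_way_pairs(genomes):
    """One comparison per unordered pair; only valid if the result is symmetric."""
    return list(combinations(sorted(genomes), 2))


def reciprocal_pairs(genomes):
    """One comparison per ordered (reference, query) pair, covering both directions."""
    return list(permutations(sorted(genomes), 2))


genomes = ["genome1", "genome2"]
print(one_way_pairs(genomes))     # [('genome1', 'genome2')]
print(reciprocal_pairs(genomes))  # [('genome1', 'genome2'), ('genome2', 'genome1')]
```

For n genomes this doubles the number of alignments, from n(n-1)/2 to n(n-1).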

L.

widdowquinn · 2022-01-19T14:08:14Z

NOTE: Handling under branch issue_373 for version 0.3.

Issue 373 - preliminary merge with multiple fixes to annoyances (warnings/small errors) prior to real fix of #373.

baileythegreen · 2022-01-24T20:06:37Z

For reference, two sets of output for the genomes provided by @dparks1134, named <reference>_vs_<query>.zip.

genome1_vs_genome2.zip
genome2_vs_genome1.zip

baileythegreen · 2022-01-24T21:11:31Z

I have pushed changes to issue_373 that remove the assumption that ANIm is symmetric, skip a test that fails when genome order is no longer forced to be alphabetical (a consequence of that change), and update the docs so that they don't claim it is symmetric.

I will see if there are other things I have missed. In particular, I think the test outputs might need to be updated (they pass, but this might just mean the differences fall within the designated tolerances).

widdowquinn · 2022-01-25T09:44:29Z

> I think perhaps the test outputs might need to be updated; (they pass, but this might just mean the differences fall within the designated tolerances).

I think your guess is probably correct. It would be useful to have (for the record) a pyani compare output comparing (i) the old, symmetric behaviour and (ii) the new, non-symmetric behaviour so we can see how big the difference is for at least one real-world example we're familiar with.
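A rough, standalone sketch of the kind of check meant here (this is not the pyani compare implementation): given two runs' identity matrices as nested dictionaries keyed by reference then query, find the cell that differs most. The function name and the identity values are made up for illustration.

```python
def max_abs_difference(run_a, run_b):
    """Return (difference, reference, query) for the cell that differs most between two runs."""
    worst = (0.0, None, None)
    for ref, row in run_a.items():
        for query, value in row.items():
            diff = abs(value - run_b[ref][query])
            if diff > worst[0]:
                worst = (diff, ref, query)
    return worst


# Made-up identities: the old run mirrors one value, the new run keeps both directions
symmetric_run = {"A": {"A": 1.0, "B": 0.98}, "B": {"A": 0.98, "B": 1.0}}
asymmetric_run = {"A": {"A": 1.0, "B": 0.98}, "B": {"A": 0.97, "B": 1.0}}

diff, ref, query = max_abs_difference(symmetric_run, asymmetric_run)
print(f"largest change: {diff:.4f} at reference={ref}, query={query}")
```

If the largest change is smaller than the tolerance used in the tests, that would explain why the existing tests still pass.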

baileythegreen · 2022-04-04T12:49:03Z

I tried uploading the archive of inputs I used here, but I think that must be too big; I've instead included the classes.txt file. @widdowquinn, it's a dataset you sent me, so you should already have the sequence files.

classes.txt

Here's the output from running pyani compare on the symmetric and asymmetric versions of anim, using default settings. I have not yet addressed the suggestions made on pyani compare output in #364, so apologies for the less-than-useful colour schemes, et cetera.

summary_run1_vs_run2.md
ref_1_vs_query_2_plots.zip

Update

Here are the updated versions of the summary report and plots; changes include better colour maps, more useful axes, and more informative titles. (The run numbers differ because I always forget when you need to specify labels, et cetera.)

updated_plots_with_labels.zip

summary_run4_vs_run5.md

baileythegreen · 2022-04-05T15:12:21Z

It looks like Dickeya dadantii NCPPB 898 chromosome, whole genome shotgun sequence specifically sees a lot of differences between the two.

kiepczi · 2024-04-18T12:47:59Z

I'm also marking this for closure, because my understanding is that this issue was fixed with 1f10a6c.

Specifically, we have now decided to run two pairwise comparisons (forward and reverse) and to calculate ANIm values for the query in each comparison.

widdowquinn added the bug something isn't working how it should label Jan 17, 2022

widdowquinn assigned widdowquinn and baileythegreen Jan 17, 2022

widdowquinn added the HIGH PRIORITY high priority issue label Jan 17, 2022

widdowquinn mentioned this issue Jan 20, 2022

Issue 373 #375

Merged

21 tasks

widdowquinn added a commit that referenced this issue Jan 20, 2022

Merge pull request #375 from widdowquinn/issue_373

c748d55

Issue 373 - preliminary merge with multiple fixes to annoyances (warnings/small errors) prior to real fix of #373.

baileythegreen mentioned this issue Jan 24, 2022

Issue #373: ANIm should not be symmetric #376

Open

21 tasks

baileythegreen mentioned this issue Apr 13, 2022

Ensure ANIm identity scores are true metrics #151

Closed

widdowquinn added this to the 0.3.0 milestone Apr 27, 2022

kiepczi mentioned this issue Feb 6, 2024

Fix alignment coverage >1.0 and aniM symmetrical behaviour #421

Closed

kiepczi added the Marked for closure Ready to close label Apr 18, 2024

widdowquinn closed this as completed Apr 18, 2024

github-project-automation bot added this to pyani Aug 26, 2024

github-project-automation bot moved this to Done in pyani Aug 26, 2024
