Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsuccessful atom mapping attempts #270

Open
alongd opened this issue Aug 22, 2021 · 4 comments
Open

Unsuccessful atom mapping attempts #270

alongd opened this issue Aug 22, 2021 · 4 comments

Comments

@alongd
Copy link

alongd commented Aug 22, 2021

Describe the bug

The Molecule.align() method for atom mapping could not successfully map some (relatively simple?) examples.
The problem seems to be in mapping H atoms.

Example 1:

xyz_1 = """C      -0.75196103   -0.01443262    0.18205588
           N       0.63200970   -0.03460976   -0.25660404
           H      -0.81177190   -0.09860215    1.27086719
           H      -1.23175080    0.91916596   -0.12450387
           H       1.14009235    0.72760640    0.19014495
           H       1.07553732   -0.89544330    0.06113001"""

A

xyz_2 = """C       0.69194930    0.05438938    0.02065423
           H       1.30945080   -0.83093491    0.14456348
           H       1.16491421    1.03039618    0.08526955
           N      -0.72781945   -0.06628299   -0.30657582
           H      -1.28327572    0.73076677    0.00177732
           H      -1.15521915   -0.91833442    0.05431125"""

B

Expected atom map:
[0, 3, (1 or 2), (2 or 1), (4 or 5), (5 or 4)]

Resulting atom map:
[0, 3, 5, 1, 4, 2]

Example 2:

xyz_1 = """N       1.16378795    1.46842703   -0.82620909
           C       0.75492192    0.42940001   -0.18269967
           C      -0.66835457    0.05917401   -0.13490822
           C      -1.06020680   -1.02517494    0.54162130
           H       2.18280085    1.55132949   -0.73741996
           H       1.46479392   -0.22062618    0.35707573
           H      -1.36374229    0.69906451   -0.66578157
           H      -2.11095970   -1.29660899    0.57562763
           H      -0.36304116   -1.66498540    1.07269317"""

image

xyz_2 = """N      -0.82151000   -0.98211000   -0.58727000
           C      -0.60348000    0.16392000    0.30629000
           C       0.85739000    0.41515000    0.58956000
           C       1.39979110    1.37278509    1.35850405
           H      -1.83926000   -1.03148000   -0.69340000
           H      -1.09049000   -0.04790000    1.26633000
           H       1.53896925   -0.26718857    0.08305817
           H       2.47688530    1.45345357    1.47005614
           H       0.79186768    2.10712053    1.87909788"""

image

Expected atom map:
[0, 1, 2, 3, 4, 5, 6, 7, 8]

Resulting atom map:
[0, 1, 2, 3, 4, 7, 6, 5, 8]

To Reproduce

Using QCElemental v0.20.0

from qcelemental.models.molecule import Molecule

qcmol_1 = Molecule.from_data(
            data=xyz_1,
            molecular_charge=0,
            molecular_multiplicity=multiplicity,  # 2 for Ex. 1, 1 for Ex. 2
        )

qcmol_2 = Molecule.from_data(
            data=xyz_2,
            molecular_charge=0,
            molecular_multiplicity=multiplicity,  # 2 for Ex. 1, 1 for Ex. 2
        )

data = qcmol_2.align(ref_mol=qcmol_1, verbose=0)
atom_map = data[1]['mill'].atommap.tolist()

(Edit: updated "Expected atom map" of Example 2, switched the last two indices representing H atoms on the terminal C)

@loriab
Copy link
Collaborator

loriab commented Aug 22, 2021

Thanks for the report -- I'm looking into it. To get past the alignment hurdle, you can pass uno_cutoff=15 or so, so that more combinations are tried. I'm thinking what's happening is that one of the "wrong" hydrogens looks to be a close match relative-distance-wise, so the algorithm is filtering out permutations that don't include that match. As a result, close+far permutations are being tried rather than the ok+ok that would yield the right solution. Still investigating though.

@mefuller
Copy link

Thank you - adding uno_cutoff=15 returns correct maps for both of these test cases, specifically respective maps of [0, 3, 2, 1, 5, 4] and [0, 1, 2, 3, 4, 5, 6, 7, 8]

@alongd
Copy link
Author

alongd commented Aug 24, 2021

Thanks for the suggestion!

I'd like to add another example that is still problematic for this algorithm. I'm aware that no atom-mapping algorithm has a 100% success rate, but I hope through these examples to either learn how to use atom mapping via QCElemental better, or to understand the current limitations.

This example involves two internal rotations, and is not being mapped correctly regardless of the uno_cutoff value.

xyz_1 = """C       0.02571153    1.50024692   -0.01880972
           H      -0.25012379    2.28327632    0.67957788
           H       0.21710650    1.77015012   -1.05186079
           C      -0.12961272    0.05931627    0.38298020
           C      -1.52159692   -0.43413728   -0.00244580
           C       0.95427547   -0.82618224   -0.25128786
           O       2.23864587   -0.52290772    0.28688439
           H      -0.02271951    0.01229964    1.47391586
           H      -1.67349890   -1.46562132    0.33336150
           H      -1.67080846   -0.40804497   -1.08793835
           H      -2.30052614    0.18308086    0.45923715
           H       0.75830763   -1.88272043   -0.04089782
           H       0.99720067   -0.70255870   -1.33919508
           H       2.37763877    0.43380254    0.17647842"""

image

xyz_2 = """C      -1.38578118   -1.38826294   -0.09505563
           C      -0.48149615   -0.18843420   -0.36867730
           C      -1.16150619    1.10470021    0.08616180
           C       0.87558159   -0.37530245    0.31513701
           O       1.74999301    0.68547959   -0.04657791
           H      -1.58243057   -1.50218391    0.97647435
           H      -0.92368296   -2.31348660   -0.45519843
           H      -0.31894490   -0.11004094   -1.45123484
           H      -0.55106964    1.97998644   -0.15857017
           H      -2.13009202    1.22826013   -0.41004219
           H      -1.33376495    1.10294566    1.16788369
           H       0.77092902   -0.38422054    1.40544708
           H       1.33791070   -1.31713213    0.00125620
           H       2.59529287    0.52546183    0.40660187"""

image

Expected atom map:
[0, 6, 5, 1, 2, 3, 4, 7, 8, 10, 9, 11, 12, 13]

Resulting atom map:

[0, 6, 5, 1, 2, 3, 4, *8, *11, 10, *12, *9, *7, 13] using the default uno_cutoff value

[0, 6, 5, 1, 2, 3, 4, *8, *11, 10, *12, *9, *7, 13] using uno_cutoff = 0.01

[0, 6, 5, 1, 2, 3, 4, *8, *11, 10, *12, *9, *7, 13] using uno_cutoff = 0.1

[0, 6, 5, 1, 2, 3, 4, *8, *11, 10, *7, *9, 12, 13] using uno_cutoff = 1

[0, 6, 5, 1, 2, 3, 4, 7, *10, *11, *8, *9, 12, 13] using uno_cutoff = 15

[0, 6, *12, 1, 2, 3, 4, 7, *9, 10, *5, *8, *11, 13] using uno_cutoff = 100

[3, 12, 11, 1, 0, 2, 4, 7, *9, *5, *6, *8, *10, 13] using uno_cutoff = 500

(an * marks an incorrectly mapped atom)

@loriab
Copy link
Collaborator

loriab commented Aug 24, 2021

Excellent, thank you for more examples. These will all be useful.

For background, the aligner is used pretty routinely for mols_align=True (rmsd=0.0) operations. While it was designed around rmsd != 0 and has various tests, I don't know that it's well exercised in that role. Overall, its Achilles Heel (even form rmsd=0.0) is too many identical symmetry equivalent atoms, so keep Buckyballs away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants