Improve SMILES translation for surface adsorbates #2701

Draft
wants to merge 4 commits into base: main

Commits on Aug 2, 2024

  1. Conversion to RDKitMol can use 0 for surface sites X

    You can use a * to represent a surface site in a SMILES string when
    reading it via RDKit, and RDKit turns this into a dummy atom
    with atomic number 0.
    
    By doing the reverse (telling RDKit that our surface sites have
    atomic number 0) we can use RDKit to *generate* SMILES strings
    in the same format, enabling a round trip.
    
    But for other things, like InChIs, it seems more robust to use
    a real atom such as platinum (78). This commit allows both:
    the default is 0, and 78 is used for InChI conversion.
    rwest committed Aug 2, 2024
    2d9a7f3
  2. Unit test for conversion to/from RDKit with some adsorbate SMILES.

    Using the new syntax with * for a surface site.
    rwest committed Aug 2, 2024
    8ef7aca
  3. New unit tests for SMILES conversions of Adsorbates.

    Unfortunately, converting a molecule TO SMILES uses OpenBabel
    if the molecule contains nitrogen, and OpenBabel writes [Pt]
    in place of *.
    You can still READ SMILES containing both * and N,
    so such molecules do not round-trip:
    In [9]: Molecule(smiles='CNC*').to_smiles()
    Out[9]: 'CNC[Pt]'

    Still, this is better than it was (I think).
    rwest committed Aug 2, 2024
    2051398
  4. SMILES from OpenBabel now replace [Pt] with *

    This means they can be parsed in a round trip by RDKit
    (the default SMILES reader).
    This is handy because OpenBabel is the default SMILES
    *writer* for molecules containing an N atom, though not for
    everything. Output is now more consistent: a surface site
    is written as *.

    I added a unit test that converts to and from SMILES several
    times for various adsorbates, including some containing N.
    rwest committed Aug 2, 2024
    977daff
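
The two ideas in these commits (reading `*` as an atom with atomic number 0, and rewriting OpenBabel's `[Pt]` back to `*`) can be sketched roughly as below. This is only an illustration, not the code in this PR: `pt_to_star` and `count_surface_sites` are hypothetical helper names, and the string replacement assumes every surface site is written as a bare `[Pt]` token (no charge, hydrogens, or isotope) and that the molecule contains no real platinum atoms. The RDKit import is done lazily inside the function so the string helper works without RDKit installed.

```python
def pt_to_star(smiles: str) -> str:
    """Rewrite the bare '[Pt]' placeholder as '*' (hypothetical helper).

    Assumption: each surface site appears exactly as '[Pt]', and no genuine
    platinum atoms are present; otherwise they would be rewritten too.
    """
    return smiles.replace("[Pt]", "*")


def count_surface_sites(smiles: str) -> int:
    """Parse a SMILES string with RDKit and count atoms with atomic number 0.

    RDKit reads '*' as a dummy atom whose atomic number is 0.
    """
    from rdkit import Chem  # lazy import; requires RDKit to be installed

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse {smiles!r}")
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 0)


if __name__ == "__main__":
    # The OpenBabel-style output from commit 3, converted back to * notation.
    print(pt_to_star("CNC[Pt]"))
```

With RDKit available, `count_surface_sites(pt_to_star("CNC[Pt]"))` would be one way to check that the rewritten string still parses and carries a single surface site.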