Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could validate_smirks fall back to an RDKit/OE backend instead of or addition to chemper? #373

Open
lilyminium opened this issue Nov 19, 2024 · 1 comment

Comments

@lilyminium
Copy link

Description

I think I'm running into MobleyLab/chemper#99 or a variant when I try to create the following:

from openff.bespokefit.schema.smirnoff import AngleSMIRKS
AngleSMIRKS(
    smirks="[H][C:2]~1~[C:1](~N(~C(~[N:3]1)[H])[H])C([H])([H])[C]([H])([C]=O)[N][H]",
    attributes={"k", "angle"}
)

raises

Warning: Unable to parse SMARTS
Warning: [H]-,:[C:2]~[C:1](-,:[C](-,:[C](-,:[C]=[O])(-,:[H])-,:[N]-,:[H])(-,:[H])-,:[H])~[N](~[C](-,:[H])~[N:3]1)-,:[H]
Warning:                                                                                                               ^

---------------------------------------------------------------------------
SMIRKSParsingError                        Traceback (most recent call last)
Cell In[2], line 1
----> 1 AngleSMIRKS(
      2     smirks="[H][C:2]~1~[C:1](~N(~C(~[N:3]1)[H])[H])C([H])([H])[C]([H])([C]=O)[N][H]",
      3     attributes={"k", "angle"}
      4 )

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/main.py:339, in pydantic.main.BaseModel.__init__()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/main.py:1074, in pydantic.main.validate_model()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/fields.py:895, in pydantic.fields.ModelField.validate()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/fields.py:1154, in pydantic.fields.ModelField._apply_validators()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/class_validators.py:304, in pydantic.class_validators._generic_validator_cls.lambda4()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/openff/bespokefit/schema/smirnoff.py:64, in BaseSMIRKSParameter._check_smirks(cls, value)
     62 @validator("smirks")
     63 def _check_smirks(cls, value: str) -> str:
---> 64     return validate_smirks(value, cls._expected_n_tags())

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/openff/bespokefit/schema/smirnoff.py:23, in validate_smirks(smirks, expected_tags)
     18 def validate_smirks(smirks: str, expected_tags: int) -> str:
     19     """
     20     Make sure the supplied smirks has the correct number of tagged atoms.
     21     """
---> 23     smirk = ChemicalEnvironment(smirks=smirks)
     24     tagged_atoms = len(smirk.get_indexed_atoms())
     26     assert tagged_atoms == expected_tags, (
     27         f"The smirks pattern ({smirks}) has {tagged_atoms} tagged atoms, but should "
     28         f"have {expected_tags}."
     29     )

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/chemper/graphs/environment.py:561, in ChemicalEnvironment.__init__(self, smirks, label, replacements)
    559 # Check that the created Environment is valid
    560 if not self.is_valid():
--> 561     raise SMIRKSParsingError("Input SMIRKS (%s), converted to %s \
    562             is now invalid" % (smirks, self.as_smirks()))
    564 return

SMIRKSParsingError: (SMIRKSParsingError(...), 'Input SMIRKS ([H][C:2]~1~[C:1](~N(~C(~[N:3]1)[H])[H])C([H])([H])[C]([H])([C]=O)[N][H]), converted to [C:1](~[C:2]-,:[H])(-,:[C](-,:[C](-,:[C]=[O])(-,:[H])-,:[N]-,:[H])(-,:[H])-,:[H])~[N](~[C](-,:[H])~[N:3]1)-,:[H]                     is now invalid')

Since the validation just checks that the correct number of atoms is tagged, would it be possible to substitute that with a fuller cheminformatics toolkit backend or even a regex?

@jthorton
Copy link
Contributor

In the past we have been hesitant to do the regex ourselves but changing the validator to use rdkit or openeye is a great idea and should fix this issue going forward, would you like to make a PR for the change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants