Add gdb format molecule reading #291
For completeness, adding my comments from Slack here... We've lost the desired high-level API from my implementation in #288. i.e., One can test the implementation against the whole dataset using the following:

    from pathlib import Path

    from qcelemental.exceptions import MoleculeFormatError
    from qcelemental.models import Molecule

    if __name__ == "__main__":
        import sys

        # Directory containing the dataset files, given as the first CLI argument
        path = Path(sys.argv[1])
        failures = []
        for i, p in enumerate(path.iterdir()):
            full_path = p.resolve()
            try:
                Molecule.from_file(full_path)
            except MoleculeFormatError:
                # Record every file that the parser rejects
                print(full_path.name)
                failures.append(full_path.name)
            # Progress indicator every 1000 files
            if i % 1000 == 0:
                print(i)
        print(failures)
        print(f"Total Failures: {len(failures)}")

The changed test implementation from unprocessed, processed = _filter_xyz(string, strict=True) to final = qcelemental.molparse.from_string(string, return_processed=False, dtype="gdb") is what makes this PR still "pass" the tests I wrote, but we've lost the
Sorry, saw this after Slack, so I'll repeat here :-) A near-high-level API should work now as
For anyone following along, the key difference is that this PR parses gdb as a separate dtype, whereas #288 parses gdb under the "xyz" dtype with some regex relaxations. Maybe that's ok, as gdb is a correct superset of xyz, but I do worry about less guidance/errors being returned to the user. e.g., the below could pass, when it probably wasn't the user's intended geometry.
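To make the trade-off between a separate dtype and a relaxed "xyz" regex concrete, here is a minimal, purely illustrative sketch. It does not use qcelemental's actual regexes: the names `STRICT_ATOM`, `RELAXED_ATOM`, and `parse_atom_line` are hypothetical, and it assumes that the relaxation amounts to tolerating one extra trailing numeric column on each atom line.

```python
import re

# Hypothetical patterns for illustration only; these are NOT qcelemental's regexes.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# Strict: element symbol followed by exactly three numeric columns.
STRICT_ATOM = re.compile(
    rf"^\s*([A-Za-z]{{1,3}})\s+({_NUM})\s+({_NUM})\s+({_NUM})\s*$"
)

# Relaxed: additionally tolerate one trailing numeric column
# (assumed here to be extra per-atom data that the parser ignores).
RELAXED_ATOM = re.compile(
    rf"^\s*([A-Za-z]{{1,3}})\s+({_NUM})\s+({_NUM})\s+({_NUM})(?:\s+{_NUM})?\s*$"
)


def parse_atom_line(line, relaxed=False):
    """Return (symbol, (x, y, z)); raise ValueError if the line does not match."""
    pattern = RELAXED_ATOM if relaxed else STRICT_ATOM
    match = pattern.match(line)
    if match is None:
        raise ValueError(f"not a valid atom line: {line!r}")
    symbol, x, y, z = match.groups()
    return symbol, (float(x), float(y), float(z))
```

Under these assumptions, a line such as `C 0.0 1.0 2.0 -0.5` raises an error in strict mode but is accepted in relaxed mode with the fifth column silently dropped. That is the kind of lost guidance the comment above is concerned about when the relaxation is applied to every "xyz" input rather than only to an explicit "gdb" dtype.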
Cool! Thanks for the update :) I worry about the alternative case, i.e., end users see all the Is there a reason you prefer requiring the extra |
Also, I still see many more failures with the current code. Better than before, but I get 613 failures on the |
Ideal scenario for this PR:
Can you help me to understand this scenario you are concerned about?
Would this be a format we expect users to encounter in regular use or more a hypothetical that concerns you? Thanks for your time on this. I'm happy to help finish the implementation if you can point out the concerns you have with #288 that may have undesired behavior. I found the |
This pull request introduces 1 alert and fixes 1 when merging 508817f into cb04079 - view on LGTM.com
This pull request introduces 1 alert and fixes 1 when merging 829dd44 into cb04079 - view on LGTM.com
See description and purpose and proposed tests at #288. This is a separate implementation of the parsing.