Release sage 2.0.0 torsion #419
QCSubmit Validation Report
QC Specification Report
QCSubmit version information(click to expand)
Largely LGTM, just a few minor nitpicks!
...024-12-17-OpenFF-Sage-2.0.0-Torsion-Drive-Training-Dataset-v1.0/generate-combined-dataset.py
rec_ids_cmiles = {}
for _, results in Opt.entries.items():
    tmp_rec_ids_cmiles = {result.record_id: result.cmiles for result in results}
    # TODO: Check if updating dic would change the number of records
Is this still todo?
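One way the TODO above could be settled is to count records before and after the merge and list any record IDs that occur more than once. The sketch below is only an illustration under stated assumptions: `entries` is a hypothetical stand-in for `Opt.entries`, using plain `(record_id, cmiles)` tuples and placeholder values rather than the real result objects, and `merge_with_collision_report` is a helper name invented here.

```python
from collections import Counter

# Hypothetical stand-in for `Opt.entries`: address -> list of (record_id, cmiles).
# The real result objects expose `.record_id` and `.cmiles` instead of tuples.
entries = {
    "server-a": [(1, "CCO"), (2, "CCN")],
    "server-b": [(2, "CCN"), (3, "CCC")],
}


def merge_with_collision_report(entries):
    """Merge per-address {record_id: cmiles} maps and report repeated IDs."""
    merged = {}
    counts = Counter()
    for results in entries.values():
        for record_id, cmiles in results:
            counts[record_id] += 1
            # Same effect as dict.update: a later value overwrites an earlier one.
            merged[record_id] = cmiles
    repeated = sorted(rid for rid, n in counts.items() if n > 1)
    return merged, sum(counts.values()), repeated


merged, n_input, repeated = merge_with_collision_report(entries)
print(f"{n_input} input records, {len(merged)} after merging, repeated IDs: {repeated}")
```

If the two counts match, updating the dictionary did not change the number of records and the TODO could be dropped.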
A quantum chemical (QC) dataset curated to train the OpenFF 2.0.0 Sage torsion potentials. This QC dataset with the OpenFF default level of theory, B3LYP-D3BJ/DZVP, is used to benchmark Sage geometries and energetics. These optimized conformer geometries where used to train one dimensional torsional profiles. This Generation 2 dataset increases chemical diversity when compared to Generation 1, which are of value to our industry partners. Large molecules (>20 heavy atoms) were also included, including more flexible molecules and a greater degree of conformational variation which provide intramolecular interactions. This is the complete optimization dataset used for training OpenFF 2.0.0 Sage, consisting of the following datasets:

'OpenFF Gen 2 Torsion Set 1 Roche',
'OpenFF Gen 2 Torsion Set 2 Coverage', 'OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy', 'OpenFF Gen 2 Torsion Set 4 eMolecules - Discrepancy', 'OpenFF Gen 2 Torsion Set 5 Bayer' and 'OpenFF Gen 2 Torsion Set 6 supplemental 2'. The `HydrogenBondFilter(method='baker-hubbard')` filter was applied, and the following record IDs were dropped due to issues with ForceBalance: 6098580, 2703504, 2703505, 18045478. Further information can be found in the curation scripts for the linked repositories.
In the optimization training set PR you linked the OpenFF Sage repo, as well as the directories of each of the source torsion drive sets. That was quite nice, could you please do that here as well?
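The record-dropping step mentioned in the description can be sketched in plain Python. This is not the PR's curation script: `drop_records` is a hypothetical helper, the record IDs are assumed to be integers, and the example mapping uses placeholder CMILES values. Only the four dropped IDs are taken from the description.

```python
# Record IDs listed in the dataset description as dropped due to ForceBalance issues.
DROPPED_RECORD_IDS = {6098580, 2703504, 2703505, 18045478}


def drop_records(rec_ids_cmiles, dropped=DROPPED_RECORD_IDS):
    """Return the mapping without the dropped IDs, plus the IDs actually removed."""
    kept = {rid: cmiles for rid, cmiles in rec_ids_cmiles.items() if rid not in dropped}
    removed = sorted(set(rec_ids_cmiles) & set(dropped))
    return kept, removed


# Placeholder mapping: record ID 42 and all CMILES strings are made up for illustration.
example = {6098580: "CCO", 42: "CCN", 2703504: "CCC"}
kept, removed = drop_records(example)
print(f"kept {len(kept)} record(s), removed IDs: {removed}")
```

Returning the removed IDs alongside the kept records makes it easy to confirm that every ID named in the description was actually present in the source datasets.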
submissions/2024-12-17-OpenFF-Sage-2.0.0-Torsion-Drive-Training-Dataset-v1.0/README.md
{
    "dataset_name": "OpenFF Sage 2.0.0 Torsion Drive Training Dataset v1.0",
    "dataset_tagline": "B3LYP-D3BJ/DZVP conformers applicable to drug-like molecules for OpenFF 2.0.0 Sage",
    "description": "A quantum chemical (QC) dataset curated to train the OpenFF 2.0.0 Sage torsion potentials. This QC dataset with the OpenFF default level of theory, B3LYP-D3BJ/DZVP, is used to benchmark Sage geometries and energetics. These optimized conformer geometries where used to train one dimensional torsional profiles. This Generation 2 dataset increases chemical diversity when compared to Generation 1, which are of value to our industry partners. Large molecules (>20 heavy atoms) were also included, including more flexible molecules and a greater degree of conformational variation which provide intramolecular interactions. This is the complete optimization dataset used for training OpenFF 2.0.0 Sage, consisting of the following datasets: 'OpenFF Gen 2 Torsion Set 1 Roche', 'OpenFF Gen 2 Torsion Set 2 Coverage', 'OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy', 'OpenFF Gen 2 Torsion Set 4 eMolecules - Discrepancy', 'OpenFF Gen 2 Torsion Set 5 Bayer' and 'OpenFF Gen 2 Torsion Set 6 supplemental 2'. The `HydrogenBondFilter(method='baker-hubbard')` filter was applied, and the following record IDs were dropped due to issues with ForceBalance: 6098580, 2703504, 2703505, 18045478. Further information can be found in the curation scripts for the linked repositories.",
Suggested change:
    "description": "A quantum chemical (QC) dataset curated to train the OpenFF 2.0.0 Sage torsion potentials. This QC dataset with the OpenFF default level of theory, B3LYP-D3BJ/DZVP, is used to benchmark Sage geometries and energetics. These optimized conformer geometries were used to train one dimensional torsional profiles. This Generation 2 dataset increases chemical diversity when compared to Generation 1, which are of value to our industry partners. Large molecules (>20 heavy atoms) were also included, including more flexible molecules and a greater degree of conformational variation which provide intramolecular interactions. This is the complete TorsionDrive dataset used for training OpenFF 2.0.0 Sage, consisting of the following datasets: 'OpenFF Gen 2 Torsion Set 1 Roche', 'OpenFF Gen 2 Torsion Set 2 Coverage', 'OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy', 'OpenFF Gen 2 Torsion Set 4 eMolecules - Discrepancy', 'OpenFF Gen 2 Torsion Set 5 Bayer' and 'OpenFF Gen 2 Torsion Set 6 supplemental 2'. The `HydrogenBondFilter(method='baker-hubbard')` filter was applied, and the following record IDs were dropped due to issues with ForceBalance: 6098580, 2703504, 2703505, 18045478. Further information can be found in the curation scripts for the linked repositories.",
Some typos and suggestions.
…g-Dataset-v1.0/generate-combined-dataset.py Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>
New Submission Checklist
README.md describing the dataset (see here for examples)
dataset*.json; may feature a compression extension, such as .bz2
README.md
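
The two file requirements visible in the checklist above could be checked locally before opening a PR. The following is a minimal sketch and not part of the repository's tooling: `check_submission_dir` is a hypothetical helper, and the glob pattern `dataset*.json*` is an assumption chosen so that compressed files such as `dataset.json.bz2` also match.

```python
from pathlib import Path


def check_submission_dir(directory):
    """Report whether a submission directory holds README.md and dataset*.json files."""
    directory = Path(directory)
    has_readme = (directory / "README.md").is_file()
    # The trailing * lets an optional compression extension (e.g. .bz2) match too.
    datasets = sorted(p.name for p in directory.glob("dataset*.json*") if p.is_file())
    return {"README.md": has_readme, "dataset*.json": datasets}


# Example usage against the current working directory.
print(check_submission_dir("."))
```

An empty list under `dataset*.json` or a `False` under `README.md` would indicate that the corresponding checklist item is not yet satisfied.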