Commit
* Initial files
* Generate dataset descriptive files
* Update method to use requests to pull relevant file from github
* Add openeye installation to instructions
* Address dataset formatting feedback
* Added forcefield table to repo README
Showing 8 changed files with 1,544 additions and 0 deletions.
69 changes: 69 additions & 0 deletions
...sions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# OpenFF Sage 2.0.0 Training Optimization Dataset v1.0 | ||
|
||
### Description | ||
|
||
A quantum chemical (QC) dataset curated to train the [OpenFF 2.0.0 Sage](https://github.com/openforcefield/openff-sage) force field, which has reparametrized Lennard-Jones (LJ) and valence parameters; the valence parameters are the ones relevant to this dataset. This QC dataset, computed at the OpenFF default level of theory, B3LYP-D3BJ/DZVP, is used to benchmark Sage geometries and energetics. These optimized conformer geometries were used in conjunction with the QC dataset used to train one-dimensional torsional profiles. This Generation 2 dataset increases chemical diversity compared to Generation 1, which is of value to our industry partners. Large molecules (>20 heavy atoms) were also included, among them more flexible molecules with a greater degree of conformational variation, which provide intramolecular interactions. | ||
|
||
This is the complete optimization dataset used for training OpenFF 2.0.0 Sage, consisting of the following datasets: | ||
|
||
- [OpenFF Gen 2 Opt Set 1 Roche](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-1-Roche) | ||
- [OpenFF Gen 2 Opt Set 2 Coverage](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-2-Coverage) | ||
- [OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-3-Pfizer-Discrepancy) | ||
- [OpenFF Gen 2 Opt Set 4 eMolecules - Discrepancy](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-4-eMolecules-Discrepancy) | ||
- [OpenFF Gen 2 Opt Set 5 Bayer](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-5-Bayer) | ||
|
||
The following filters were applied to those datasets: | ||
|
||
- `RecordStatusFilter(status=RecordStatusEnum.complete)` | ||
- `ConnectivityFilter(tolerance=1.2)` | ||
- `UndefinedStereoFilter()` | ||
- `ElementFilter(allowed_elements=["H", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I"])` | ||
- `ConformerRMSDFilter(max_conformers=10)` | ||
|
||
Further information can be found in the curation scripts for the linked repositories. | ||
|
||
### General Information | ||
|
||
- Date: 2024-12-12 | ||
- Class: OpenFF Optimization Dataset | ||
- Purpose: Complete set of training data for OpenFF 2.0.0 Sage | ||
- Dataset Type: optimization | ||
- Name: OpenFF Sage 2.0.0 Training Optimization Dataset v1.0 | ||
- Number of unique molecules: 1025 | ||
- Number of filtered molecules: 0 | ||
- Number of conformers: 3663 | ||
- Number of conformers (min mean max): 1.00, 3.53, 10.00 | ||
- Mean molecular weight: 261.38 | ||
- Max molecular weight: 544.64 | ||
- Set of charges: -2.0, -1.0, 0.0, 1.0 | ||
- Dataset Submitter: Jennifer A. Clark | ||
- Dataset Curator: Simon Boothroyd | ||
- Dataset Generator: Hyesu Jang | ||
|
||
### QCSubmit generation pipeline | ||
|
||
- `generate-combined-dataset.py`: A Python script that shows how the dataset was prepared from the input files. | ||
- `output.txt`: A text file containing the printed output of `generate-combined-dataset.py`. | ||
|
||
### QCSubmit Manifest | ||
|
||
- `generate-combined-dataset.py` | ||
- `dataset.json.bz2`: The basic dataset ready for submission. | ||
- `dataset.pdf`: A PDF file containing 2D structures of the molecules. | ||
- `dataset.smi`: SMILES for every molecule in the submission. | ||
|
||
### Metadata | ||
|
||
* Elements: {F, I, N, C, P, Cl, S, Br, O, H} | ||
* QC Specifications: default | ||
  * basis: DZVP | ||
  * implicit_solvent: None | ||
  * keywords: {} | ||
  * maxiter: 200 | ||
  * method: B3LYP-D3BJ | ||
  * program: psi4 | ||
  * SCF Properties: | ||
    * dipole | ||
    * quadrupole | ||
    * wiberg_lowdin_indices | ||
    * mayer_indices |
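The README above lists the filters applied to the source datasets. As a rough illustration of what two of them are meant to do (keep only completed records, and keep only molecules built from the allowed elements), here is a minimal standard-library sketch. `ToyRecord` and `keep_record` are hypothetical names invented for this example; they are not QCSubmit types or functions, and the connectivity, stereochemistry, and conformer RMSD filters are not modelled at all.

```python
from __future__ import annotations

from dataclasses import dataclass

# Allowed elements copied from the ElementFilter line in the README.
ALLOWED_ELEMENTS = frozenset({"H", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I"})


@dataclass(frozen=True)
class ToyRecord:
    """Hypothetical stand-in for an optimization record (not a QCSubmit class)."""

    smiles: str
    elements: tuple[str, ...]
    status: str


def keep_record(record: ToyRecord) -> bool:
    """Mimic the intent of the record-status and element filters on a toy record."""
    if record.status != "complete":
        return False
    # Every element in the molecule must appear in the allowed set.
    return set(record.elements) <= ALLOWED_ELEMENTS


records = [
    ToyRecord("CCO", ("C", "C", "O", "H"), "complete"),
    ToyRecord("C[Si](C)C", ("C", "Si", "C", "C", "H"), "complete"),  # Si is not allowed
    ToyRecord("CCN", ("C", "C", "N", "H"), "error"),  # calculation did not complete
]
kept = [record.smiles for record in records if keep_record(record)]
```

Only the first toy record passes both checks. The actual filtering for this dataset is done by `generate-combined-dataset.py`, which remains the authoritative reference.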
161 changes: 161 additions & 0 deletions
submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/conda_env.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
name: qcarchive-user-submit | ||
channels: | ||
- conda-forge | ||
- openeye | ||
dependencies: | ||
- annotated-types=0.7.0=pyhd8ed1ab_1 | ||
- apsw=3.47.0.0=py311hde754ab_0 | ||
- argcomplete=3.5.2=pyhd8ed1ab_0 | ||
- attrs=24.2.0=pyh71513ae_1 | ||
- basis_set_exchange=0.10=pyhd8ed1ab_1 | ||
- brotli=1.1.0=hd74edd7_2 | ||
- brotli-bin=1.1.0=hd74edd7_2 | ||
- brotli-python=1.1.0=py311h3f08180_2 | ||
- bson=0.5.9=py_0 | ||
- bzip2=1.0.8=h99b78c6_7 | ||
- ca-certificates=2024.8.30=hf0a4a13_0 | ||
- cached-property=1.5.2=hd8ed1ab_1 | ||
- cached_property=1.5.2=pyha770c72_1 | ||
- cachetools=5.5.0=pyhd8ed1ab_1 | ||
- cairo=1.18.2=h6a3b0d2_1 | ||
- certifi=2024.8.30=pyhd8ed1ab_0 | ||
- cffi=1.17.1=py311h3a79f62_0 | ||
- chardet=5.2.0=py311h267d04e_2 | ||
- charset-normalizer=3.4.0=pyhd8ed1ab_1 | ||
- colorama=0.4.6=pyhd8ed1ab_1 | ||
- contourpy=1.3.1=py311h210dab8_0 | ||
- cycler=0.12.1=pyhd8ed1ab_1 | ||
- dill=0.3.9=pyhd8ed1ab_1 | ||
- exceptiongroup=1.2.2=pyhd8ed1ab_1 | ||
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0 | ||
- font-ttf-inconsolata=3.000=h77eed37_0 | ||
- font-ttf-source-code-pro=2.038=h77eed37_0 | ||
- font-ttf-ubuntu=0.83=h77eed37_3 | ||
- fontconfig=2.15.0=h1383a14_1 | ||
- fonts-conda-ecosystem=1=0 | ||
- fonts-conda-forge=1=0 | ||
- fonttools=4.55.3=py311h4921393_0 | ||
- freetype=2.12.1=hadb7bae_2 | ||
- freetype-py=2.3.0=pyhd8ed1ab_0 | ||
- greenlet=3.1.1=py311h3f08180_0 | ||
- h2=4.1.0=pyhd8ed1ab_1 | ||
- hpack=4.0.0=pyhd8ed1ab_1 | ||
- hyperframe=6.0.1=pyhd8ed1ab_1 | ||
- icu=75.1=hfee45f7_0 | ||
- idna=3.10=pyhd8ed1ab_1 | ||
- importlib-metadata=8.5.0=pyha770c72_1 | ||
- importlib_resources=6.4.5=pyhd8ed1ab_1 | ||
- iniconfig=2.0.0=pyhd8ed1ab_1 | ||
- jsonschema=4.23.0=pyhd8ed1ab_1 | ||
- jsonschema-specifications=2024.10.1=pyhd8ed1ab_1 | ||
- kiwisolver=1.4.7=py311h2c37856_0 | ||
- krb5=1.21.3=h237132a_0 | ||
- lcms2=2.16=ha0e7c42_0 | ||
- lerc=4.0.0=h9a09cb3_0 | ||
- libblas=3.9.0=25_osxarm64_openblas | ||
- libboost=1.84.0=hc9fb7c5_7 | ||
- libboost-python=1.84.0=py311h8fc16d6_7 | ||
- libbrotlicommon=1.1.0=hd74edd7_2 | ||
- libbrotlidec=1.1.0=hd74edd7_2 | ||
- libbrotlienc=1.1.0=hd74edd7_2 | ||
- libcblas=3.9.0=25_osxarm64_openblas | ||
- libcxx=19.1.5=ha82da77_0 | ||
- libdeflate=1.22=hd74edd7_0 | ||
- libedit=3.1.20191231=hc8eb9b7_2 | ||
- libexpat=2.6.4=h286801f_0 | ||
- libffi=3.4.2=h3422bc3_5 | ||
- libgfortran=5.0.0=13_2_0_hd922786_3 | ||
- libgfortran5=13.2.0=hf226fd6_3 | ||
- libglib=2.82.2=h07bd6cf_0 | ||
- libiconv=1.17=h0d3ecfb_2 | ||
- libintl=0.22.5=h8414b35_3 | ||
- libjpeg-turbo=3.0.0=hb547adb_1 | ||
- liblapack=3.9.0=25_osxarm64_openblas | ||
- liblzma=5.6.3=h39f12f2_1 | ||
- libopenblas=0.3.28=openmp_hf332438_1 | ||
- libpng=1.6.44=hc14010f_0 | ||
- libpq=16.6=hb008251_1 | ||
- librdkit=2024.03.5=h54a62e4_3 | ||
- libsqlite=3.47.0=hbaaea75_1 | ||
- libtiff=4.7.0=ha962b0a_2 | ||
- libwebp-base=1.4.0=h93a5062_0 | ||
- libxcb=1.17.0=hdb1d25a_0 | ||
- libzlib=1.3.1=h8359307_2 | ||
- llvm-openmp=19.1.5=hdb05f8b_0 | ||
- matplotlib-base=3.9.3=py311h031da69_0 | ||
- msgpack-python=1.1.0=py311h2c37856_0 | ||
- multiprocess=0.70.17=py311h917b07b_1 | ||
- munkres=1.1.4=pyh9f0ad1d_0 | ||
- ncurses=6.5=h7bae524_1 | ||
- networkx=3.4.2=pyh267e887_2 | ||
- numpy=1.26.4=py311h7125741_0 | ||
- openeye-toolkits=2024.2.0=py311_0 | ||
- openff-amber-ff-ports=0.0.4=pyhca7485f_0 | ||
- openff-forcefields=2024.09.0=pyhff2d567_0 | ||
- openff-qcsubmit=0.54.0=pyhd8ed1ab_0 | ||
- openff-toolkit-base=0.16.7=pyhd8ed1ab_0 | ||
- openff-units=0.2.2=pyhca7485f_0 | ||
- openff-utilities=0.1.13=pyhd8ed1ab_0 | ||
- openjpeg=2.5.3=h8a3d83b_0 | ||
- openssl=3.4.0=h39f12f2_0 | ||
- packaging=24.2=pyhd8ed1ab_2 | ||
- pandas=2.2.2=py311h4b4568b_1 | ||
- pcre2=10.44=h297a79d_2 | ||
- pillow=11.0.0=py311h3894ae9_0 | ||
- pint=0.23=pyhd8ed1ab_1 | ||
- pip=24.3.1=pyh8b19718_0 | ||
- pixman=0.44.2=h2f9eb0b_0 | ||
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_2 | ||
- pluggy=1.5.0=pyhd8ed1ab_1 | ||
- pthread-stubs=0.4=hd74edd7_1002 | ||
- pycairo=1.27.0=py311h84a5a08_0 | ||
- pycalverter=1.6.1=pyhd8ed1ab_1 | ||
- pycparser=2.22=pyh29332c3_1 | ||
- pydantic=2.10.3=pyh3cfb1c2_0 | ||
- pydantic-core=2.27.1=py311h3ff9189_0 | ||
- pyjwt=2.10.1=pyhd8ed1ab_0 | ||
- pyparsing=3.2.0=pyhd8ed1ab_2 | ||
- pysocks=1.7.1=pyha55dd90_7 | ||
- pytest=8.3.4=pyhd8ed1ab_1 | ||
- python=3.11.11=hc22306f_1_cpython | ||
- python-constraint=1.4.0=py_0 | ||
- python-dateutil=2.9.0.post0=pyhff2d567_1 | ||
- python-tzdata=2024.2=pyhd8ed1ab_1 | ||
- python_abi=3.11=5_cp311 | ||
- pytz=2024.2=pyhd8ed1ab_1 | ||
- pyyaml=6.0.2=py311h460d6c5_1 | ||
- qcelemental=0.28.0=pyhd8ed1ab_1 | ||
- qcportal=0.56=pyhd8ed1ab_1 | ||
- qhull=2020.2=h420ef59_5 | ||
- rdkit=2024.03.5=py311h8a4e316_3 | ||
- readline=8.2=h92ec313_1 | ||
- referencing=0.35.1=pyhd8ed1ab_1 | ||
- regex=2024.11.6=py311h917b07b_0 | ||
- reportlab=4.2.5=py311h460d6c5_0 | ||
- requests=2.32.3=pyhd8ed1ab_1 | ||
- rlpycairo=0.2.0=pyhd8ed1ab_0 | ||
- rpds-py=0.22.3=py311h3ff9189_0 | ||
- setuptools=75.6.0=pyhff2d567_1 | ||
- six=1.17.0=pyhd8ed1ab_0 | ||
- smirnoff99frosst=1.1.0=pyh44b312d_0 | ||
- sqlalchemy=2.0.36=py311hae2e1ce_0 | ||
- sqlite=3.47.0=hcd14bea_1 | ||
- tabulate=0.9.0=pyhd8ed1ab_2 | ||
- tk=8.6.13=h5083fa2_1 | ||
- tomli=2.2.1=pyhd8ed1ab_1 | ||
- tqdm=4.67.1=pyhd8ed1ab_0 | ||
- typing-extensions=4.12.2=hd8ed1ab_1 | ||
- typing_extensions=4.12.2=pyha770c72_1 | ||
- tzdata=2024b=hc8b5060_0 | ||
- unicodedata2=15.1.0=py311hae2e1ce_1 | ||
- unidecode=1.3.8=pyh29332c3_1 | ||
- urllib3=2.2.3=pyhd8ed1ab_1 | ||
- wheel=0.45.1=pyhd8ed1ab_1 | ||
- xmltodict=0.14.2=pyhd8ed1ab_1 | ||
- xorg-libxau=1.0.11=hd74edd7_1 | ||
- xorg-libxdmcp=1.1.5=hd74edd7_0 | ||
- yaml=0.2.5=h3422bc3_2 | ||
- zipp=3.21.0=pyhd8ed1ab_1 | ||
- zstandard=0.23.0=py311ha60cc69_1 | ||
- zstd=1.5.6=hb46c0d2_0 | ||
|
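Each dependency in the environment file above is pinned as `name=version=build`. Build strings such as `25_osxarm64_openblas` suggest the environment was exported on macOS arm64, so the exact pins may not resolve on other platforms. The sketch below shows one way to split a pin into its parts and drop the build string; `CondaPin`, `parse_pin`, and `without_build` are hypothetical helper names for illustration, not part of conda.

```python
from typing import NamedTuple


class CondaPin(NamedTuple):
    """One fully pinned dependency: package name, version, and build string."""

    name: str
    version: str
    build: str


def parse_pin(line: str) -> CondaPin:
    """Split a '- name=version=build' dependency line into its three fields."""
    spec = line.strip()
    if spec.startswith("- "):
        spec = spec[2:].strip()
    # A fully pinned spec has exactly two '=' separators.
    name, version, build = spec.split("=")
    return CondaPin(name, version, build)


def without_build(pin: CondaPin) -> str:
    """Return a 'name=version' spec, which is less tied to one platform."""
    return f"{pin.name}={pin.version}"


pin = parse_pin("  - libblas=3.9.0=25_osxarm64_openblas")
portable_spec = without_build(pin)
```

When re-exporting an environment, `conda env export --no-builds` produces version-only pins directly, which may be a simpler route than post-processing the file.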
3 changes: 3 additions & 0 deletions
submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/dataset.json.bz2
Git LFS file not shown
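Because `dataset.json.bz2` is stored with Git LFS, a plain clone may contain only a small pointer file until the LFS objects are fetched. Once the real file is present, it is a bzip2-compressed JSON document and can be inspected with the Python standard library. The helper name `load_bz2_json` is hypothetical, and no assumption is made here about the keys inside the dataset.

```python
import bz2
import json
from pathlib import Path
from typing import Any, Union


def load_bz2_json(path: Union[str, Path]) -> Any:
    """Read a bzip2-compressed JSON file and return the decoded object."""
    # Text mode lets json.load consume the decompressed stream directly.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (assumes the LFS object has been downloaded):
# dataset = load_bz2_json("dataset.json.bz2")
# print(type(dataset))
```

This only decodes the raw JSON; it does not validate the content against any dataset schema.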
Binary file added: +593 KB
submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/dataset.pdf
Binary file not shown.