Release sage 2.0.0 #418

Merged: 11 commits, Dec 18, 2024
7 changes: 7 additions & 0 deletions README.md
@@ -35,6 +35,9 @@ Datasets must be submitted as pull requests.
```
conda env create -f qca-dataset-submission/devtools/prod-envs/qcarchive-user-submit.yaml
conda activate qcarchive-user-submit
```
You may also need to install OpenEye:\
`conda install -c openeye openeye-toolkits`



4. Choose a starting notebook and README based on the type of dataset you wish to submit:
@@ -202,6 +205,10 @@ The status only refers to the `default` specification which is required for all

[![Running](https://img.shields.io/badge/Status-Running-orange)](https://img.shields.io/badge/Status-Running-orange) the dataset is currently running and may have some incomplete jobs.

# Forcefield Release Datasets
| Forcefield | Repo | Optimization | Torsion Drive | Elements | Zenodo |
|-------------|----------|-------------------|--------------------|----------|--------|
| Release OpenFF 2.0.0 Sage | [openff-sage](https://github.com/openforcefield/openff-sage) | [2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0) | [Coming Soon]() | H, C, N, O, S, P, F, Cl, Br, I | [Coming Soon]() |


# Basic Datasets
@@ -0,0 +1,69 @@
# OpenFF Sage 2.0.0 Training Optimization v1.0

### Description

A quantum chemical (QC) dataset curated to train the [OpenFF 2.0.0 Sage](https://github.com/openforcefield/openff-sage) force field, which has reparametrized Lennard-Jones (LJ) and valence parameters; the valence parameters are the ones relevant to this dataset. This QC dataset is computed at the OpenFF default level of theory (B3LYP-D3BJ/DZVP) and is used to benchmark Sage geometries and energetics. These optimized conformer geometries were used together with the QC dataset of one-dimensional torsion profiles used for training. Compared to Generation 1, this Generation 2 dataset covers a more diverse set of chemistries of value to our industry partners. Large molecules (more than 20 heavy atoms) were also included. These are more flexible and show a greater degree of conformational variation, so they also sample intramolecular interactions.

This is the complete optimization dataset used for training OpenFF 2.0.0 Sage, consisting of the following datasets:

- [OpenFF Gen 2 Opt Set 1 Roche](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-1-Roche)
- [OpenFF Gen 2 Opt Set 2 Coverage](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-2-Coverage)
- [OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-3-Pfizer-Discrepancy)
- [OpenFF Gen 2 Opt Set 4 eMolecules - Discrepancy](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-4-eMolecules-Discrepancy)
- [OpenFF Gen 2 Opt Set 5 Bayer](https://github.com/openforcefield/qca-dataset-submission/tree/0e6e6da930118e2a2d6402b93c3e3e93830600cc/submissions/2020-03-20-OpenFF-Gen-2-Optimization-Set-5-Bayer)

The following filters were applied to those datasets:

- `RecordStatusFilter(status=RecordStatusEnum.complete)`
- `ConnectivityFilter(tolerance=1.2)`
- `UndefinedStereoFilter()`
- `ElementFilter(allowed_elements=["H", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I"])`
- `ConformerRMSDFilter(max_conformers=10)`

Further information can be found in the curation scripts for the linked repositories.
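Our understanding of `ConnectivityFilter` is as follows. It re-perceives bonds from each optimized geometry, treating two atoms as bonded when their distance is below `tolerance` times the sum of their covalent radii. It then drops records whose perceived bonds no longer match the molecule's connectivity.

The plain-Python sketch below illustrates that idea under stated assumptions. The small radii table, the helper names, and the use of ångström coordinates are all hypothetical, and this is not QCSubmit's implementation.

```python
import math

# Approximate single-bond covalent radii in angstrom (illustrative subset only;
# a real check would cover every element in the allow-list).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(symbols, coords, tolerance=1.2):
    """Return index pairs (i, j), i < j, closer than tolerance * (r_i + r_j)."""
    bonds = set()
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            cutoff = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(coords[i], coords[j]) < cutoff:
                bonds.add((i, j))
    return bonds


def connectivity_unchanged(symbols, coords, expected_bonds, tolerance=1.2):
    """True if the bonds perceived from the geometry equal the expected bonds."""
    expected = {tuple(sorted(bond)) for bond in expected_bonds}
    return perceive_bonds(symbols, coords, tolerance) == expected


# Usage: a water geometry keeps both O-H bonds ...
water = ["O", "H", "H"]
print(connectivity_unchanged(water, [(0, 0, 0), (0.957, 0, 0), (-0.240, 0.927, 0)], [(0, 1), (0, 2)]))  # True
# ... while stretching one O-H bond to 1.5 angstrom makes the check fail.
print(connectivity_unchanged(water, [(0, 0, 0), (1.5, 0, 0), (-0.240, 0.927, 0)], [(0, 1), (0, 2)]))  # False
```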

### General Information

- Date: 2024-12-12
- Class: OpenFF Optimization Dataset
- Purpose: Complete set of training data for OpenFF 2.0.0 Sage
- Dataset Type: optimization
- Name: OpenFF Sage 2.0.0 Training Optimization Dataset v1.0
- Number of unique molecules: 1025
- Number of filtered molecules: 0
- Number of conformers: 3663
- Number of conformers (min mean max): 1.00, 3.53, 10.00
- Mean molecular weight: 261.38
- Max molecular weight: 544.64
- Set of charges: -2.0, -1.0, 0.0, 1.0
- Dataset Submitter: Jennifer A. Clark
- Dataset Curator: Simon Boothroyd
- Dataset Generator: Hyesu Jang

### QCSubmit generation pipeline

- `generate-combined-dataset.py`: A Python script that shows how the dataset was prepared from the input files.
- `output.txt`: A text file containing the printed output of `generate-combined-dataset.py`.

### QCSubmit Manifest

- `generate-combined-dataset.py`
- `dataset.json.bz2`: The compressed dataset, ready for submission.
- `dataset.pdf`: A PDF file containing 2D structures of the molecules.
- `dataset.smi`: SMILES for every molecule in the submission.

### Metadata

* Elements: {F, I, N, C, P, Cl, S, Br, O, H}
* QC Specifications: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF Properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
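The metadata above can be mirrored as a plain Python dictionary. This is useful, for example, to check that the listed elements stay within the `ElementFilter` allow-list, and to format the level of theory as it appears in the description.

This is an illustrative sketch with hypothetical names. It is not the QCSubmit or QCPortal specification model.

```python
# Hypothetical plain-dict mirror of the "default" QC specification above.
DEFAULT_SPEC = {
    "method": "B3LYP-D3BJ",
    "basis": "DZVP",
    "program": "psi4",
    "implicit_solvent": None,
    "keywords": {},
    "maxiter": 200,
    "scf_properties": ["dipole", "quadrupole", "wiberg_lowdin_indices", "mayer_indices"],
}

# Allow-list passed to ElementFilter, and the element set reported in the metadata.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I"}
DATASET_ELEMENTS = {"F", "I", "N", "C", "P", "Cl", "S", "Br", "O", "H"}


def level_of_theory(spec):
    """Format the method/basis pair as written in the description."""
    return f"{spec['method']}/{spec['basis']}"


def elements_allowed(elements, allowed=ALLOWED_ELEMENTS):
    """True if every element in the dataset is on the allow-list."""
    return set(elements) <= set(allowed)


print(level_of_theory(DEFAULT_SPEC))       # B3LYP-D3BJ/DZVP
print(elements_allowed(DATASET_ELEMENTS))  # True
```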
@@ -0,0 +1,161 @@
name: qcarchive-user-submit
channels:
- conda-forge
- openeye
dependencies:
- annotated-types=0.7.0=pyhd8ed1ab_1
- apsw=3.47.0.0=py311hde754ab_0
- argcomplete=3.5.2=pyhd8ed1ab_0
- attrs=24.2.0=pyh71513ae_1
- basis_set_exchange=0.10=pyhd8ed1ab_1
- brotli=1.1.0=hd74edd7_2
- brotli-bin=1.1.0=hd74edd7_2
- brotli-python=1.1.0=py311h3f08180_2
- bson=0.5.9=py_0
- bzip2=1.0.8=h99b78c6_7
- ca-certificates=2024.8.30=hf0a4a13_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cachetools=5.5.0=pyhd8ed1ab_1
- cairo=1.18.2=h6a3b0d2_1
- certifi=2024.8.30=pyhd8ed1ab_0
- cffi=1.17.1=py311h3a79f62_0
- chardet=5.2.0=py311h267d04e_2
- charset-normalizer=3.4.0=pyhd8ed1ab_1
- colorama=0.4.6=pyhd8ed1ab_1
- contourpy=1.3.1=py311h210dab8_0
- cycler=0.12.1=pyhd8ed1ab_1
- dill=0.3.9=pyhd8ed1ab_1
- exceptiongroup=1.2.2=pyhd8ed1ab_1
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=h77eed37_3
- fontconfig=2.15.0=h1383a14_1
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- fonttools=4.55.3=py311h4921393_0
- freetype=2.12.1=hadb7bae_2
- freetype-py=2.3.0=pyhd8ed1ab_0
- greenlet=3.1.1=py311h3f08180_0
- h2=4.1.0=pyhd8ed1ab_1
- hpack=4.0.0=pyhd8ed1ab_1
- hyperframe=6.0.1=pyhd8ed1ab_1
- icu=75.1=hfee45f7_0
- idna=3.10=pyhd8ed1ab_1
- importlib-metadata=8.5.0=pyha770c72_1
- importlib_resources=6.4.5=pyhd8ed1ab_1
- iniconfig=2.0.0=pyhd8ed1ab_1
- jsonschema=4.23.0=pyhd8ed1ab_1
- jsonschema-specifications=2024.10.1=pyhd8ed1ab_1
- kiwisolver=1.4.7=py311h2c37856_0
- krb5=1.21.3=h237132a_0
- lcms2=2.16=ha0e7c42_0
- lerc=4.0.0=h9a09cb3_0
- libblas=3.9.0=25_osxarm64_openblas
- libboost=1.84.0=hc9fb7c5_7
- libboost-python=1.84.0=py311h8fc16d6_7
- libbrotlicommon=1.1.0=hd74edd7_2
- libbrotlidec=1.1.0=hd74edd7_2
- libbrotlienc=1.1.0=hd74edd7_2
- libcblas=3.9.0=25_osxarm64_openblas
- libcxx=19.1.5=ha82da77_0
- libdeflate=1.22=hd74edd7_0
- libedit=3.1.20191231=hc8eb9b7_2
- libexpat=2.6.4=h286801f_0
- libffi=3.4.2=h3422bc3_5
- libgfortran=5.0.0=13_2_0_hd922786_3
- libgfortran5=13.2.0=hf226fd6_3
- libglib=2.82.2=h07bd6cf_0
- libiconv=1.17=h0d3ecfb_2
- libintl=0.22.5=h8414b35_3
- libjpeg-turbo=3.0.0=hb547adb_1
- liblapack=3.9.0=25_osxarm64_openblas
- liblzma=5.6.3=h39f12f2_1
- libopenblas=0.3.28=openmp_hf332438_1
- libpng=1.6.44=hc14010f_0
- libpq=16.6=hb008251_1
- librdkit=2024.03.5=h54a62e4_3
- libsqlite=3.47.0=hbaaea75_1
- libtiff=4.7.0=ha962b0a_2
- libwebp-base=1.4.0=h93a5062_0
- libxcb=1.17.0=hdb1d25a_0
- libzlib=1.3.1=h8359307_2
- llvm-openmp=19.1.5=hdb05f8b_0
- matplotlib-base=3.9.3=py311h031da69_0
- msgpack-python=1.1.0=py311h2c37856_0
- multiprocess=0.70.17=py311h917b07b_1
- munkres=1.1.4=pyh9f0ad1d_0
- ncurses=6.5=h7bae524_1
- networkx=3.4.2=pyh267e887_2
- numpy=1.26.4=py311h7125741_0
- openeye-toolkits=2024.2.0=py311_0
- openff-amber-ff-ports=0.0.4=pyhca7485f_0
- openff-forcefields=2024.09.0=pyhff2d567_0
- openff-qcsubmit=0.54.0=pyhd8ed1ab_0
- openff-toolkit-base=0.16.7=pyhd8ed1ab_0
- openff-units=0.2.2=pyhca7485f_0
- openff-utilities=0.1.13=pyhd8ed1ab_0
- openjpeg=2.5.3=h8a3d83b_0
- openssl=3.4.0=h39f12f2_0
- packaging=24.2=pyhd8ed1ab_2
- pandas=2.2.2=py311h4b4568b_1
- pcre2=10.44=h297a79d_2
- pillow=11.0.0=py311h3894ae9_0
- pint=0.23=pyhd8ed1ab_1
- pip=24.3.1=pyh8b19718_0
- pixman=0.44.2=h2f9eb0b_0
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_2
- pluggy=1.5.0=pyhd8ed1ab_1
- pthread-stubs=0.4=hd74edd7_1002
- pycairo=1.27.0=py311h84a5a08_0
- pycalverter=1.6.1=pyhd8ed1ab_1
- pycparser=2.22=pyh29332c3_1
- pydantic=2.10.3=pyh3cfb1c2_0
- pydantic-core=2.27.1=py311h3ff9189_0
- pyjwt=2.10.1=pyhd8ed1ab_0
- pyparsing=3.2.0=pyhd8ed1ab_2
- pysocks=1.7.1=pyha55dd90_7
- pytest=8.3.4=pyhd8ed1ab_1
- python=3.11.11=hc22306f_1_cpython
- python-constraint=1.4.0=py_0
- python-dateutil=2.9.0.post0=pyhff2d567_1
- python-tzdata=2024.2=pyhd8ed1ab_1
- python_abi=3.11=5_cp311
- pytz=2024.2=pyhd8ed1ab_1
- pyyaml=6.0.2=py311h460d6c5_1
- qcelemental=0.28.0=pyhd8ed1ab_1
- qcportal=0.56=pyhd8ed1ab_1
- qhull=2020.2=h420ef59_5
- rdkit=2024.03.5=py311h8a4e316_3
- readline=8.2=h92ec313_1
- referencing=0.35.1=pyhd8ed1ab_1
- regex=2024.11.6=py311h917b07b_0
- reportlab=4.2.5=py311h460d6c5_0
- requests=2.32.3=pyhd8ed1ab_1
- rlpycairo=0.2.0=pyhd8ed1ab_0
- rpds-py=0.22.3=py311h3ff9189_0
- setuptools=75.6.0=pyhff2d567_1
- six=1.17.0=pyhd8ed1ab_0
- smirnoff99frosst=1.1.0=pyh44b312d_0
- sqlalchemy=2.0.36=py311hae2e1ce_0
- sqlite=3.47.0=hcd14bea_1
- tabulate=0.9.0=pyhd8ed1ab_2
- tk=8.6.13=h5083fa2_1
- tomli=2.2.1=pyhd8ed1ab_1
- tqdm=4.67.1=pyhd8ed1ab_0
- typing-extensions=4.12.2=hd8ed1ab_1
- typing_extensions=4.12.2=pyha770c72_1
- tzdata=2024b=hc8b5060_0
- unicodedata2=15.1.0=py311hae2e1ce_1
- unidecode=1.3.8=pyh29332c3_1
- urllib3=2.2.3=pyhd8ed1ab_1
- wheel=0.45.1=pyhd8ed1ab_1
- xmltodict=0.14.2=pyhd8ed1ab_1
- xorg-libxau=1.0.11=hd74edd7_1
- xorg-libxdmcp=1.1.5=hd74edd7_0
- yaml=0.2.5=h3422bc3_2
- zipp=3.21.0=pyhd8ed1ab_1
- zstandard=0.23.0=py311ha60cc69_1
- zstd=1.5.6=hb46c0d2_0
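Each dependency above is pinned as `name=version=build`. Some build strings name the platform explicitly, for example `libblas=3.9.0=25_osxarm64_openblas`. This suggests the file was exported on an Apple-silicon (osx-arm64) machine, so the exact builds will probably not resolve on other platforms.

The minimal sketch below splits the pins and relaxes them to `name=version`. All helper names are hypothetical.

```python
from collections import namedtuple

Pin = namedtuple("Pin", ["name", "version", "build"])


def parse_pin(line):
    """Split a pinned spec such as "- numpy=1.26.4=py311h7125741_0".

    Package names never contain "=", so partitioning on the first two "="
    separates name, version, and build; missing parts come back as None.
    """
    spec = line.strip()
    if spec.startswith("- "):
        spec = spec[2:]
    name, _, rest = spec.partition("=")
    version, _, build = rest.partition("=")
    return Pin(name, version or None, build or None)


def relax_pin(line):
    """Drop the platform-specific build string, keeping name=version."""
    pin = parse_pin(line)
    return pin.name if pin.version is None else f"{pin.name}={pin.version}"


def platform_specific(lines, marker="osxarm64"):
    """Names of packages whose build string mentions the given platform marker."""
    return [pin.name for pin in map(parse_pin, lines) if pin.build and marker in pin.build]


pins = [
    "- numpy=1.26.4=py311h7125741_0",
    "- libblas=3.9.0=25_osxarm64_openblas",
    "- openff-qcsubmit=0.54.0=pyhd8ed1ab_0",
]
print([relax_pin(p) for p in pins])  # ['numpy=1.26.4', 'libblas=3.9.0', 'openff-qcsubmit=0.54.0']
print(platform_specific(pins))       # ['libblas']
```

Relaxing pins to `name=version` leaves conda free to choose a build for the current platform. The pinned versions themselves may still need adjusting if they are not published there.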
