openforcefield · jaclark5 · Dec 18, 2024 · Dec 12, 2024 · Dec 13, 2024 · Dec 13, 2024
diff --git a/README.md b/README.md
@@ -298,6 +298,7 @@ These are currently used to find a minimum energy conformation of a molecule.
 | `OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0) | Optimization dataset for NAGL2 training, part 1 | Cl, O, C, P, I, Br, B, S, N, F, H, Si | |
 | `OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0) | Optimization dataset for NAGL2 training, part 2 | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |
 | `OpenFF NAGL2 Training Optimization Dataset v4.0` | [2024-12-09-OpenFF-NAGL2-Training-Optimization-Dataset-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-12-09-OpenFF-NAGL2-Training-Optimization-Dataset-v4.0) | Optimization dataset for NAGL2 training, combined and filtered | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |
+| `OpenFF Sage 2.0.0 Training Optimization v1.0` | [2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0) | B3LYP-D3BJ/DZVP conformers applicable to drug-like molecules for OpenFF 2.0.0 Sage | F, I, N, C, P, Cl, S, Br, O, H | |
 
 
 # TorsionDrive Datasets

diff --git a/...sions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/README.md b/...sions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/README.md
@@ -0,0 +1,50 @@
+# OpenFF Sage 2.0.0 Training Optimization v1.0
+
+### Description
+
+A quantum chemical (QC) dataset curated to train the [OpenFF 2.0.0 Sage](https://github.com/openforcefield/openff-sage) force field, which has reparametrized Lennard-Jones (LJ) and valence parameters; the valence parameters are the ones relevant to this dataset. This QC dataset, computed at the OpenFF default level of theory, B3LYP-D3BJ/DZVP, is used to benchmark Sage geometries and energetics. These optimized conformer geometries were used in conjunction with the QC dataset used to train one-dimensional torsional profiles. This Generation 2 dataset increases chemical diversity compared to Generation 1, which is of value to our industry partners. Large molecules (>20 heavy atoms) were also included, adding more flexible molecules and a greater degree of conformational variation, which provides intramolecular interactions.
+
+### General Information
+
+- Date: 2024-12-12
+- Class: OpenFF Optimization Dataset
+- Purpose: B3LYP-D3BJ/DZVP conformers applicable to drug-like molecules for OpenFF 2.0.0 Sage
+- Collection: OptimizationDataset
+- Name: OpenFF Sage 2.0.0 Training Optimization v1.0
+- Number of unique molecules: 1025
+- Number of filtered molecules: 0
+- Number of conformers: 3663
+- Number of conformers (min, mean, max): 1.00, 3.53, 10.00
+- Mean molecular weight: 261.38
+- Max molecular weight: 544.64
+- Set of charges: -2.0, -1.0, 0.0, 1.0
+- Dataset Submitter: Jennifer A. Clark
+- Dataset Curator: Simon Boothroyd
+- Dataset Generator: Hyesu Jang
+
+### QCSubmit generation pipeline
+
+- `generate-combined-dataset.py`: A script which shows how the dataset was prepared from the input files.
+
+### QCSubmit Manifest
+
+- `generate-combined-dataset.ipynb`
+- `dataset.json.bz2`: The basic dataset ready for submission.
+- `dataset.pdf`: A pdf file containing molecule 2D structures.
+- `dataset.smi`: SMILES for every molecule in the submission.
+
+### Metadata
+
+* Elements: {F, I, N, C, P, Cl, S, Br, O, H}
+* QC Specifications: default
+  * basis: DZVP
+  * implicit_solvent: None
+  * keywords: {}
+  * maxiter: 200
+  * method: B3LYP-D3BJ
+  * program: psi4
+  * SCF Properties:
+    * dipole
+    * quadrupole
+    * wiberg_lowdin_indices
+    * mayer_indices
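
The manifest in the README above lists `dataset.json.bz2` as the submission-ready dataset. The README does not describe the JSON layout, so the following is only a minimal sketch of how a bzip2-compressed JSON file can be inspected with the Python standard library. The helper names (`load_bz2_json`, `top_level_keys`, `write_toy_file`) and the toy payload keys are hypothetical and are not taken from the real dataset or from QCSubmit.

```python
import bz2
import json
import tempfile
from pathlib import Path


def load_bz2_json(path):
    """Decompress a bzip2 file and parse its contents as JSON."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


def top_level_keys(path):
    """Return the sorted top-level keys of a bzip2-compressed JSON object."""
    data = load_bz2_json(path)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object at the top level")
    return sorted(data)


def write_toy_file(directory):
    """Write a small, made-up payload so the sketch runs without the real file.

    The keys below are placeholders, not the actual dataset schema.
    """
    path = Path(directory) / "toy.json.bz2"
    payload = {"dataset_name": "toy", "elements": ["C", "H"]}
    with bz2.open(path, mode="wt", encoding="utf-8") as handle:
        json.dump(payload, handle)
    return path


# Demonstrate on the toy file; point top_level_keys at the real
# dataset.json.bz2 to see which keys it actually contains.
with tempfile.TemporaryDirectory() as tmp:
    print(top_level_keys(write_toy_file(tmp)))
```

Listing the top-level keys first is a cautious way to explore the file before assuming any particular structure.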
diff --git a/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/conda_env.yml b/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/conda_env.yml
@@ -0,0 +1,161 @@
+name: qcarchive-user-submit
+channels:
+- conda-forge
+- openeye
+dependencies:
+- annotated-types=0.7.0=pyhd8ed1ab_1
+- apsw=3.47.0.0=py311hde754ab_0
+- argcomplete=3.5.2=pyhd8ed1ab_0
+- attrs=24.2.0=pyh71513ae_1
+- basis_set_exchange=0.10=pyhd8ed1ab_1
+- brotli=1.1.0=hd74edd7_2
+- brotli-bin=1.1.0=hd74edd7_2
+- brotli-python=1.1.0=py311h3f08180_2
+- bson=0.5.9=py_0
+- bzip2=1.0.8=h99b78c6_7
+- ca-certificates=2024.8.30=hf0a4a13_0
+- cached-property=1.5.2=hd8ed1ab_1
+- cached_property=1.5.2=pyha770c72_1
+- cachetools=5.5.0=pyhd8ed1ab_1
+- cairo=1.18.2=h6a3b0d2_1
+- certifi=2024.8.30=pyhd8ed1ab_0
+- cffi=1.17.1=py311h3a79f62_0
+- chardet=5.2.0=py311h267d04e_2
+- charset-normalizer=3.4.0=pyhd8ed1ab_1
+- colorama=0.4.6=pyhd8ed1ab_1
+- contourpy=1.3.1=py311h210dab8_0
+- cycler=0.12.1=pyhd8ed1ab_1
+- dill=0.3.9=pyhd8ed1ab_1
+- exceptiongroup=1.2.2=pyhd8ed1ab_1
+- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
+- font-ttf-inconsolata=3.000=h77eed37_0
+- font-ttf-source-code-pro=2.038=h77eed37_0
+- font-ttf-ubuntu=0.83=h77eed37_3
+- fontconfig=2.15.0=h1383a14_1
+- fonts-conda-ecosystem=1=0
+- fonts-conda-forge=1=0
+- fonttools=4.55.3=py311h4921393_0
+- freetype=2.12.1=hadb7bae_2
+- freetype-py=2.3.0=pyhd8ed1ab_0
+- greenlet=3.1.1=py311h3f08180_0
+- h2=4.1.0=pyhd8ed1ab_1
+- hpack=4.0.0=pyhd8ed1ab_1
+- hyperframe=6.0.1=pyhd8ed1ab_1
+- icu=75.1=hfee45f7_0
+- idna=3.10=pyhd8ed1ab_1
+- importlib-metadata=8.5.0=pyha770c72_1
+- importlib_resources=6.4.5=pyhd8ed1ab_1
+- iniconfig=2.0.0=pyhd8ed1ab_1
+- jsonschema=4.23.0=pyhd8ed1ab_1
+- jsonschema-specifications=2024.10.1=pyhd8ed1ab_1
+- kiwisolver=1.4.7=py311h2c37856_0
+- krb5=1.21.3=h237132a_0
+- lcms2=2.16=ha0e7c42_0
+- lerc=4.0.0=h9a09cb3_0
+- libblas=3.9.0=25_osxarm64_openblas
+- libboost=1.84.0=hc9fb7c5_7
+- libboost-python=1.84.0=py311h8fc16d6_7
+- libbrotlicommon=1.1.0=hd74edd7_2
+- libbrotlidec=1.1.0=hd74edd7_2
+- libbrotlienc=1.1.0=hd74edd7_2
+- libcblas=3.9.0=25_osxarm64_openblas
+- libcxx=19.1.5=ha82da77_0
+- libdeflate=1.22=hd74edd7_0
+- libedit=3.1.20191231=hc8eb9b7_2
+- libexpat=2.6.4=h286801f_0
+- libffi=3.4.2=h3422bc3_5
+- libgfortran=5.0.0=13_2_0_hd922786_3
+- libgfortran5=13.2.0=hf226fd6_3
+- libglib=2.82.2=h07bd6cf_0
+- libiconv=1.17=h0d3ecfb_2
+- libintl=0.22.5=h8414b35_3
+- libjpeg-turbo=3.0.0=hb547adb_1
+- liblapack=3.9.0=25_osxarm64_openblas
+- liblzma=5.6.3=h39f12f2_1
+- libopenblas=0.3.28=openmp_hf332438_1
+- libpng=1.6.44=hc14010f_0
+- libpq=16.6=hb008251_1
+- librdkit=2024.03.5=h54a62e4_3
+- libsqlite=3.47.0=hbaaea75_1
+- libtiff=4.7.0=ha962b0a_2
+- libwebp-base=1.4.0=h93a5062_0
+- libxcb=1.17.0=hdb1d25a_0
+- libzlib=1.3.1=h8359307_2
+- llvm-openmp=19.1.5=hdb05f8b_0
+- matplotlib-base=3.9.3=py311h031da69_0
+- msgpack-python=1.1.0=py311h2c37856_0
+- multiprocess=0.70.17=py311h917b07b_1
+- munkres=1.1.4=pyh9f0ad1d_0
+- ncurses=6.5=h7bae524_1
+- networkx=3.4.2=pyh267e887_2
+- numpy=1.26.4=py311h7125741_0
+- openeye-toolkits=2024.2.0=py311_0
+- openff-amber-ff-ports=0.0.4=pyhca7485f_0
+- openff-forcefields=2024.09.0=pyhff2d567_0
+- openff-qcsubmit=0.54.0=pyhd8ed1ab_0
+- openff-toolkit-base=0.16.7=pyhd8ed1ab_0
+- openff-units=0.2.2=pyhca7485f_0
+- openff-utilities=0.1.13=pyhd8ed1ab_0
+- openjpeg=2.5.3=h8a3d83b_0
+- openssl=3.4.0=h39f12f2_0
+- packaging=24.2=pyhd8ed1ab_2
+- pandas=2.2.2=py311h4b4568b_1
+- pcre2=10.44=h297a79d_2
+- pillow=11.0.0=py311h3894ae9_0
+- pint=0.23=pyhd8ed1ab_1
+- pip=24.3.1=pyh8b19718_0
+- pixman=0.44.2=h2f9eb0b_0
+- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_2
+- pluggy=1.5.0=pyhd8ed1ab_1
+- pthread-stubs=0.4=hd74edd7_1002
+- pycairo=1.27.0=py311h84a5a08_0
+- pycalverter=1.6.1=pyhd8ed1ab_1
+- pycparser=2.22=pyh29332c3_1
+- pydantic=2.10.3=pyh3cfb1c2_0
+- pydantic-core=2.27.1=py311h3ff9189_0
+- pyjwt=2.10.1=pyhd8ed1ab_0
+- pyparsing=3.2.0=pyhd8ed1ab_2
+- pysocks=1.7.1=pyha55dd90_7
+- pytest=8.3.4=pyhd8ed1ab_1
+- python=3.11.11=hc22306f_1_cpython
+- python-constraint=1.4.0=py_0
+- python-dateutil=2.9.0.post0=pyhff2d567_1
+- python-tzdata=2024.2=pyhd8ed1ab_1
+- python_abi=3.11=5_cp311
+- pytz=2024.2=pyhd8ed1ab_1
+- pyyaml=6.0.2=py311h460d6c5_1
+- qcelemental=0.28.0=pyhd8ed1ab_1
+- qcportal=0.56=pyhd8ed1ab_1
+- qhull=2020.2=h420ef59_5
+- rdkit=2024.03.5=py311h8a4e316_3
+- readline=8.2=h92ec313_1
+- referencing=0.35.1=pyhd8ed1ab_1
+- regex=2024.11.6=py311h917b07b_0
+- reportlab=4.2.5=py311h460d6c5_0
+- requests=2.32.3=pyhd8ed1ab_1
+- rlpycairo=0.2.0=pyhd8ed1ab_0
+- rpds-py=0.22.3=py311h3ff9189_0
+- setuptools=75.6.0=pyhff2d567_1
+- six=1.17.0=pyhd8ed1ab_0
+- smirnoff99frosst=1.1.0=pyh44b312d_0
+- sqlalchemy=2.0.36=py311hae2e1ce_0
+- sqlite=3.47.0=hcd14bea_1
+- tabulate=0.9.0=pyhd8ed1ab_2
+- tk=8.6.13=h5083fa2_1
+- tomli=2.2.1=pyhd8ed1ab_1
+- tqdm=4.67.1=pyhd8ed1ab_0
+- typing-extensions=4.12.2=hd8ed1ab_1
+- typing_extensions=4.12.2=pyha770c72_1
+- tzdata=2024b=hc8b5060_0
+- unicodedata2=15.1.0=py311hae2e1ce_1
+- unidecode=1.3.8=pyh29332c3_1
+- urllib3=2.2.3=pyhd8ed1ab_1
+- wheel=0.45.1=pyhd8ed1ab_1
+- xmltodict=0.14.2=pyhd8ed1ab_1
+- xorg-libxau=1.0.11=hd74edd7_1
+- xorg-libxdmcp=1.1.5=hd74edd7_0
+- yaml=0.2.5=h3422bc3_2
+- zipp=3.21.0=pyhd8ed1ab_1
+- zstandard=0.23.0=py311ha60cc69_1
+- zstd=1.5.6=hb46c0d2_0
+
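
Each dependency in `conda_env.yml` above is pinned as `name=version=build`. Some build strings (for example `25_osxarm64_openblas`) suggest the environment was exported on macOS arm64, so the full pins may not resolve on other platforms. As a minimal sketch, assuming every dependency entry has exactly three `=`-separated fields, the pins can be reduced to `name=version` with plain Python. The function names `parse_pin` and `portable_pins` are illustrative and not part of conda or QCSubmit.

```python
def parse_pin(line):
    """Split one 'name=version=build' dependency entry into its parts.

    Accepts the raw YAML list item, with or without a leading diff '+'.
    """
    spec = line.strip().lstrip("+").strip()
    if spec.startswith("- "):
        spec = spec[2:].strip()
    parts = spec.split("=")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a name=version=build pin: {line!r}")
    name, version, build = parts
    return name, version, build


def portable_pins(lines):
    """Return 'name=version' pins, dropping the platform-specific build."""
    pins = []
    for line in lines:
        name, version, _build = parse_pin(line)
        pins.append(f"{name}={version}")
    return pins


# Three entries copied from the environment file above.
EXAMPLE_LINES = [
    "+- openff-qcsubmit=0.54.0=pyhd8ed1ab_0",
    "+- qcportal=0.56=pyhd8ed1ab_1",
    "+- libblas=3.9.0=25_osxarm64_openblas",
]

print(portable_pins(EXAMPLE_LINES))
```

Channel entries such as `- conda-forge` are not pins, so `parse_pin` raises `ValueError` for them; only the lines under `dependencies:` should be passed in.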
diff --git a/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/dataset.json.bz2 b/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/dataset.json.bz2
diff --git a/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/dataset.pdf b/submissions/2024-12-12-OpenFF-Sage-2.0.0-Training-Optimization-Dataset-v1.0/dataset.pdf