Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SMILES translation for surface adsorbates #2701

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

rwest
Copy link
Member

@rwest rwest commented Aug 2, 2024

Motivation or Problem

I recently discovered (a happy accident) that you can put * in a SMILES string and RDKit will use it as a dummy atom, with an atomic number of 0, which RMG happens to interpret as a surface site, so you can easily enter adsorbates as SMILES strings this way. (Was I the last to realize this? seems super convenient!).

Anyway, the round trip didn't work, because it would output the atom as [Pt] when you convert things into a SMILES string.

Description of Changes

This PR changes it so you can do round-trip read and write SMILES with * for surface sites.

The default atom for a surface site being turned into an RDKit molecule is now the wildcard 0. When you're dealing with InChI conversion it instead uses atom 78 (platinum), because otherwise the inchi stuff crashes.
Molecules with an N atom in are by default converted to SMILES by OpenBabel, which used the [Pt] syntax, which made things look inconsistent and prevented a round-trip conversion for those adsorbates only. So then I made the OpenBabel converter replace [Pt] with * after it's made a SMILES.

Now you can do round-trip conversion to and from SMILES for at least these adsorbates:
*C, *CC, *=C, *[H], *=O, *COC*, *CNC, *C1CCCC1, *N, *N(Cl)

Testing

I wrote unit tests, (that was the lion's share of the work, as usual).
Locally, I'm getting some weird augmented inchi error when I debug in VSCode ('CH2O2/c2-1-3/h1-2H/u1,3' == 'CH2O2/c2-1-3/h1H,(H,2,3)/u1,2') but I can't see why that would have changed. When I run in my console with make test I instead get a TestSoluteDatabase::test_saturation_density error (1.93 == 0).
I'm hoping neither of these are actual problems and the GitHub CI runs smoothly. 🤞

Reviewer Tips

The output, log files, etc. will look different.
Species names are now more likely to have * in them (than [Pt]).
Hopefully filenames with * in aren't a problem.
Maybe there are other unintended consequences we'll have to deal with?
Will this mess up anyone's workflows (in ways that they shouldn't just improve their workflows?).
Try it out and report back.

You can use a * to represent a surface site in a SMILES string when
reading it via RDKit, and RDKit turns this into a dummy atom
with atomic number 0.

By doing the reverse (telling RDKit that our surface sites have
atomic number 0) we can use RDKit to *generate* SMILES strings
in the same format, enabling a round trip.

But for other things like InChIs it seems more robust to use
an atom like Platinum (78). This allows both, with default 0,
but 78 used in InChI conversion.
Using the new syntax with * for a surface site.
Unfortunately going from a molecule TO a smiles uses
OpenBabel if you have Nitrogen in the molecule, which 
then uses [Pt] in place of *.
But you can still READ smiles with * and N in. 
That means you don't get a round trip. 
In [9]: Molecule(smiles='CNC*').to_smiles()
Out[9]: 'CNC[Pt]'

Still, this is better than it was. (I think).
This means they can be parsed in a round trip by RDKit
(the default SMILES reader). 
This is handy because OpenBabel is the default SMILES
*writer* for things with an N atom, but not everything.
Now it's more consistent, outputting a * for a surface site.

I added a unit test for round-trip conversion to and from 
SMILES a few times for various adsorbates including some with N.
@rwest
Copy link
Member Author

rwest commented Aug 3, 2024

@mjohnson541 we were just discussing how RMS uses SMILES for some things and uses either RDKit or RMG. Will this have impacts on RMS that need coordination?

@mjohnson541
Copy link
Contributor

I don't think so, right now RMS can't do anything with surface smiles, this should allow RMS to accept smiles for surface species.

Copy link

github-actions bot commented Aug 3, 2024

Regression Testing Results

WARNING:root:Initial mole fractions do not sum to one; normalizing.
WARNING:root:Initial mole fractions do not sum to one; normalizing.
WARNING:root:Initial mole fractions do not sum to one; normalizing.
⚠️ One or more regression tests failed.
Please download the failed results and run the tests locally or check the log to see why.

Detailed regression test results.

Regression test aromatics:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:06
Current: Execution time (DD:HH:MM:SS): 00:00:01:07
Reference: Memory used: 2776.74 MB
Current: Memory used: 2770.21 MB

aromatics Passed Core Comparison ✅

Original model has 15 species.
Test model has 15 species. ✅
Original model has 11 reactions.
Test model has 11 reactions. ✅

aromatics Failed Edge Comparison ❌

Original model has 106 species.
Test model has 106 species. ✅
Original model has 358 reactions.
Test model has 358 reactions. ✅

Non-identical thermo! ❌
original: C=CC1C=CC2=CC1C=C2
tested: C=CC1C=CC2=CC1C=C2

Hf(300K) S(300K) Cp(300K) Cp(400K) Cp(500K) Cp(600K) Cp(800K) Cp(1000K) Cp(1500K)
83.22 84.16 35.48 45.14 53.78 61.40 73.58 82.20 95.08
83.22 82.78 35.48 45.14 53.78 61.40 73.58 82.20 95.08

Identical thermo comments:
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds- CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsHH) + Estimated bicyclic component: polycyclic(s3_5_6_ane) - ring(Cyclohexane) - ring(Cyclopentane) + ring(1,3-Cyclohexadiene) + ring(Cyclopentadiene)

Non-identical thermo! ❌
original: C1=CC2C=CC=1C=C2
tested: C1=CC2C=CC=1C=C2

Hf(300K) S(300K) Cp(300K) Cp(400K) Cp(500K) Cp(600K) Cp(800K) Cp(1000K) Cp(1500K)
129.39 79.85 22.98 30.09 36.61 42.21 50.22 55.39 65.95
164.90 80.93 22.21 28.97 35.25 40.69 48.70 53.97 64.36

thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(1,4-Cyclohexadiene)
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(124cyclohexatriene)

Non-identical kinetics! ❌
original:
rxn: [c]1ccccc1(3) + C1=CC2C=C[C]1C=C2(49) <=> benzene(1) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [c]1ccccc1(3) + C1=CC2C=C[C]1C=C2(49) <=> benzene(1) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 4.24 4.69 5.05 5.33 5.79 6.14 6.78 7.23
k(T): -3.00 -0.74 0.70 1.71 3.07 3.97 5.33 6.15

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(9.943,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 38.5 to 41.6 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 38.5 to 41.6 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: [H](4) + C1=CC2C=C[C]1C=C2(49) <=> [H][H](11) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [H](4) + C1=CC2C=C[C]1C=C2(49) <=> [H][H](11) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 5.77 5.83 5.88 5.92 5.97 6.02 6.10 6.16
k(T): -7.44 -4.08 -2.05 -0.69 1.02 2.06 3.46 4.18

kinetics: Arrhenius(A=(4.06926e+10,'cm^3/(mol*s)'), n=0.47, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O Multiplied by reaction path degeneracy 3.0""")
kinetics: Arrhenius(A=(4.06926e+10,'cm^3/(mol*s)'), n=0.47, Ea=(18.137,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O Multiplied by reaction path degeneracy 3.0 Ea raised from 75.2 to 75.9 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O
Multiplied by reaction path degeneracy 3.0
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O
Multiplied by reaction path degeneracy 3.0
Ea raised from 75.2 to 75.9 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: [CH]=C(7) + C1=CC2C=C[C]1C=C2(49) <=> C=C(13) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [CH]=C(7) + C1=CC2C=C[C]1C=C2(49) <=> C=C(13) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 4.06 4.76 5.18 5.46 5.81 6.02 6.30 6.44
k(T): -7.17 -3.66 -1.56 -0.16 1.60 2.65 4.05 4.75

kinetics: Arrhenius(A=(7.23e+12,'cm^3/(mol*s)'), n=0, Ea=(3.841,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_N-Sp-6R!H-4CHNS Multiplied by reaction path degeneracy 3.0""")
kinetics: Arrhenius(A=(7.23e+12,'cm^3/(mol*s)'), n=0, Ea=(19.262,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_N-Sp-6R!H-4CHNS Multiplied by reaction path degeneracy 3.0""")
Identical kinetics comments:
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_N-Sp-6R!H-4CHNS
Multiplied by reaction path degeneracy 3.0

Non-identical kinetics! ❌
original:
rxn: [CH]1C2=CC=CC12(8) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2CC2=C1(27) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [CH]1C2=CC=CC12(8) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2CC2=C1(27) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -4.55 -1.90 -0.23 0.94 2.49 3.50 5.02 5.92
k(T): -30.48 -21.35 -15.79 -12.03 -7.23 -4.28 -0.16 2.03

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(12.063,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 46.8 to 50.5 kJ/mol to match endothermicity of reaction.""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(47.659,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 195.4 to 199.4 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 46.8 to 50.5 kJ/mol to match endothermicity of reaction.
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 195.4 to 199.4 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: [CH]1C2=CC=CC12(8) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2C=C2C1(29) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [CH]1C2=CC=CC12(8) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2C=C2C1(29) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -5.30 -2.46 -0.68 0.57 2.21 3.28 4.87 5.80
k(T): -31.23 -21.91 -16.23 -12.40 -7.51 -4.50 -0.31 1.91

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(13.089,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 53.5 to 54.8 kJ/mol to match endothermicity of reaction.""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(48.686,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 202.2 to 203.7 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 53.5 to 54.8 kJ/mol to match endothermicity of reaction.
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 202.2 to 203.7 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: [CH]1C2=CC=CC12(8) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2=CC2C1(28) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [CH]1C2=CC=CC12(8) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2=CC2C1(28) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -1.38 0.48 1.67 2.52 3.68 4.45 5.66 6.39
k(T): -27.24 -18.91 -13.84 -10.40 -6.02 -3.30 0.48 2.51

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(7.718,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(43.208,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 180.2 to 180.8 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 180.2 to 180.8 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: [CH]=CC=C(15) + C1=CC2C=C[C]1C=C2(49) <=> C=CC=C(17) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation
tested:
rxn: [CH]=CC=C(15) + C1=CC2C=C[C]1C=C2(49) <=> C=CC=C(17) + C1=CC2C=CC=1C=C2(79) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -0.49 0.99 1.87 2.46 3.19 3.64 4.23 4.52
k(T): -11.95 -7.61 -5.01 -3.27 -1.10 0.20 1.93 2.80

kinetics: Arrhenius(A=(2.529e+11,'cm^3/(mol*s)'), n=0, Ea=(8.084,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-6R!H-R Multiplied by reaction path degeneracy 3.0""")
kinetics: Arrhenius(A=(2.529e+11,'cm^3/(mol*s)'), n=0, Ea=(23.821,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-6R!H-R Multiplied by reaction path degeneracy 3.0""")
Identical kinetics comments:
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-6R!H-R
Multiplied by reaction path degeneracy 3.0

Non-identical kinetics! ❌
original:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]=Cc1ccccc1(12) <=> C1=CC2C=CC=1C=C2(79) + C=Cc1ccccc1(16) origin: Disproportionation
tested:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]=Cc1ccccc1(12) <=> C1=CC2C=CC=1C=C2(79) + C=Cc1ccccc1(16) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -0.66 0.85 1.76 2.37 3.13 3.58 4.19 4.49
k(T): -12.28 -7.86 -5.21 -3.44 -1.23 0.10 1.87 2.75

kinetics: Arrhenius(A=(2.529e+11,'cm^3/(mol*s)'), n=0, Ea=(8.328,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-6R!H-R Multiplied by reaction path degeneracy 3.0""")
kinetics: Arrhenius(A=(2.529e+11,'cm^3/(mol*s)'), n=0, Ea=(24.273,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-6R!H-R Multiplied by reaction path degeneracy 3.0""")
Identical kinetics comments:
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-6R!H-R
Multiplied by reaction path degeneracy 3.0

Non-identical kinetics! ❌
original:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]1C2=CC=CC1C=C2(48) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC(=C1)C2(69) origin: Disproportionation
tested:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]1C2=CC=CC1C=C2(48) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC(=C1)C2(69) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -4.51 -1.87 -0.20 0.96 2.51 3.52 5.03 5.92
k(T): -30.44 -21.32 -15.76 -12.01 -7.22 -4.26 -0.16 2.03

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(12.01,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 46.5 to 50.2 kJ/mol to match endothermicity of reaction.""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(47.606,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 195.1 to 199.2 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 46.5 to 50.2 kJ/mol to match endothermicity of reaction.
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 195.1 to 199.2 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]1C2=CC=CC1C=C2(48) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC(=C2)C1(70) origin: Disproportionation
tested:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]1C2=CC=CC1C=C2(48) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC(=C2)C1(70) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -6.18 -3.12 -1.20 0.13 1.88 3.01 4.70 5.67
k(T): -32.11 -22.57 -16.76 -12.84 -7.84 -4.76 -0.49 1.78

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(14.299,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 56.6 to 59.8 kJ/mol to match endothermicity of reaction.""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(49.895,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 205.2 to 208.8 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 56.6 to 59.8 kJ/mol to match endothermicity of reaction.
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 205.2 to 208.8 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]1C2=CC=CC1C=C2(48) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2=CC(C=C2)C1(71) origin: Disproportionation
tested:
rxn: C1=CC2C=C[C]1C=C2(49) + [CH]1C2=CC=CC1C=C2(48) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2=CC(C=C2)C1(71) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -8.04 -4.52 -2.32 -0.81 1.18 2.46 4.32 5.39
k(T): -33.97 -23.97 -17.88 -13.77 -8.54 -5.32 -0.86 1.50

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(16.86,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 65.8 to 70.5 kJ/mol to match endothermicity of reaction.""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(52.457,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 214.4 to 219.5 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 65.8 to 70.5 kJ/mol to match endothermicity of reaction.
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 214.4 to 219.5 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: C1=CC2C=C[C]1C=C2(49) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC1C=C2(82) origin: Disproportionation
tested:
rxn: C1=CC2C=C[C]1C=C2(49) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC1C=C2(82) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -4.55 -1.90 -0.23 0.94 2.49 3.50 5.02 5.92
k(T): -30.48 -21.35 -15.79 -12.03 -7.23 -4.28 -0.16 2.03

kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(12.063,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 46.8 to 50.5 kJ/mol to match endothermicity of reaction.""")
kinetics: Arrhenius(A=(17.1699,'cm^3/(mol*s)'), n=3.635, Ea=(47.659,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 3.0 Ea raised from 195.4 to 199.4 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 46.8 to 50.5 kJ/mol to match endothermicity of reaction.
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 3.0
Ea raised from 195.4 to 199.4 kJ/mol to match endothermicity of reaction.

Non-identical kinetics! ❌
original:
rxn: C1=CC2C=C[C]1C=C2(49) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC1=CC2(83) origin: Disproportionation
tested:
rxn: C1=CC2C=C[C]1C=C2(49) + C1=CC2C=C[C]1C=C2(49) <=> C1=CC2C=CC=1C=C2(79) + C1=CC2C=CC1=CC2(83) origin: Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 3.96 4.60 5.07 5.43 5.98 6.39 7.11 7.60
k(T): -19.49 -12.98 -9.00 -6.29 -2.81 -0.64 2.42 4.08

kinetics: Arrhenius(A=(51.5097,'cm^3/(mol*s)'), n=3.635, Ea=(1.036,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 9.0""")
kinetics: Arrhenius(A=(51.5097,'cm^3/(mol*s)'), n=3.635, Ea=(33.226,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R Multiplied by reaction path degeneracy 9.0 Ea raised from 133.4 to 139.0 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 9.0
kinetics: Estimated from node Root_Ext-1R!H-R_N-4R->O_Sp-5R!H=1R!H_Ext-4CHNS-R_Ext-4CHNS-R
Multiplied by reaction path degeneracy 9.0
Ea raised from 133.4 to 139.0 kJ/mol to match endothermicity of reaction.

Observables Test Case: Aromatics Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:10
Current: Execution time (DD:HH:MM:SS): 00:00:02:07
Reference: Memory used: 2923.08 MB
Current: Memory used: 2917.94 MB

liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 215 reactions.
Test model has 216 reactions. ❌
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCO[O](29) <=> [OH](22) + CC=O(69) origin: intra_H_migration

liquid_oxidation Failed Edge Comparison ❌

Original model has 202 species.
Test model has 202 species. ✅
Original model has 1610 reactions.
Test model has 1610 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: CCO[O](30) <=> C[CH]OO(70) origin: intra_H_migration
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCO[O](29) <=> [OH](22) + CC=O(69) origin: intra_H_migration

Observables Test Case: liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:28
Current: Execution time (DD:HH:MM:SS): 00:00:01:27
Reference: Memory used: 2909.51 MB
Current: Memory used: 2910.42 MB

nitrogen Failed Core Comparison ❌

Original model has 41 species.
Test model has 41 species. ✅
Original model has 360 reactions.
Test model has 359 reactions. ❌
The original model has 1 reactions that the tested model does not have. ❌
rxn: HNO(48) + HCO(13) <=> NO(38) + CH2O(18) origin: H_Abstraction

nitrogen Failed Edge Comparison ❌

Original model has 133 species.
Test model has 133 species. ✅
Original model has 983 reactions.
Test model has 981 reactions. ❌
The original model has 2 reactions that the tested model does not have. ❌
rxn: HNO(48) + HCO(13) <=> NO(38) + CH2O(18) origin: H_Abstraction
rxn: HON(T)(83) + HCO(13) <=> NO(38) + CH2O(18) origin: Disproportionation

Observables Test Case: NC Comparison

✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!

nitrogen Passed Observable Testing ✅

Regression test oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:27
Current: Execution time (DD:HH:MM:SS): 00:00:02:28
Reference: Memory used: 2773.98 MB
Current: Memory used: 2771.28 MB

oxidation Passed Core Comparison ✅

Original model has 59 species.
Test model has 59 species. ✅
Original model has 694 reactions.
Test model has 694 reactions. ✅

oxidation Passed Edge Comparison ✅

Original model has 230 species.
Test model has 230 species. ✅
Original model has 1526 reactions.
Test model has 1526 reactions. ✅

Observables Test Case: Oxidation Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

oxidation Passed Observable Testing ✅

Regression test sulfur:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:56
Current: Execution time (DD:HH:MM:SS): 00:00:00:54
Reference: Memory used: 2880.59 MB
Current: Memory used: 2876.00 MB

sulfur Passed Core Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 74 reactions.
Test model has 74 reactions. ✅

sulfur Failed Edge Comparison ❌

Original model has 89 species.
Test model has 89 species. ✅
Original model has 227 reactions.
Test model has 227 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary
The tested model has 1 reactions that the original model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary

Observables Test Case: SO2 Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

sulfur Passed Observable Testing ✅

Regression test superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:39
Current: Execution time (DD:HH:MM:SS): 00:00:00:39
Reference: Memory used: 3007.02 MB
Current: Memory used: 2972.35 MB

superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 21 reactions.
Test model has 21 reactions. ✅

superminimal Passed Edge Comparison ✅

Original model has 18 species.
Test model has 18 species. ✅
Original model has 28 reactions.
Test model has 28 reactions. ✅

Regression test RMS_constantVIdealGasReactor_superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:22
Current: Execution time (DD:HH:MM:SS): 00:00:02:20
Reference: Memory used: 3464.70 MB
Current: Memory used: 3450.78 MB

RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅

Regression test RMS_CSTR_liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:05:53
Current: Execution time (DD:HH:MM:SS): 00:00:05:50
Reference: Memory used: 3377.87 MB
Current: Memory used: 3401.14 MB

RMS_CSTR_liquid_oxidation Passed Core Comparison ✅

Original model has 37 species.
Test model has 37 species. ✅
Original model has 232 reactions.
Test model has 232 reactions. ✅

RMS_CSTR_liquid_oxidation Passed Edge Comparison ✅

Original model has 206 species.
Test model has 206 species. ✅
Original model has 1508 reactions.
Test model has 1508 reactions. ✅

Observables Test Case: RMS_CSTR_liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_CSTR_liquid_oxidation Passed Observable Testing ✅

Regression test fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:43
Current: Execution time (DD:HH:MM:SS): 00:00:00:42
Reference: Memory used: 2706.92 MB
Current: Memory used: 2704.90 MB

fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

fragment Passed Edge Comparison ✅

Original model has 33 species.
Test model has 33 species. ✅
Original model has 47 reactions.
Test model has 47 reactions. ✅

Observables Test Case: fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

fragment Passed Observable Testing ✅

Regression test RMS_constantVIdealGasReactor_fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:03:04
Current: Execution time (DD:HH:MM:SS): 00:00:03:00
Reference: Memory used: 3584.24 MB
Current: Memory used: 3615.22 MB

RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 24 reactions.
Test model has 24 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅

beep boop this comment was written by a bot 🤖

@rwest rwest marked this pull request as draft August 9, 2024 18:07
@rwest
Copy link
Member Author

rwest commented Aug 9, 2024

Chatting with Bjarne, we thought that having * in filenames could be a recipe for disaster (imagine people trying to copy, move, rename, or delete a file....).
So probably this needs more thought.

Nice to have?

  1. round trip so a SMILES string written by RMG can be read by RMG
  2. compatibility with other tools, so a SMILES string written by RMG can be read by other things. (by what? RDKit? what else is used by our users?)
  3. short names. X and * are each 1 character whereas [Pt] is 4
  4. human-friendly; looking at it makes sense and is easy to understand.
  5. file-system friendly. avoid filenames with * in.
  6. not confusing when you change metal. i.e. using [Pt] is hard to explain to people when you're simulating copper, nickel, ruthenium, an arbitrary hypothetical metal, or an alloy.

extend the list?
cross any off as unimportant?
identify if any of these are in direct conflict?
think of some solutions that match as much as possible?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants