to/from_pandas does not roundtrip #365

lilyminium · 2021-08-24T18:48:09Z

I cannot create a ThermoMLDataSet from a pandas dataframe that was created from a dataset.

>>> df = dataset.to_pandas()
>>> ThermoMLDataSet.from_pandas(df)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/var/folders/rv/j6lbln6j0kvb5svxj8wflc400000gn/T/ipykernel_34462/444742739.py in <module>
      1 df = dataset.to_pandas()
----> 2 ThermoMLDataSet.from_pandas(df)

~/anaconda3/envs/polymetrizer/lib/python3.9/site-packages/openff/evaluator/datasets/datasets.py in from_pandas(cls, data_frame)
    555         for match in property_header_matches:
    556 
--> 557             assert match
    558 
    559             property_type_string, property_unit_string = match.groups()

AssertionError:

Diagnostics

The assertion fails on the header ExcessMolarVolume Value (cm ** 3 / mol) because the character class in the match pattern does not allow asterisks.

>>> import re
>>> property_header_matches = {
...     (header, re.match(r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+-/\s]*)\)$", header))
...     for header in df
...     if header.find(" Value ") >= 0
... }
>>> property_header_matches
{('Density Value (g / ml)',
  <re.Match object; span=(0, 22), match='Density Value (g / ml)'>),
 ('DielectricConstant Value ()',
  <re.Match object; span=(0, 27), match='DielectricConstant Value ()'>),
 ('EnthalpyOfMixing Value (kJ / mol)',
  <re.Match object; span=(0, 33), match='EnthalpyOfMixing Value (kJ / mol)'>),
 ('EnthalpyOfVaporization Value (kJ / mol)',
  <re.Match object; span=(0, 39), match='EnthalpyOfVaporization Value (kJ / mol)'>),
 ('ExcessMolarVolume Value (cm ** 3 / mol)', None)}
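
A note on why the asterisk is rejected: inside a character class, +-/ is read as a range from + to /, which covers + , - . / but not *. A minimal sketch to check this with the standard re module (the list of probe characters is my own choice, not something from the library):

```python
import re

# Character class copied from the pattern in from_pandas.
OLD_CLASS = r"[a-zA-Z0-9+-/\s]"

# Probe characters chosen for illustration only.
probe_characters = "*+,-./^"

# Keep the characters that the class accepts on their own.
matched = [c for c in probe_characters if re.fullmatch(OLD_CLASS, c)]
rejected = [c for c in probe_characters if not re.fullmatch(OLD_CLASS, c)]

print("matched: ", matched)
print("rejected:", rejected)
```

So the comma and full stop are accepted by accident, while * and ^ are not.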

Suggestion

        property_header_matches = {
---            re.match(r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+-/\s]*)\)$", header)
+++            re.match(r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+*-/\s]*)\)$", header)
            for header in data_frame
            if header.find(" Value ") >= 0
        }
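
To sanity-check the suggested pattern without a full dataset, the headers from the diagnostics can be replayed against both patterns. This is only a sketch: the header list is copied from the output above, and unmatched_headers is a hypothetical helper, not part of openff-evaluator.

```python
import re

OLD_PATTERN = r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+-/\s]*)\)$"
NEW_PATTERN = r"^([a-zA-Z]+) Value \(([a-zA-Z0-9+*-/\s]*)\)$"

# Column headers as they appear in the diagnostics output.
headers = [
    "Density Value (g / ml)",
    "DielectricConstant Value ()",
    "EnthalpyOfMixing Value (kJ / mol)",
    "EnthalpyOfVaporization Value (kJ / mol)",
    "ExcessMolarVolume Value (cm ** 3 / mol)",
]


def unmatched_headers(pattern, headers):
    """Return the headers that the pattern fails to match (hypothetical helper)."""
    return [header for header in headers if re.match(pattern, header) is None]


old_unmatched = unmatched_headers(OLD_PATTERN, headers)
new_unmatched = unmatched_headers(NEW_PATTERN, headers)

# The new pattern should also split the header into property type and unit.
groups = re.match(NEW_PATTERN, "ExcessMolarVolume Value (cm ** 3 / mol)").groups()

print(old_unmatched)
print(new_unmatched)
print(groups)
```

Note that in the new class the asterisk is picked up because *-/ now forms the range, so the fix works, but it still relies on a range rather than an explicit list of characters.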

Alternatively, get rid of the check altogether, since new units will keep turning up. (For example, I notice the pattern makes no allowance for exponents written with ^, even though kJ/mol and kJ mol^-1 should be equivalent.)
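
As a rough illustration of the more permissive option, the unit part could be captured as whatever sits between the outer parentheses, leaving it to the unit parser to decide whether the string is valid. This is a sketch under my own assumptions, not what the library does: split_property_header is a hypothetical name, and the kJ mol^-1 header is made up for the example.

```python
import re

# Permissive pattern: accept any text between the outer parentheses.
PERMISSIVE_PATTERN = re.compile(r"^([a-zA-Z]+) Value \((.*)\)$")


def split_property_header(header):
    """Split 'Type Value (unit)' into (type, unit), or return None if it
    does not look like a property value header. Hypothetical helper."""
    match = PERMISSIVE_PATTERN.match(header)
    if match is None:
        return None
    property_type, unit = match.groups()
    return property_type, unit


# A header from the diagnostics and a made-up one with an exponent.
print(split_property_header("ExcessMolarVolume Value (cm ** 3 / mol)"))
print(split_property_header("EnthalpyOfMixing Value (kJ mol^-1)"))
# A column that is not a property value is skipped rather than asserted on.
print(split_property_header("Temperature (K)"))
```

The trade-off is that malformed unit strings are no longer caught at the header stage, so any error would surface later when the unit is parsed.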


SimonBoothroyd added a commit that referenced this issue Aug 25, 2021

Fix #365

645d724

SimonBoothroyd mentioned this issue Aug 25, 2021

Fix #365 #367

Merged

1 task

SimonBoothroyd closed this as completed in #367 Aug 25, 2021

SimonBoothroyd added a commit that referenced this issue Aug 25, 2021

Fix #365 (#367)

976cf1e
