Add `tidy` keyword to to_pandas? #405

lilyminium · 2021-11-16T17:41:13Z

I was surprised that `.to_pandas` converts to a wide format where each property type gets its own columns, with an imposed unit in the column name. I would have thought it more intuitive to convert to a tidier format, i.e.:

Instead of:

Index(['Id', 'Temperature (K)', 'Pressure (kPa)', 'Phase', 'N Components',
       'Component 1', 'Role 1', 'Mole Fraction 1', 'Exact Amount 1',
       'Component 2', 'Role 2', 'Mole Fraction 2', 'Exact Amount 2',
       'SolvationFreeEnergy Value (kJ / mol)',
       'SolvationFreeEnergy Uncertainty (kJ / mol)', 'Source'],
      dtype='object')

You could have:

Index(['Id', 'Temperature (K)', 'Pressure (kPa)', 'Phase', 'N Components',
       'Component 1', 'Role 1', 'Mole Fraction 1', 'Exact Amount 1',
       'Component 2', 'Role 2', 'Mole Fraction 2', 'Exact Amount 2',
       'Property type', 'Value', 'Value unit', 'Uncertainty', 'Uncertainty unit', 'Source'],
      dtype='object')

This would be more memory-efficient (edit: for mixed datasets), as you no longer have NaNs taking up a bunch of space, and it would help with filtering by property type. When working directly with the dataframe, it would be much easier to see how many of each property type you have and to group by it.
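To make the proposal concrete, here is a rough sketch of the reshaping done on the pandas side, outside the library. The helper name `to_tidy` is hypothetical and not part of any existing API. The sketch assumes property columns always follow the `<PropertyType> Value (<unit>)` / `<PropertyType> Uncertainty (<unit>)` pattern shown above, with a single-word property type. The numbers in the example frame are made up purely for illustration.

```python
import re

import pandas as pd

# Assumed header pattern, e.g. "SolvationFreeEnergy Value (kJ / mol)".
_PROPERTY_COLUMN = re.compile(
    r"^(?P<type>\w+) (?P<kind>Value|Uncertainty) \((?P<unit>.+)\)$"
)


def to_tidy(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per measured property instead of one column set."""
    # Map each property type to its value/uncertainty columns and their units.
    columns = {}
    for name in wide.columns:
        match = _PROPERTY_COLUMN.match(name)
        if match:
            columns.setdefault(match["type"], {})[match["kind"]] = (name, match["unit"])

    property_columns = [name for kinds in columns.values() for name, _ in kinds.values()]
    shared = wide.drop(columns=property_columns)

    frames = []
    for property_type, kinds in columns.items():
        if "Value" not in kinds:
            continue  # nothing to report without a value column
        value_column, value_unit = kinds["Value"]
        rows = wide[value_column].notna()  # skip the NaN padding of the wide format
        frame = shared[rows].copy()
        frame["Property type"] = property_type
        frame["Value"] = wide.loc[rows, value_column]
        frame["Value unit"] = value_unit
        if "Uncertainty" in kinds:
            uncertainty_column, uncertainty_unit = kinds["Uncertainty"]
            frame["Uncertainty"] = wide.loc[rows, uncertainty_column]
            frame["Uncertainty unit"] = uncertainty_unit
        frames.append(frame)

    if not frames:
        return shared.copy()
    return pd.concat(frames, ignore_index=True)


# Made-up mixed dataset: each row only has a value for one property type.
wide = pd.DataFrame(
    {
        "Id": ["a", "b"],
        "Temperature (K)": [298.15, 298.15],
        "Density Value (g / ml)": [0.997, None],
        "Density Uncertainty (g / ml)": [0.001, None],
        "SolvationFreeEnergy Value (kJ / mol)": [None, -26.4],
        "SolvationFreeEnergy Uncertainty (kJ / mol)": [None, 0.5],
    }
)
tidy = to_tidy(wide)

# Counting and grouping by property type becomes a one-liner.
print(tidy.groupby("Property type").size())
```

Unlike the column order proposed above, this sketch leaves `Source` (and any other shared column) ahead of the new property columns; reordering is a cosmetic step I left out.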
