deephaven.pandas.to_pandas does not properly convert string columns #4810

chipkent · 2023-11-10T17:49:04Z

A user reported problems with round trips between DH and pandas when column types are strings. I have confirmed that DH is not properly setting string column types in pandas DataFrames. See below:

import pandas as pd
from deephaven import empty_table
from deephaven.pandas import to_pandas, to_table

# A Deephaven table with a string column
a = empty_table(5).update_view('A=`value`')
meta1 = a.meta_table  # java.lang.String

# A pandas DataFrame whose string column has the default dtype
df = pd.DataFrame({'a': ['a', 'b', 'c']})
print(df.dtypes)

t = to_table(df)
meta2 = t.meta_table  # PyObject

# See https://pandas.pydata.org/docs/user_guide/text.html
s_a = pd.Series(["a", "b", "c"])
s_b = pd.Series(["a", "b", "c"], dtype="string")
print(s_a)
print(s_b)
df2 = pd.DataFrame({'a': s_a, 'b': s_b})
print(df2.dtypes)
t2 = to_table(df2)
meta3 = t2.meta_table

# Now look at the original case with a twist
df['b'] = df['a'].astype("string")
print(df.dtypes)
t3 = to_table(df)
meta4 = t3.meta_table

# Test DH to_pandas conversion
df_a = to_pandas(a)
print(df_a.dtypes)

chipkent · 2023-11-10T19:49:18Z

As a workaround, to_pandas(a, dtype_backend="numpy_nullable") can be used.
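For anyone who wants to compare the two behaviours side by side, here is a minimal sketch of the workaround. The helper name string_column_dtypes is hypothetical (it is not part of Deephaven), and the Deephaven imports sit inside the function on the assumption that it is called from a running Deephaven Python session:

```python
def string_column_dtypes(dtype_backend=None):
    """Return the pandas dtypes produced by to_pandas for a string column.

    Hypothetical helper for illustration only; it must be called from a
    running Deephaven Python session, so the Deephaven imports are local.
    """
    from deephaven import empty_table
    from deephaven.pandas import to_pandas

    # Same table as in the report: one string column named A.
    source = empty_table(5).update_view("A = `value`")

    if dtype_backend is None:
        # Whatever the installed Deephaven version uses by default.
        df = to_pandas(source)
    else:
        # The workaround from this comment: request an explicit backend.
        df = to_pandas(source, dtype_backend=dtype_backend)
    return df.dtypes
```

Calling string_column_dtypes() and string_column_dtypes("numpy_nullable") in the same session shows whether the explicit backend changes the dtype reported for column A.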

jmao-denver · 2023-11-10T20:50:31Z

dtype_backend was introduced only recently, in pandas 2.0 (the latest release is 2.1.2), and all of our paying customers are already on 2.0+. To avoid this kind of confusion in the future, and to meet users' expectation that data round-trips cleanly between pandas and Deephaven, we should default this parameter to numpy_nullable. It should not be a breaking change. There may be a performance penalty, but the benefits of clean null conversion and precise type mapping clearly outweigh it.
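To make the type-mapping point concrete on the pandas side only, here is a small sketch with no Deephaven involved. It contrasts a string column held as generic Python objects with the same values in pandas' nullable string extension type. The variable names are made up for illustration, and this does not show what to_pandas does internally:

```python
import pandas as pd

# A string column stored as generic Python objects; the null is a plain None.
as_object = pd.Series(["a", None, "c"], dtype=object)

# The same values using pandas' nullable string extension type.
as_string = pd.Series(["a", None, "c"], dtype="string")

print(as_object.dtype)
print(as_string.dtype)

# Both representations report the missing value, but only the second
# one records in its dtype that the column holds strings.
print(as_object.isna().tolist())
print(as_string.isna().tolist())
```

This matches the report above, where a pandas column that carried no string-specific type came through to_table as PyObject.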

chipkent added the bug, triage, python, python-server-side, and devrel-watch labels Nov 10, 2023

chipkent added this to the November 2023 milestone Nov 10, 2023

chipkent assigned jmao-denver Nov 10, 2023

jmao-denver removed the triage label Nov 10, 2023

jmao-denver mentioned this issue Nov 12, 2023

change default dtype_backend for to_pandas #4815

Merged

jmao-denver closed this as completed in #4815 Nov 14, 2023

chipkent removed the devrel-watch label Nov 27, 2023
