Allow hash pandas series which contains numpy.array or list #133

Open
zhu0619 opened this issue Jul 17, 2024 · 1 comment
Labels
feature Annotates any PR that adds new features; Used in the release process

Comments

@zhu0619
Contributor

zhu0619 commented Jul 17, 2024

Is your feature request related to a problem? Please describe.

Dataset uses pandas.util.hash_pandas_object to compute the checksum.
However, in some cases the elements of a pandas Series are lists or NumPy arrays.
For example, hashing pd.Series([[1], [2], [3]]) raises TypeError: unhashable type: 'list'
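
A minimal reproduction of the error described above might look like the following. It assumes the default arguments of hash_pandas_object; the variable names are illustrative only.

```python
import pandas as pd
from pandas.util import hash_pandas_object

# A Series whose elements are Python lists (object dtype).
series = pd.Series([[1], [2], [3]])

error = None
try:
    hash_pandas_object(series)
except TypeError as exc:
    # The issue reports: TypeError: unhashable type: 'list'
    error = exc

print(type(error).__name__, error)
```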

Describe the solution you'd like

A way to compute the hash for data such as pd.Series([[1], [2], [3]])
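
One possible direction, sketched here as an assumption rather than the project's actual implementation, is to serialize each list or array cell to a deterministic string before calling hash_pandas_object. The helpers serialize_cell and hash_nested_series are hypothetical names introduced for illustration.

```python
import json

import numpy as np
import pandas as pd
from pandas.util import hash_pandas_object


def serialize_cell(value):
    """Hypothetical helper: turn a list or array cell into a JSON string."""
    if isinstance(value, np.ndarray):
        # tolist() converts NumPy scalars to plain Python numbers.
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return json.dumps(value)
    # Scalars and strings are passed through unchanged.
    return value


def hash_nested_series(series):
    """Hypothetical helper: hash a Series whose cells may be lists or arrays."""
    return hash_pandas_object(series.map(serialize_cell))


list_series = pd.Series([[1], [2], [3]])
array_series = pd.Series([np.array([1]), np.array([2]), np.array([3])])

list_hashes = hash_nested_series(list_series)
array_hashes = hash_nested_series(array_series)
```

Note the trade-off in this sketch: a list and an array with the same values serialize to the same string, so the array dtype does not affect the hash. It also assumes that lists contain only JSON-serializable Python values.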

@zhu0619 zhu0619 added the feature Annotates any PR that adds new features; Used in the release process label Jul 17, 2024
@cwognum
Collaborator

cwognum commented Jul 19, 2024

Hey @zhu0619, could you give an example of a dataset for which you ran into this issue?

It seems to me you could always restructure the dataset such that you don't need to save lists or arrays in Pandas columns.
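
As a rough illustration of the restructuring suggested above, a column of fixed-length lists could be expanded into one scalar column per element, which hash_pandas_object handles directly. The column names (id, values, value_0, value_1) are made up for this sketch, and the approach assumes every list has the same length.

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Hypothetical dataset with a list-valued column.
nested = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "values": [[1, 2], [3, 4], [5, 6]],
    }
)

# Expand each list into its own scalar columns: value_0, value_1, ...
flat = pd.DataFrame(nested["values"].tolist(), index=nested.index).add_prefix("value_")
flat.insert(0, "id", nested["id"])

# With only scalar columns, hashing yields one hash per row.
row_hashes = hash_pandas_object(flat)
```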


2 participants