Dataset.from_pandas with a DataFrame of PIL.Images #6288

Open
lhoestq opened this issue Oct 10, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@lhoestq
Member

lhoestq commented Oct 10, 2023

Currently, type inference doesn't know what to do with a pandas Series of PIL.Image objects. It would be nice to get a Dataset with the Image type this way.

@lhoestq lhoestq added the enhancement New feature or request label Oct 10, 2023
@mariosasko
Collaborator

A duplicate of #4796.

We could get this for free by implementing the Image feature as an extension type, as shown in this Colab (example with UUIDs).

@ZachNagengast
Contributor

+1 to this.
Calling this line with a DataFrame that contains PIL images (such as those returned by load_dataset):
ds = Dataset.from_pandas(df)
results in this error:
ArrowInvalid: ('Could not convert <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1024x1024 at 0x2B41F2D70> with type PngImageFile: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column image with type object')

@juanxo90

I found something that can be used as a workaround.

I had the same problem when I tried to load images from a pandas DataFrame.

If all your data is in a pandas DataFrame, try:
Dataset.from_dict(your_df.reset_index(drop=True).to_dict(orient='list'), split=set_your_split)

This avoids the error.


4 participants