closes #44914 by changing the path object to string if it is of io.BufferWriter #45480

Merged

Conversation

Anirudhsekar96
Contributor

Converts the path object in to_parquet to a string when it is an io.BufferedWriter. PyArrow does not remove the partially written file from disk when the path it receives is not a string.
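A minimal sketch of the conversion described above, using only the standard library. The helper name `normalize_parquet_target` is hypothetical and is not taken from the pandas source; it only illustrates the idea of handing the engine the file name behind a buffered writer rather than the handle itself.

```python
import io
import os
import tempfile


def normalize_parquet_target(path_or_handle):
    """Hypothetical helper: swap a named buffered writer for its file path."""
    if (
        isinstance(path_or_handle, io.BufferedWriter)
        and hasattr(path_or_handle, "name")
        and isinstance(path_or_handle.name, (str, bytes))
    ):
        # open(path, "wb") yields a BufferedWriter whose .name is the path
        # that was passed to open().
        return path_or_handle.name
    # In-memory buffers, plain strings, and anything else pass through untouched.
    return path_or_handle


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "frame.parquet")
    with open(target, "wb") as handle:
        resolved = normalize_parquet_target(handle)
    print(type(resolved).__name__)
```

The `isinstance(..., (str, bytes))` check matters because a handle opened from a file descriptor reports an integer as its name, which is not usable as a path.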

pandas/io/parquet.py (outdated, resolved)
… for typecasting to string in path object for to_parquet
Member

@phofl phofl left a comment

Can you add a test?

@Anirudhsekar96
Contributor Author

I have added two tests:

  • Checks that PyArrow raises an error when dtype=fp16
  • Checks that PyArrow removes the partial file from disk when the write fails
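The shape of the second check (force a write to fail, then assert that nothing is left at the target path) can be sketched with a stand-in writer. Everything below is an assumption made for illustration: `write_with_cleanup`, `UnsupportedDtypeError`, and `check_no_partial_file` are hypothetical names, and the stand-in does not reproduce PyArrow's behaviour or the tests added in this pull request.

```python
import os
import tempfile


class UnsupportedDtypeError(Exception):
    """Stand-in for the engine error raised on an unsupported dtype."""


def write_with_cleanup(path, chunks):
    """Hypothetical writer: write chunks to `path`, deleting the file on failure."""
    try:
        with open(path, "wb") as sink:
            for chunk in chunks:
                if chunk is None:  # stand-in for a column the engine cannot encode
                    raise UnsupportedDtypeError("cannot encode chunk")
                sink.write(chunk)
    except UnsupportedDtypeError:
        # The file is already closed here, so it can be removed before re-raising.
        os.remove(path)
        raise


def check_no_partial_file(tmp_dir):
    """Return True if a failed write leaves nothing behind at the target path."""
    target = os.path.join(tmp_dir, "frame.parquet")
    try:
        write_with_cleanup(target, [b"header", None])
    except UnsupportedDtypeError:
        pass
    else:
        raise AssertionError("expected the write to fail")
    return not os.path.isfile(target)


with tempfile.TemporaryDirectory() as tmp_dir:
    print(check_no_partial_file(tmp_dir))
```

A real test would call to_parquet with a DataFrame the engine rejects; whether a given dtype is rejected depends on the installed PyArrow version.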

@jreback jreback added this to the 1.5 milestone Jan 22, 2022
@jreback jreback added Bug IO Parquet parquet, feather labels Jan 22, 2022
Contributor

@jreback jreback left a comment

looks fine if you can move the note & merge master

doc/source/whatsnew/v1.4.0.rst (outdated, resolved)
@jreback jreback merged commit a15ca97 into pandas-dev:main Jan 23, 2022
@jreback
Contributor

jreback commented Jan 23, 2022

thanks @Anirudhsekar96

@twoertwein
Member

This makes mypy fail:

pandas/io/parquet.py:184: error: "RawIOBase" has no attribute "name" [attr-defined]
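One common way to avoid this class of error is to read the attribute dynamically instead of writing `handle.name` on a type whose stubs do not declare it. The sketch below is an assumption about how such a check could be written; `handle_name` is a hypothetical helper and is not necessarily what the follow-up change to pandas did.

```python
import io
import os
import tempfile
from typing import Optional, Union


def handle_name(
    handle: Union[io.RawIOBase, io.BufferedIOBase],
) -> Optional[Union[str, bytes]]:
    """Hypothetical helper: return the handle's file name, or None if it has none."""
    # getattr with a default is a dynamic lookup, so no attribute has to be
    # declared on the static type of `handle`.
    name = getattr(handle, "name", None)
    if isinstance(name, (str, bytes)):
        return name
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw.bin")
    # buffering=0 in binary mode returns an unbuffered raw file object.
    with open(raw_path, "wb", buffering=0) as raw:
        print(handle_name(raw) == raw_path)
```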

@Anirudhsekar96
Contributor Author

Anirudhsekar96 commented Jan 23, 2022

I have opened a new pull request to resolve the mypy failure (#45570).

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Development

Successfully merging this pull request may close these issues.

BUG: Partial file write to disk on calling to_parquet() with engine='pyarrow' with unsupported dtype
5 participants