BUG: Parquet does not support saving fp16 GH#44846 #44847

Anirudhsekar96 · 2021-12-10T23:42:25Z

closes BUG: Parquet format does not support saving float16 columns #44846
tests added / passed
Ensure all linting tests pass, see here for how to run them
Adds a validation check for float16 columns in the to_parquet function that raises a ValueError
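For illustration, a minimal sketch of what such a pre-write check could look like. The helper names `find_float16_columns` and `check_no_float16` are hypothetical and are not the code from this PR; the sketch only assumes that the check runs before any file is opened.

```python
import numpy as np
import pandas as pd


def find_float16_columns(df: pd.DataFrame) -> list:
    """Return the labels of all columns whose dtype is float16."""
    return [col for col, dtype in df.dtypes.items() if dtype == np.float16]


def check_no_float16(df: pd.DataFrame) -> None:
    """Raise ValueError if the frame has float16 columns (hypothetical helper)."""
    bad = find_float16_columns(df)
    if bad:
        # Naming the offending columns makes the error actionable.
        raise ValueError(f"float16 columns are not supported by Parquet: {bad}")


df = pd.DataFrame({"fp16": np.arange(2, 10, dtype=np.float16), "ok": range(8)})
print(find_float16_columns(df))
```

Calling `check_no_float16(df)` on the frame above raises a ValueError before anything touches the disk.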

pep8speaks · 2021-12-10T23:42:27Z

Hello @Anirudhsekar96! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-14 04:01:15 UTC

jreback

would need a test and release note

pandas/io/parquet.py

jorisvandenbossche · 2021-12-13T12:43:22Z

Is this explicit check needed? It seems pyarrow itself already gives a decent error message?

In [4]: import pandas as pd
   ...: import numpy as np
   ...: 
   ...: data = np.arange(2, 10, dtype=np.float16)
   ...: df = pd.DataFrame(data=data, columns=['fp16'])
   ...: df.to_parquet('./fp16.parquet')
...
ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: halffloat

(adding an explicit check also means we wouldn't automatically get the improvement if this is fixed on pyarrow's side)

Anirudhsekar96 · 2021-12-13T18:10:59Z

In the given example, PyArrow creates a file called 'fp16.parquet' before throwing the error resulting in an empty file. An explicit check before passing to PyArrow would prevent the creation of the empty file.

Additionally, there seems to be little interest in adding float16 support to PyArrow at the moment (see ref: apache/arrow#2691). The Parquet format also does not seem to have float16 support on its road map (ref: https://issues.apache.org/jira/browse/ARROW-7242).

Another option would be to coerce all float16 columns to float32 and emit a warning indicating the conversion. Currently fastparquet handles fp16 using this conversion method.
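The coercion option could be sketched roughly as follows. This is an illustration only: `coerce_float16_to_float32` is a hypothetical helper, not fastparquet's or pandas' actual code, and the warning class and message are assumptions.

```python
import warnings

import numpy as np
import pandas as pd


def coerce_float16_to_float32(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with float16 columns upcast to float32, warning if any exist."""
    cols = [col for col, dtype in df.dtypes.items() if dtype == np.float16]
    if not cols:
        return df
    warnings.warn(
        f"Converting float16 columns {cols} to float32 before writing",
        UserWarning,
        stacklevel=2,
    )
    # float16 -> float32 is a widening cast, so no values are altered.
    return df.astype({col: np.float32 for col in cols})
```

The caller's frame is left untouched; only the returned copy carries the float32 columns, which could then be handed to the Parquet writer.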

jreback

i am ok with this, a partial write is not a great end result here.

could you add a note in I/O section for 1.4.0

pandas/io/parquet.py

pandas/tests/io/test_parquet.py

jorisvandenbossche · 2021-12-14T15:08:24Z

In the given example, PyArrow creates a file called 'fp16.parquet' before throwing the error resulting in an empty file. An explicit check before passing to PyArrow would prevent the creation of the empty file.

That might actually be a bug in the pandas code, as I can't reproduce it with just pyarrow (I do reproduce it with the pandas version above):

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

# float16 data becomes an Arrow halffloat column
table = pa.table({'a': np.array([0, 1, 2], dtype="float16")})
pq.write_table(table, "test_halffloat.parquet")

jorisvandenbossche · 2021-12-14T15:13:31Z

So this "creates empty file" issue also happens for other data types that aren't supported by Parquet (such as timedelta64), or whenever pyarrow raises an error while writing for any other reason. I think that is something we should fix in general instead of adding a special-case check for float16.
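One generic way to avoid leaving an empty or partial file behind is to write to a temporary file and move it into place only on success. The sketch below uses only the standard library. `write_atomically` and its `write_func` callback are hypothetical names, and this is not necessarily how pandas addressed the problem.

```python
import os
import tempfile


def write_atomically(path, write_func):
    """Run write_func(tmp_path); move the result to `path` only if it succeeds."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_func(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure (e.g. an unsupported dtype) removes the temporary file
        # and leaves the target path untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With pandas, this might be used as `write_atomically("fp16.parquet", lambda p: df.to_parquet(p))`, so a writer error never leaves a stray 'fp16.parquet' on disk.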

jreback · 2021-12-14T18:40:17Z

good point @jorisvandenbossche

ok let's close this issue and open one for the writing partial files

Anirudhsekar96 · 2021-12-15T21:43:19Z

Opened a new bug report for the broader issue:

#44914

BUG: Parquet does not support saving fp16 GH#44846

220ed2f

BUG: Parquet does not support saving fp16 GH#44846

5b8598e

jreback requested changes Dec 11, 2021

Anirudhsekar96 added 2 commits December 10, 2021 17:07

Added tests, changed location of fp16 checking to pyarrow write function GH#44846

f60ef72

Used black formatter GH#44846

54b97d6

Anirudhsekar96 requested a review from jreback December 13, 2021 18:12

jreback requested changes Dec 14, 2021

jreback added this to the 1.4 milestone Dec 14, 2021

jreback added the IO Parquet parquet, feather label Dec 14, 2021

Anirudhsekar96 added 3 commits December 13, 2021 19:17

Changed format of error message to include column names, changed test name to test_unsupported_fp16

23b8681

Added release note in v1.4.0 io section #44847

c19a9fa

Merge branch 'master' into to_parquet_float16_conversion

3addd76

Anirudhsekar96 requested a review from jreback December 14, 2021 04:56

Anirudhsekar96 mentioned this pull request Dec 15, 2021

BUG: Partial file write to disk on calling to_parquet() with engine='pyarrow' with unsupported dtype #44914

Closed

3 tasks

Anirudhsekar96 closed this Dec 15, 2021

Anirudhsekar96 mentioned this pull request Jan 20, 2022

closes #44914 by changing the path object to string if it is of io.BufferWriter #45480

Merged

4 tasks