-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python][Parquet] pa.schema() silently drops metadata if a schema object is passed #38575
Comments
igozali
changed the title
pa.schema() silently drops metadata if a schema object is passed
[Python][Parquet] pa.schema() silently drops metadata if a schema object is passed
Nov 3, 2023
I came across this too - it also appears to also happen if you pass a $ python3
Python 3.12.2 (main, May 3 2024, 17:50:09) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> pa.__version__
'16.0.0'
>>> pa.schema([pa.field("test", pa.int64())], metadata={"a": "b"})
test: int64
-- schema metadata --
a: 'b'
>>> pa.schema(pa.struct([pa.field("test", pa.int64())]), metadata={"a": "b"})
test: int64 |
I think it was introduced in #37797, which short-circuits the creation without setting the metadata: arrow/python/pyarrow/types.pxi Lines 5093 to 5094 in d68f8e2
|
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 4, 2024
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 4, 2024
… to check metadata
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 4, 2024
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 6, 2024
… to check metadata
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 6, 2024
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 7, 2024
… to check metadata
JacobHayes
added a commit
to JacobHayes/arrow
that referenced
this issue
May 7, 2024
jorisvandenbossche
added a commit
that referenced
this issue
May 17, 2024
…psule (#41538) ### Rationale for this change Fixes the dropped `pa.schema` metadata reported in #38575, which was introduced in #37797. ### What changes are included in this PR? Passes through the `metadata` to the short-circuited `Schema` created with `_import_from_c_capsule`. ### Are these changes tested? Yes - added `metadata` to the existing test. ### Are there any user-facing changes? I'm not sure this quite rises to the `(b) a bug that caused incorrect or invalid data to be produced,` condition, but I added that note to be safe since the resulting schema is "incorrect" (and broke some round-trip tests on my end after a pyarrow update): **This PR contains a "Critical Fix".** * GitHub Issue: #38575 Lead-authored-by: Jacob Hayes <jacob.r.hayes@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Issue resolved by pull request 41538 |
vibhatha
pushed a commit
to vibhatha/arrow
that referenced
this issue
May 25, 2024
…m PyCapsule (apache#41538) ### Rationale for this change Fixes the dropped `pa.schema` metadata reported in apache#38575, which was introduced in apache#37797. ### What changes are included in this PR? Passes through the `metadata` to the short-circuited `Schema` created with `_import_from_c_capsule`. ### Are these changes tested? Yes - added `metadata` to the existing test. ### Are there any user-facing changes? I'm not sure this quite rises to the `(b) a bug that caused incorrect or invalid data to be produced,` condition, but I added that note to be safe since the resulting schema is "incorrect" (and broke some round-trip tests on my end after a pyarrow update): **This PR contains a "Critical Fix".** * GitHub Issue: apache#38575 Lead-authored-by: Jacob Hayes <jacob.r.hayes@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug, including details regarding any error messages, version, and platform.
Seems like a regression from arrow 13 -> 14
Here's a simple repro script
Running above script on pyarrow 13 and 14 gives these outputs:
Few workarounds:
schema.with_metadata()
instead ofpa.schema()
pa.schema(old_schema, metadata=...)
, passing a list of fields seems to fix it too e.g.pa.schema(list(old_schema), metadata=...)
Component(s)
Parquet, Python
The text was updated successfully, but these errors were encountered: