Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow: Set field-id with prefix #227

Merged
merged 8 commits into from
Jan 12, 2024

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Dec 19, 2023

While working on the write support, I started to understand the meaning of the Arrow metadata. The PARQUET: prefix means that it is specific to Parquet, and setting PARQUET:field_id will actually not be set as metadata, but as the field_id. I think we should remove the other option without the prefix.

I also did the same for the doc. I checked, in Java we don't check for the documentation, but I think it is a nice addition. However, I think we should only check for the key that's the same as the Iceberg field (doc).

While working on the write support, I started to understand the
meaning of the Arrow metadata. The `PARQUET:` prefix means that
it is specific to Parquet, and setting `PARQUET:field_id` will
actually not be set as metadata, but as the `field_id`. I think
we should remove the other option without the prefix.

I also did the same for the doc. I checked, in Java we don't check
for the documentation, but I think it is a nice addition. However,
I think we should only check for the key that's the same as the
Iceberg field (`doc`).
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Fokko for the fix and @jqin61 for flagging this! This really clears up my confusion about how Arrow manages its field metadata. I have some comments below.

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just have one small comment

@HonahX HonahX mentioned this pull request Jan 6, 2024
@HonahX HonahX merged commit 4593208 into apache:main Jan 12, 2024
6 checks passed
@Fokko Fokko deleted the fd-set-field-id-with-prefix branch January 12, 2024 22:44
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Jan 13, 2024
'PARQUET:' prefix is specific to Parquet, with 'PARQUET:field_id' setting the 'field_id'. Removed the non-prefixed alternative for 'field_id'. Removed the prefixed alternative for 'doc'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants