Set field-id when needed #1867

Merged
merged 2 commits into from
Apr 1, 2025

@Fokko Fokko commented Mar 31, 2025

Fixes #1798

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

@Fokko Fokko added this to the PyIceberg 0.9.1 milestone Mar 31, 2025
@@ -1777,7 +1777,7 @@ def struct(
                 field_arrays.append(array)
                 fields.append(self._construct_field(field, array.type))
             elif field.optional:
-                arrow_type = schema_to_pyarrow(field.field_type, include_field_ids=False)
+                arrow_type = schema_to_pyarrow(field.field_type, include_field_ids=self._include_field_ids)
Contributor

@kevinjqliu kevinjqliu Mar 31, 2025

Contributor Author

Yes, we missed this in a review 🤦 Field-IDs are superior to name-mapping. For example, dropping a field and then adding a new field with the same name is not supported by name-mapping, because the name is reused. With field-IDs, a new ID is assigned, so the new field looks like a new column 👍
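
To make the drop-and-re-add case concrete, here is a minimal sketch in plain Python. It is a toy model, not PyIceberg code: `Column`, `read_by_name`, and `read_by_id` are hypothetical names invented for illustration, and the "file" is just a dict keyed by field ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    field_id: int
    name: str


# Hypothetical data file written under the original schema: "a" had field ID 1.
old_file_schema = [Column(1, "a")]
old_file_row = {1: "old value"}

# Schema evolution: drop "a" (ID 1), then add a new column that is also
# named "a". The new column is assigned a fresh ID.
current_schema = [Column(2, "a")]


def read_by_name(row, file_schema, table_schema):
    """Resolve columns by name only (toy stand-in for a name-based lookup)."""
    ids_by_name = {c.name: c.field_id for c in file_schema}
    return {c.name: row.get(ids_by_name.get(c.name)) for c in table_schema}


def read_by_id(row, file_schema, table_schema):
    """Resolve columns by field ID; IDs missing from the file read as None."""
    file_ids = {c.field_id for c in file_schema}
    return {
        c.name: row[c.field_id] if c.field_id in file_ids else None
        for c in table_schema
    }


by_name = read_by_name(old_file_row, old_file_schema, current_schema)
by_id = read_by_id(old_file_row, old_file_schema, current_schema)

print(by_name)  # the dropped column's data resurfaces under the new "a"
print(by_id)    # the new "a" is empty for files written before it existed
```

In this toy model, the name-based lookup cannot tell the old and new `a` apart, while the ID-based lookup treats the re-added column as new.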

Contributor

I found 3 other places where include_field_ids=False:

  • in to_table, this is fine since we're just materializing the table from record batches
  • in _to_requested_schema, both call sites of _to_requested_schema set include_field_ids=True (1, 2)
  • in ArrowProjectionVisitor, but this is only called here and uses the include_field_ids from _to_requested_schema, which sets include_field_ids=True
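
The audit above comes down to whether the flag is threaded through from the caller or hard-coded in a branch. The sketch below illustrates that difference with a toy converter. All names in it (`Primitive`, `Field`, `Struct`, `convert`, `Projector`, `thread_flag`) are hypothetical and only loosely mirror the shape of the diff; it does not call PyIceberg or PyArrow.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Primitive:
    name: str  # e.g. "double"


@dataclass
class Field:
    field_id: int
    name: str
    type: Any  # Primitive or Struct


@dataclass
class Struct:
    fields: List[Field]


def convert(t, include_field_ids):
    """Convert a toy type to nested dicts, tagging field IDs when asked."""
    if isinstance(t, Primitive):
        return t.name
    out = {}
    for f in t.fields:
        entry = {"type": convert(f.type, include_field_ids)}
        if include_field_ids:
            entry["field_id"] = f.field_id
        out[f.name] = entry
    return out


class Projector:
    def __init__(self, include_field_ids, thread_flag):
        self._include_field_ids = include_field_ids
        self._thread_flag = thread_flag  # False mimics the hard-coded branch

    def missing_optional(self, f):
        """Build the type for an optional field that is absent from the file."""
        flag = self._include_field_ids if self._thread_flag else False
        return convert(f.type, flag)


location = Field(2, "location", Struct([Field(3, "lat", Primitive("double"))]))

before = Projector(include_field_ids=True, thread_flag=False).missing_optional(location)
after = Projector(include_field_ids=True, thread_flag=True).missing_optional(location)

print(before)  # nested field carries no ID even though the caller asked for IDs
print(after)   # nested field keeps its ID
```

With the flag hard-coded, the caller's request for field IDs is silently ignored for nested fields of a missing optional struct; threading the flag keeps the output consistent with the other branches.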

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM

@Fokko Fokko merged commit 3d08776 into apache:main Apr 1, 2025
7 checks passed
@Fokko Fokko deleted the fd-fix branch April 17, 2025 12:37
Fokko added a commit that referenced this pull request Apr 17, 2025
Fixes #1798

Development

Successfully merging this pull request may close these issues.

Error reading table after appending pyarrow table
2 participants