-
Notifications
You must be signed in to change notification settings - Fork 268
Set field-id when needed #1867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set field-id when needed #1867
Conversation
@@ -1777,7 +1777,7 @@ def struct( | |||
field_arrays.append(array) | |||
fields.append(self._construct_field(field, array.type)) | |||
elif field.optional: | |||
arrow_type = schema_to_pyarrow(field.field_type, include_field_ids=False) | |||
arrow_type = schema_to_pyarrow(field.field_type, include_field_ids=self._include_field_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah!! i think this is the right approach
i tried to resolve it by passing the schema's name mapping
https://github.com/apache/iceberg-python/compare/main...kevinjqliu:iceberg-python:kevinjqliu/use-name-mapping?expand=1#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR1655
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we missed this in a review 🤦 Field-IDs are superior over name-mapping, for example: dropping a field, and then adding a new field with the same name is not supported by name-mapping because it re-uses the name. In the case of field-IDs, a new ID is assigned and it will look like a new column 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found 3 other places where include_field_ids=False
- in
to_table
, this is fine since we're just materializing the table from record batches - in
_to_requested_schema
, the 2 places where_to_requested_schema
is called setsinclude_field_ids=True
(1, 2) - in
ArrowProjectionVisitor
, but this is only called here get uses theinclude_field_ids
from_to_requested_schema
, which setsinclude_field_ids=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #1798 <!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. -->
Fixes #1798
Rationale for this change
Are these changes tested?
Are there any user-facing changes?