-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Pydantic's model_copy
for model modification when updating table metadata
#182
Conversation
# Conflicts: # pyiceberg/table/__init__.py # tests/conftest.py
# Conflicts: # tests/conftest.py
…able_metadata_update_model_copy # Conflicts: # pyiceberg/table/__init__.py
# Conflicts: # pyiceberg/table/__init__.py # tests/table/test_init.py
pyiceberg/table/__init__.py
Outdated
@@ -533,6 +535,8 @@ def update_table_metadata(base_metadata: TableMetadata, updates: Tuple[TableUpda | |||
for update in updates: | |||
new_metadata = _apply_table_update(update, new_metadata, context) | |||
|
|||
# Rebuild metadata to trigger validation | |||
new_metadata = TableMetadataUtil.parse_obj(copy(new_metadata.model_dump())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since model_copy
performs a shallow copy by default, I believe we need to execute a deep copy before returning the final new_metadata
. Otherwise, base_metadata
might be inadvertently altered due to any improper updates applied to new_metadata
subsequently.
Furthermore, as indicated by pydantic/pydantic#418 and by some local tests, model_copy(update=)
does not validate the contents of update. I think it might be good to reconstruct the metadata at this point to initiate the validation process.
(Alternatively, we could perform a deep copy here and incorporate the validation into our unit tests. Open to discussion on this approach.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're okay to skip the validation on the Pydantic side, we should validate the input anyway to generate a more meaningful error.
We could set it to deep-copy using the deep=True
kwarg. Doing this in Pydantic is probably an order of magnitude more efficient than doing this in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! I changed to model_copy(deep=True)
here and moved the validation to a unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great @HonahX! Thanks for fixing this right away.
One thing I wondered if it is better to do a deep-copy when returning the final metadata, or just deep-copy in the steps where we update the metadata. But I think the current approach is more efficient when there are several updates.
Fixes #179
This PR uses Pydantic's
model_copy
to apply table updates to metadata. Specifically:AddSchema
,SetCurrentSchema
,AddSnapshot
,SetSnapshotRef
, we usebase_metadata.model_copy(update=...)
to get the updated metadata.UpgradeFormatVersion
, we still need to rebuild the whole model fromTableMetadataV1
toTableMetadataV2
@Fokko Could you please take a look at this when you have a chance? Thanks!