Use Pydantic's `model_copy` for model modification when updating table metadata #182

HonahX · 2023-12-05T05:22:29Z

Fixes #179

This PR uses Pydantic's model_copy to apply table updates to metadata. Specifically:

- When applying AddSchema, SetCurrentSchema, AddSnapshot, or SetSnapshotRef, we use base_metadata.model_copy(update=...) to get the updated metadata.
- When applying UpgradeFormatVersion, we still need to rebuild the whole model from TableMetadataV1 to TableMetadataV2.
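
To illustrate the two cases above, here is a minimal sketch using Pydantic v2. `MetaV1` and `MetaV2` are hypothetical, heavily simplified stand-ins for TableMetadataV1 and TableMetadataV2, and their fields are assumptions chosen for illustration, not the real PyIceberg models.

```python
from typing import List, Literal

from pydantic import BaseModel


# Hypothetical toy models; the real table metadata has many more fields.
class MetaV1(BaseModel):
    format_version: Literal[1] = 1
    schemas: List[str] = []
    current_schema_id: int = 0


class MetaV2(BaseModel):
    format_version: Literal[2] = 2
    schemas: List[str] = []
    current_schema_id: int = 0
    last_sequence_number: int = 0  # assumed V2-only field


base = MetaV1(schemas=["schema-0"])

# Same model class: copy the model and override only the changed fields.
added = base.model_copy(update={"schemas": base.schemas + ["schema-1"]})
switched = added.model_copy(update={"current_schema_id": 1})

# Different model class: model_copy keeps the original type, so the
# upgrade rebuilds a new model from the dumped fields instead.
data = switched.model_dump()
data["format_version"] = 2
upgraded = MetaV2(**data)
```

Because the new list is built with `base.schemas + [...]` rather than by appending in place, `base` is left unchanged by these steps.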

@Fokko Could you please take a look at this when you have a chance? Thanks!

HonahX · 2023-12-05T05:28:51Z

pyiceberg/table/__init__.py

@@ -533,6 +535,8 @@ def update_table_metadata(base_metadata: TableMetadata, updates: Tuple[TableUpda
    for update in updates:
        new_metadata = _apply_table_update(update, new_metadata, context)

+    # Rebuild metadata to trigger validation
+    new_metadata = TableMetadataUtil.parse_obj(copy(new_metadata.model_dump()))


Since model_copy performs a shallow copy by default, I believe we need to perform a deep copy before returning the final new_metadata. Otherwise, base_metadata might be inadvertently altered by any later in-place change to new_metadata.

Furthermore, as indicated by pydantic/pydantic#418 and by some local tests, model_copy(update=) does not validate the contents of update. I think it might be good to reconstruct the metadata at this point to trigger validation.

(Alternatively, we could perform a deep copy here and incorporate the validation into our unit tests. Open to discussion on this approach.)
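
Both concerns raised in this comment can be reproduced with a small sketch in Pydantic v2. `Meta` is a hypothetical toy model, not the real TableMetadata class.

```python
from typing import List

from pydantic import BaseModel


class Meta(BaseModel):  # hypothetical toy model
    snapshots: List[int] = []
    current_schema_id: int = 0


base = Meta(snapshots=[1])

# Shallow by default: the copy refers to the same list object as base.
shallow = base.model_copy()
shallow.snapshots.append(2)
shared = base.snapshots == [1, 2]  # True: base was altered through the copy

# update= is applied without validation: a str ends up in an int field.
unchecked = base.model_copy(update={"current_schema_id": "not-an-int"})
bad_type = type(unchecked.current_schema_id).__name__  # "str"
```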

Fokko · 2023-12-05

I think we're okay to skip the validation on the Pydantic side; we should validate the input anyway to generate a more meaningful error.

We could set it to deep-copy using the deep=True kwarg. Doing this in Pydantic is probably an order of magnitude more efficient than doing this in Python.

HonahX · 2023-12-06

Thanks for the explanation! I changed to model_copy(deep=True) here and moved the validation to a unit test.
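
A minimal sketch of that resolution, assuming Pydantic v2: `deep=True` isolates the copy from the original, and a test helper can re-validate a model by rebuilding it from its dump. `Meta` and `is_valid` are hypothetical names for illustration, not taken from the PR's test suite.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Meta(BaseModel):  # hypothetical toy model
    snapshots: List[int] = []
    current_schema_id: int = 0


base = Meta(snapshots=[1])

# deep=True copies nested containers, so edits to the copy stay local.
deep = base.model_copy(deep=True)
deep.snapshots.append(2)
isolated = base.snapshots == [1]


def is_valid(metadata: Meta) -> bool:
    """Re-validate a model by rebuilding it from its dumped fields."""
    try:
        # warnings=False silences serializer warnings for wrongly typed fields.
        Meta.model_validate(metadata.model_dump(warnings=False))
        return True
    except ValidationError:
        return False
```

A unit test could then assert `is_valid(...)` on the metadata produced by each kind of update, which keeps the validation cost out of the production path.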

Fokko

This is great @HonahX! Thanks for fixing this right away.

One thing I wondered is whether it is better to deep-copy once when returning the final metadata, or to deep-copy in each step where we update the metadata. But I think the current approach is more efficient when there are several updates.
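
The trade-off can be sketched as follows, assuming each update step builds new containers rather than mutating in place. `Meta`, `add_schema`, `set_current_schema`, and `apply_updates` are hypothetical names for illustration and do not mirror the PyIceberg implementation.

```python
from typing import Callable, List, Sequence

from pydantic import BaseModel


class Meta(BaseModel):  # hypothetical toy model
    schemas: List[str] = []
    current_schema_id: int = 0


Update = Callable[[Meta], Meta]


def add_schema(name: str) -> Update:
    # Builds a new list instead of appending to the existing one.
    return lambda m: m.model_copy(update={"schemas": m.schemas + [name]})


def set_current_schema(schema_id: int) -> Update:
    return lambda m: m.model_copy(update={"current_schema_id": schema_id})


def apply_updates(base: Meta, updates: Sequence[Update]) -> Meta:
    new = base
    for update in updates:
        new = update(new)  # cheap shallow copy per step
    return new.model_copy(deep=True)  # one deep copy for the whole batch


base = Meta(schemas=["schema-0"])
result = apply_updates(base, [add_schema("schema-1"), set_current_schema(1)])
result.schemas.append("oops")  # a later in-place change does not reach base
```

With N updates this performs N shallow copies and a single deep copy, whereas deep-copying in every step would perform N deep copies.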

HonahX added 29 commits November 5, 2023 00:32

Implement table metadata updater first draft

d53785a

fix updater error and add tests

274b91b

implement apply_metadata_update which is simpler

c3e1311

remove old implementation

2b7a7d1

re-organize method place

4fc25df

fix nit

facb43b

fix test

116c6fd

add another test

66a4f46

clear TODO

2882d0d

add a combined test

8a8d4ff

Merge remote-tracking branch 'origin/main' into table_metadata_update

70b64d8

# Conflicts: # pyiceberg/table/__init__.py # tests/conftest.py

Fix merge conflict

1cfe9d2

remove table requirement validation for PR simplification

8476d9b

make context private and solve elif issue

77c198c

remove private field access

be482ca

push snapshot ref validation to its builder using pydantic

e2b085d

fix comment

965b16d

remove unnecessary code for AddSchemaUpdate update

53efa28

replace if with elif

b7fd063

switch to model_copy()

3d3122c

enhance the set current schema update implementation and some other changes

bedd0cc

Merge branch 'main' into table_metadata_update_pr

121b8b4

# Conflicts: # tests/conftest.py

make apply_table_update private

aecc7c1

Merge remote-tracking branch 'origin/table_metadata_update_pr' into table_metadata_update_model_copy

3f7fee7

# Conflicts: # pyiceberg/table/__init__.py

fix lint after merge

92a5885

add validation

d26fa9a

Merge branch 'main' into table_metadata_update_model_copy

ae1cacb

# Conflicts: # pyiceberg/table/__init__.py # tests/table/test_init.py

add test for isolation of illegal updates

e9a6718

fix nit

12cba1f

HonahX commented Dec 5, 2023

HonahX added 2 commits December 4, 2023 21:38

remove unnecessary flag

6b16155

change to model_copy(deep=True)

5de0de5

Fokko approved these changes Dec 6, 2023

Fokko merged commit a368bd9 into apache:main Dec 6, 2023
