Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Remove deprecated method from BaseMetadataTable #9298
Core: Remove deprecated method from BaseMetadataTable #9298
Changes from all commits
50b23c9
e13e6d2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed now since the metadata table won't enter above check of
HasTableOperations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just call
table.uuid()
in all of those places?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because
BaseMetadataTable
gives new UUID instead of base table's UUID.Shall I fix that method to return base table's UUID ?
iceberg/core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
Lines 204 to 206 in d56dd63
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think while adding UUID interface we concluded that we should not use base table's UUID
#8800 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the argument there is that the metadata table can be considered as a separate table and should therefore have it's own unique identifier compared to the base table.
But I think @nastra point still stands, even if it's different then the base table UUID, why does that matter here? I think we just want the table.uuid() right? or do we need the metadata table's underlying table's UUID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using table.uuid(), many testcase failed as the scan task of metadata table expects UUID of the base table not the metadata table.
java.lang.IllegalArgumentException: No scan tasks found for 2c44000a-aa24-479a-8666-292cee70b95f
Does it make sense to return base table's UUID for the metadata table? (That is change the behaviour from #8800?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Rebased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, looks like
SparkStagedScan
is expecting base table's uuid for cache for metadata tables.Either we need to change that logic or return base table uuid. I will dig deeper next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'd check out that logic further and we can see what the right behavior is here. I still think the change that was made in #9310 is definitely the right fix from an API perspective (even if we decide not to use that API here). The main issue that was solved there was semantically the metadata table UUID should be the same for the same reference.
In other words, imo I would not change the UUID API semantics to fit whatever the caching logic relies on.
If we need the base table UUID for the caching logic, then maybe
MetadataTable
specifically can expose another API for exposing the underlying base table's UUID. Or alternatively keep it as is, and just expose the underlying Table (but that seems to expose too much imo).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is a tight correlation between metadata table scan tasks and main table UUID from multiple classes.
If we need to change it, it can be handled in a separate PR (Issue) as it is nothing to do with this deprecated method removal.
Hence, I went back to reverting using table.uuid()
So, this PR can go ahead.
cc: @nastra, @amogh-jahagirdar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra: Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call
table
ontable
looks strange. It would be better to have a methodbaseTable()
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can be called for main table or metadata table. So, the varibale name is table.
Agree that
BaseMetadataTable
can have a public interface asbaseTable()
. But current interfacetable()
is a public interface, we can't rename directly. It has to be deprecated first and new interface.So, I think we can leave it as it is as of now (out of scope for this PR).