Core: Remove deprecated method from BaseMetadataTable #9298
Force-pushed from fa7c7de to 5d14bf1
@@ -105,6 +105,8 @@ private String metadataFileLocation(Table table) {
    if (table instanceof HasTableOperations) {
      TableOperations ops = ((HasTableOperations) table).operations();
      return ops.current().metadataFileLocation();
    } else if (table instanceof BaseMetadataTable) {
      return ((BaseMetadataTable) table).table().operations().current().metadataFileLocation();
This is needed now, since the metadata table no longer passes the HasTableOperations check above.
Looks like some tests are directly casting metadata tables with
Force-pushed from 5d14bf1 to aa1870d
Force-pushed from 5d3825c to 005ecf8
@@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(
    return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database);
  }

  static String tableUUID(org.apache.iceberg.Table table) {
    if (table instanceof HasTableOperations) {
Why not just call table.uuid() in all of those places?
Because BaseMetadataTable returns a new UUID instead of the base table's UUID.
Shall I fix that method to return the base table's UUID?
iceberg/core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
Lines 204 to 206 in d56dd63
  public UUID uuid() {
    return UUID.randomUUID();
  }
I think that while adding the UUID interface we concluded that we should not use the base table's UUID:
#8800 (comment)
Yeah, the argument there is that the metadata table can be considered a separate table and should therefore have its own unique identifier, distinct from the base table's.
But I think @nastra's point still stands: even if it's different than the base table UUID, why does that matter here? I think we just want table.uuid(), right? Or do we need the UUID of the metadata table's underlying table?
I tried using table.uuid(), but many test cases failed because the scan task of a metadata table expects the UUID of the base table, not of the metadata table.
java.lang.IllegalArgumentException: No scan tasks found for 2c44000a-aa24-479a-8666-292cee70b95f
Does it make sense to return the base table's UUID for the metadata table? (That is, change the behaviour from #8800?)
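The failure described above can be illustrated with a minimal sketch. It assumes the staged scan tasks are kept in a map keyed by the base table's UUID; `ScanTaskCacheSketch`, `stage`, and `tasksFor` are hypothetical names used only for illustration and are not Iceberg or Spark APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a task cache keyed by table UUID (not an Iceberg class).
public class ScanTaskCacheSketch {
  private final Map<String, List<String>> tasksByTableUuid = new HashMap<>();

  // Tasks are staged under the UUID of the base table.
  public void stage(String baseTableUuid, List<String> tasks) {
    tasksByTableUuid.put(baseTableUuid, tasks);
  }

  // The lookup fails when the caller passes a UUID the tasks were not staged under.
  public List<String> tasksFor(String tableUuid) {
    List<String> tasks = tasksByTableUuid.get(tableUuid);
    if (tasks == null) {
      throw new IllegalArgumentException("No scan tasks found for " + tableUuid);
    }
    return tasks;
  }

  public static void main(String[] args) {
    ScanTaskCacheSketch cache = new ScanTaskCacheSketch();
    String baseTableUuid = UUID.randomUUID().toString();
    // Assumption: the metadata table reports its own UUID, unrelated to the base table's.
    String metadataTableUuid = UUID.randomUUID().toString();

    cache.stage(baseTableUuid, List.of("task-1", "task-2"));
    System.out.println("Found " + cache.tasksFor(baseTableUuid).size() + " tasks");

    try {
      cache.tasksFor(metadataTableUuid);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under that assumption, either the lookup key or the staging key has to change for the two to agree, which is the choice discussed in this thread.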
done. Rebased.
Hmm, looks like SparkStagedScan expects the base table's UUID as the cache key for metadata tables.
Either we need to change that logic or return the base table UUID. I will dig deeper next week.
Sure, I'd check out that logic further and we can see what the right behavior is here. I still think the change that was made in #9310 is definitely the right fix from an API perspective (even if we decide not to use that API here). The main issue solved there was that, semantically, the metadata table UUID should be the same for the same reference.
In other words, imo I would not change the UUID API semantics to fit whatever the caching logic relies on.
If we need the base table UUID for the caching logic, then maybe MetadataTable specifically can expose another API that returns the underlying base table's UUID. Alternatively, keep it as is and just expose the underlying Table (but that seems to expose too much imo).
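A rough sketch of the two ideas in this comment, under simplifying assumptions: the metadata table's own UUID is generated once per instance so it stays stable for the same reference, and a separate accessor returns the base table's UUID. The classes below are stand-ins, and `baseTableUuid()` is a hypothetical method name, not an existing Iceberg API.

```java
import java.util.UUID;

public class MetadataTableUuidSketch {
  // Simplified stand-in for a base table with a fixed UUID.
  public static final class BaseTable {
    private final UUID uuid;

    public BaseTable(UUID uuid) {
      this.uuid = uuid;
    }

    public UUID uuid() {
      return uuid;
    }
  }

  // Simplified stand-in for a metadata table wrapping a base table.
  public static final class MetadataTable {
    private final BaseTable baseTable;
    // Generated once, so repeated uuid() calls on the same reference agree.
    private final UUID uuid = UUID.randomUUID();

    public MetadataTable(BaseTable baseTable) {
      this.baseTable = baseTable;
    }

    public UUID uuid() {
      return uuid;
    }

    // Hypothetical accessor for callers (such as a cache) that need the base table's UUID.
    public UUID baseTableUuid() {
      return baseTable.uuid();
    }
  }

  public static void main(String[] args) {
    BaseTable base = new BaseTable(UUID.randomUUID());
    MetadataTable metadataTable = new MetadataTable(base);

    System.out.println(metadataTable.uuid().equals(metadataTable.uuid()));
    System.out.println(metadataTable.baseTableUuid().equals(base.uuid()));
  }
}
```

This keeps the `uuid()` semantics independent of what any caching logic needs, at the cost of one more public method on the metadata table.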
Looks like there is a tight coupling between metadata table scan tasks and the main table UUID across multiple classes.
If we need to change it, it can be handled in a separate PR (issue), as it has nothing to do with this deprecated method removal.
Hence, I reverted the change that used table.uuid(),
so this PR can go ahead.
cc: @nastra, @amogh-jahagirdar
@nastra: Thoughts?
Force-pushed from 0275034 to a1d74dd
Force-pushed from a1d74dd to e1ab527
Force-pushed from e1ab527 to 60d3527
Just rebased to resolve conflict.
Sorry for the delay in review on this @ajantha-bhat, I'll take a look at this tomorrow.
I think it looks close @ajantha-bhat, just a comment on the tableUUID implementation returning null when we don't know what kind of table it is.
I think it's good to preserve the metadataTable UUID behavior and not return the base table. The rationale is that a UUID for a table should be unique per table and metadata tables are no different in this regard.
I think the way that it's implemented in this PR is fine.
    } else if (table instanceof BaseMetadataTable) {
      return ((BaseMetadataTable) table).table().operations().current().uuid();
    } else {
      return null;
I think this should probably throw an exception instead of returning null.
Also, instead of tableUUID, maybe baseTableUUID, since that's what we're really getting.
Done
@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {

  public Set<String> fetchSetIds(Table table) {
    return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.tableUUID(table)))
Nit: I think it would be a bit cleaner just to import Spark3Util.tableUUID and then just use tableUUID here (the diff on line 73 and other places would essentially go away in favor of just a new import statement). But that's nbd, if we do the method rename like I suggested we lose this benefit anyways.
Yeah, I renamed it to baseTableUUID. I am not sure about the guidelines on static imports: in some places we use them and in others we don't. So I left it as it is.
    } else if (table instanceof BaseMetadataTable) {
      return ((BaseMetadataTable) table).table().operations().current().uuid();
    } else {
      throw new UnsupportedOperationException("Cannot fetch table operations for " + table.name());
Should this be replicated across all Spark versions? Also, I would probably update the error message to "Cannot retrieve UUID for table ..."
done
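For reference, a self-contained sketch of the helper as it stands after the review suggestions above: it is renamed to `baseTableUUID`, throws instead of returning null, and uses the suggested error message. The nested types are simplified stand-ins for illustration only, not the real Iceberg `Table`, `HasTableOperations`, or `BaseMetadataTable`, and `currentUuid()` is a hypothetical shortcut for `operations().current().uuid()`.

```java
public class BaseTableUuidSketch {
  // Simplified stand-ins for the Iceberg types discussed in this thread.
  public interface Table {
    String name();
  }

  public interface HasTableOperations {
    String currentUuid(); // hypothetical shortcut for operations().current().uuid()
  }

  public static final class BaseTable implements Table, HasTableOperations {
    private final String name;
    private final String uuid;

    public BaseTable(String name, String uuid) {
      this.name = name;
      this.uuid = uuid;
    }

    @Override
    public String name() {
      return name;
    }

    @Override
    public String currentUuid() {
      return uuid;
    }
  }

  // A metadata table wraps a base table and does not expose table operations itself.
  public static final class MetadataTable implements Table {
    private final BaseTable base;

    public MetadataTable(BaseTable base) {
      this.base = base;
    }

    @Override
    public String name() {
      return base.name() + ".files";
    }

    public BaseTable table() {
      return base;
    }
  }

  // Resolves the UUID of the underlying base table, or fails loudly for unknown table types.
  public static String baseTableUUID(Table table) {
    if (table instanceof HasTableOperations) {
      return ((HasTableOperations) table).currentUuid();
    } else if (table instanceof MetadataTable) {
      return ((MetadataTable) table).table().currentUuid();
    } else {
      throw new UnsupportedOperationException("Cannot retrieve UUID for table " + table.name());
    }
  }

  public static void main(String[] args) {
    BaseTable base = new BaseTable("db.tbl", "base-uuid");
    System.out.println(baseTableUUID(base));
    System.out.println(baseTableUUID(new MetadataTable(base)));
  }
}
```

Throwing here surfaces an unsupported table type at the call site, whereas a null UUID would only show up later as a failed cache lookup.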
Retriggering the build due to a flaky test in Flink.
      TableOperations ops = ((HasTableOperations) table).operations();
      return ops.current().uuid();
    } else if (table instanceof BaseMetadataTable) {
      return ((BaseMetadataTable) table).table().operations().current().uuid();
Calling table on table looks strange. It would be better to have a method baseTable().
This function can be called for a main table or a metadata table, so the variable name is table.
I agree that BaseMetadataTable could have a public method baseTable(). But the current table() method is public, so we can't rename it directly: it has to be deprecated first and a new method added.
So I think we can leave it as it is for now (out of scope for this PR).
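The deprecate-then-rename path mentioned above might look roughly like the following sketch. It is not part of this PR; the classes are simplified stand-ins, and `baseTable()` is the hypothetical new name suggested in the review.

```java
public class DeprecationSketch {
  // Simplified stand-in for the wrapped base table.
  public static final class BaseTable {
    private final String name;

    public BaseTable(String name) {
      this.name = name;
    }

    public String name() {
      return name;
    }
  }

  // Simplified stand-in for a metadata table exposing its base table.
  public static final class MetadataTable {
    private final BaseTable baseTable;

    public MetadataTable(BaseTable baseTable) {
      this.baseTable = baseTable;
    }

    // New, more descriptive accessor (hypothetical name from the review comment).
    public BaseTable baseTable() {
      return baseTable;
    }

    /**
     * Old accessor kept so existing callers continue to compile.
     *
     * @deprecated use {@link #baseTable()} instead
     */
    @Deprecated
    public BaseTable table() {
      return baseTable();
    }
  }

  public static void main(String[] args) {
    MetadataTable metadataTable = new MetadataTable(new BaseTable("db.tbl"));
    System.out.println(metadataTable.baseTable().name());
  }
}
```

Keeping the old method as a thin delegate lets callers migrate over a release cycle before it is removed.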
PR is ready.
ping.
This looks good to me now @ajantha-bhat thanks for the follow up. I'll wait for a bit in case others have any comments before merging.