Core: Remove deprecated method from BaseMetadataTable #9298

Merged · 2 commits · Jan 18, 2024
71 changes: 71 additions & 0 deletions .palantir/revapi.yml
@@ -874,6 +874,74 @@ acceptedBreaks:
justification: "Static utility class - should not have public constructor"
"1.4.0":
org.apache.iceberg:iceberg-core:
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.AllDataFilesTable"
new: "class org.apache.iceberg.AllDataFilesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.AllDeleteFilesTable"
new: "class org.apache.iceberg.AllDeleteFilesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.AllEntriesTable"
new: "class org.apache.iceberg.AllEntriesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.AllFilesTable"
new: "class org.apache.iceberg.AllFilesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.AllManifestsTable"
new: "class org.apache.iceberg.AllManifestsTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.BaseMetadataTable"
new: "class org.apache.iceberg.BaseMetadataTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.DataFilesTable"
new: "class org.apache.iceberg.DataFilesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.DeleteFilesTable"
new: "class org.apache.iceberg.DeleteFilesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.FilesTable"
new: "class org.apache.iceberg.FilesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.HistoryTable"
new: "class org.apache.iceberg.HistoryTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.ManifestEntriesTable"
new: "class org.apache.iceberg.ManifestEntriesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.ManifestsTable"
new: "class org.apache.iceberg.ManifestsTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.MetadataLogEntriesTable"
new: "class org.apache.iceberg.MetadataLogEntriesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.PartitionsTable"
new: "class org.apache.iceberg.PartitionsTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.PositionDeletesTable"
new: "class org.apache.iceberg.PositionDeletesTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.RefsTable"
new: "class org.apache.iceberg.RefsTable"
justification: "Removing deprecated code"
- code: "java.class.noLongerImplementsInterface"
old: "class org.apache.iceberg.SnapshotsTable"
new: "class org.apache.iceberg.SnapshotsTable"
justification: "Removing deprecated code"
- code: "java.class.defaultSerializationChanged"
old: "class org.apache.iceberg.mapping.NameMapping"
new: "class org.apache.iceberg.mapping.NameMapping"
@@ -890,6 +958,9 @@
- code: "java.field.serialVersionUIDChanged"
new: "field org.apache.iceberg.util.SerializableMap<K, V>.serialVersionUID"
justification: "Serialization is not be used"
- code: "java.method.removed"
old: "method org.apache.iceberg.TableOperations org.apache.iceberg.BaseMetadataTable::operations()"
justification: "Removing deprecated code"
apache-iceberg-0.14.0:
org.apache.iceberg:iceberg-api:
- code: "java.class.defaultSerializationChanged"
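For context on these accepted breaks: "java.class.noLongerImplementsInterface" means already-compiled callers that treat metadata tables as HasTableOperations stop matching. A minimal sketch of the kind of caller code the break invalidates (hypothetical; baseTable is an assumed handle to a regular Iceberg table):

Table snapshots =
    MetadataTableUtils.createMetadataTableInstance(baseTable, MetadataTableType.SNAPSHOTS);
// Before this PR every metadata table implemented HasTableOperations, so this
// branch was taken; after it, the instanceof check is false for metadata tables.
if (snapshots instanceof HasTableOperations) {
  TableOperations ops = ((HasTableOperations) snapshots).operations();
}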
12 changes: 2 additions & 10 deletions core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
@@ -38,8 +38,7 @@
* the metadata table using a {@link StaticTableOperations}. This way no Catalog related calls are
* needed when reading the table data after deserialization.
*/
-public abstract class BaseMetadataTable extends BaseReadOnlyTable
-    implements HasTableOperations, Serializable {
+public abstract class BaseMetadataTable extends BaseReadOnlyTable implements Serializable {
private final PartitionSpec spec = PartitionSpec.unpartitioned();
private final SortOrder sortOrder = SortOrder.unsorted();
private final BaseTable table;
@@ -101,17 +100,10 @@ static Map<Integer, PartitionSpec> transformSpecs(

abstract MetadataTableType metadataTableType();

-  protected BaseTable table() {
+  public BaseTable table() {
return table;
}

-  /** @deprecated will be removed in 1.4.0; do not use metadata table TableOperations */
-  @Override
-  @Deprecated
-  public TableOperations operations() {
-    return table.operations();
-  }

@Override
public String name() {
return name;
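For downstream callers, the migration is to reach TableOperations through the now-public table() accessor instead of the removed operations() override. A minimal sketch (hypothetical helper, not part of this PR):

static TableOperations operationsOf(BaseMetadataTable metadataTable) {
  // Before: metadataTable.operations(), the deprecated override removed by this PR.
  // After: go through the underlying base table, which still has TableOperations.
  return metadataTable.table().operations();
}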
2 changes: 2 additions & 0 deletions core/src/main/java/org/apache/iceberg/SerializableTable.java
@@ -105,6 +105,8 @@ private String metadataFileLocation(Table table) {
if (table instanceof HasTableOperations) {
TableOperations ops = ((HasTableOperations) table).operations();
return ops.current().metadataFileLocation();
+    } else if (table instanceof BaseMetadataTable) {
+      return ((BaseMetadataTable) table).table().operations().current().metadataFileLocation();

Member Author (ajantha-bhat): This is needed now since metadata tables no longer enter the HasTableOperations check above.

} else {
return null;
}
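Reading the hunk as a whole, the resolved method in SerializableTable now dispatches like this (a reconstruction from the diff above, not new behavior):

private String metadataFileLocation(Table table) {
  if (table instanceof HasTableOperations) {
    TableOperations ops = ((HasTableOperations) table).operations();
    return ops.current().metadataFileLocation();
  } else if (table instanceof BaseMetadataTable) {
    // Metadata tables no longer implement HasTableOperations, so the
    // location is resolved through their base table instead.
    return ((BaseMetadataTable) table).table().operations().current().metadataFileLocation();
  } else {
    return null;
  }
}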
FileRewriteCoordinator.java (Spark)
@@ -22,9 +22,7 @@
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.iceberg.ContentFile;
-import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.Table;
-import org.apache.iceberg.TableOperations;
import org.apache.iceberg.exceptions.ValidationException;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.util.Pair;
@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {

public Set<String> fetchSetIds(Table table) {
return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.baseTableUUID(table)))
.map(Pair::second)
.collect(Collectors.toSet());
}

   private Pair<String, String> toId(Table table, String setId) {
-    String tableUUID = tableUUID(table);
-    return Pair.of(tableUUID, setId);
-  }
-
-  private String tableUUID(Table table) {
-    TableOperations ops = ((HasTableOperations) table).operations();
-    return ops.current().uuid();
+    return Pair.of(Spark3Util.baseTableUUID(table), setId);
   }
}
ScanTaskSetManager.java (Spark)
@@ -22,10 +22,8 @@
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
-import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.ScanTask;
import org.apache.iceberg.Table;
-import org.apache.iceberg.TableOperations;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.util.Pair;
@@ -64,17 +62,12 @@ public <T extends ScanTask> List<T> removeTasks(Table table, String setId) {

public Set<String> fetchSetIds(Table table) {
return tasksMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.baseTableUUID(table)))
.map(Pair::second)
.collect(Collectors.toSet());
}

-  private String tableUUID(Table table) {
-    TableOperations ops = ((HasTableOperations) table).operations();
-    return ops.current().uuid();
-  }
-
   private Pair<String, String> toId(Table table, String setId) {
-    return Pair.of(tableUUID(table), setId);
+    return Pair.of(Spark3Util.baseTableUUID(table), setId);
}
}
Spark3Util.java (Spark)
@@ -29,6 +29,8 @@
import java.util.stream.Stream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.BaseMetadataTable;
+import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.NullOrder;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
@@ -945,6 +947,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(
return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database);
}

+  static String baseTableUUID(org.apache.iceberg.Table table) {
+    if (table instanceof HasTableOperations) {
+      TableOperations ops = ((HasTableOperations) table).operations();
+      return ops.current().uuid();
+    } else if (table instanceof BaseMetadataTable) {
+      return ((BaseMetadataTable) table).table().operations().current().uuid();
+    } else {
+      throw new UnsupportedOperationException("Cannot retrieve UUID for table " + table.name());
+    }
+  }

private static class DescribeSortOrderVisitor implements SortOrderVisitor<String> {
private static final DescribeSortOrderVisitor INSTANCE = new DescribeSortOrderVisitor();

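The coordinator changes above all route through this helper, so scan-task sets keyed by a base table's UUID are still found when the lookup happens via one of its metadata tables. A rough sketch of the invariant (illustration only; assumes filesTable is a metadata table over baseTable, and same-package access since baseTableUUID is package-private):

static void illustrateBaseTableUUID(Table baseTable, BaseMetadataTable filesTable) {
  String mainUuid = Spark3Util.baseTableUUID(baseTable);   // HasTableOperations branch
  String metaUuid = Spark3Util.baseTableUUID(filesTable);  // BaseMetadataTable branch
  // Both views of the same table resolve to the same key, so
  // FileRewriteCoordinator and ScanTaskSetManager lookups line up.
  assert mainUuid.equals(metaUuid);
}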
FileRewriteCoordinator.java (Spark)
@@ -22,9 +22,7 @@
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.iceberg.ContentFile;
-import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.Table;
-import org.apache.iceberg.TableOperations;
import org.apache.iceberg.exceptions.ValidationException;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.util.Pair;
@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {

public Set<String> fetchSetIds(Table table) {
return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.baseTableUUID(table)))
.map(Pair::second)
.collect(Collectors.toSet());
}

   private Pair<String, String> toId(Table table, String setId) {
-    String tableUUID = tableUUID(table);
-    return Pair.of(tableUUID, setId);
-  }
-
-  private String tableUUID(Table table) {
-    TableOperations ops = ((HasTableOperations) table).operations();
-    return ops.current().uuid();
+    return Pair.of(Spark3Util.baseTableUUID(table), setId);
   }
}
ScanTaskSetManager.java (Spark)
@@ -22,10 +22,8 @@
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
-import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.ScanTask;
import org.apache.iceberg.Table;
-import org.apache.iceberg.TableOperations;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.util.Pair;
@@ -64,17 +62,12 @@ public <T extends ScanTask> List<T> removeTasks(Table table, String setId) {

public Set<String> fetchSetIds(Table table) {
return tasksMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.baseTableUUID(table)))
.map(Pair::second)
.collect(Collectors.toSet());
}

-  private String tableUUID(Table table) {
-    TableOperations ops = ((HasTableOperations) table).operations();
-    return ops.current().uuid();
-  }
-
   private Pair<String, String> toId(Table table, String setId) {
-    return Pair.of(tableUUID(table), setId);
+    return Pair.of(Spark3Util.baseTableUUID(table), setId);
}
}
Spark3Util.java (Spark)
@@ -29,6 +29,8 @@
import java.util.stream.Stream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.BaseMetadataTable;
+import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.NullOrder;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
@@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(
return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database);
}

+  static String baseTableUUID(org.apache.iceberg.Table table) {
+    if (table instanceof HasTableOperations) {
+      TableOperations ops = ((HasTableOperations) table).operations();
+      return ops.current().uuid();
+    } else if (table instanceof BaseMetadataTable) {
+      return ((BaseMetadataTable) table).table().operations().current().uuid();
+    } else {
+      throw new UnsupportedOperationException("Cannot retrieve UUID for table " + table.name());
+    }
+  }

private static class DescribeSortOrderVisitor implements SortOrderVisitor<String> {
private static final DescribeSortOrderVisitor INSTANCE = new DescribeSortOrderVisitor();

FileRewriteCoordinator.java (Spark)
@@ -22,9 +22,7 @@
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.iceberg.ContentFile;
-import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.Table;
-import org.apache.iceberg.TableOperations;
import org.apache.iceberg.exceptions.ValidationException;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.util.Pair;
@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {

public Set<String> fetchSetIds(Table table) {
return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.baseTableUUID(table)))
.map(Pair::second)
.collect(Collectors.toSet());
}

   private Pair<String, String> toId(Table table, String setId) {
-    String tableUUID = tableUUID(table);
-    return Pair.of(tableUUID, setId);
-  }
-
-  private String tableUUID(Table table) {
-    TableOperations ops = ((HasTableOperations) table).operations();
-    return ops.current().uuid();
+    return Pair.of(Spark3Util.baseTableUUID(table), setId);
   }
}
ScanTaskSetManager.java (Spark)
@@ -22,10 +22,8 @@
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
-import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.ScanTask;
import org.apache.iceberg.Table;
-import org.apache.iceberg.TableOperations;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.util.Pair;
@@ -64,17 +62,12 @@ public <T extends ScanTask> List<T> removeTasks(Table table, String setId) {

public Set<String> fetchSetIds(Table table) {
return tasksMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.baseTableUUID(table)))
.map(Pair::second)
.collect(Collectors.toSet());
}

-  private String tableUUID(Table table) {
-    TableOperations ops = ((HasTableOperations) table).operations();
-    return ops.current().uuid();
-  }
-
   private Pair<String, String> toId(Table table, String setId) {
-    return Pair.of(tableUUID(table), setId);
+    return Pair.of(Spark3Util.baseTableUUID(table), setId);
}
}
Spark3Util.java (Spark)
@@ -28,6 +28,8 @@
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.BaseMetadataTable;
+import org.apache.iceberg.HasTableOperations;
import org.apache.iceberg.NullOrder;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
@@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(
return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database);
}

+  static String baseTableUUID(org.apache.iceberg.Table table) {
+    if (table instanceof HasTableOperations) {

Contributor (@nastra): why not just call table.uuid() in all of those places?

Member Author (ajantha-bhat): Because BaseMetadataTable returns a new UUID instead of the base table's UUID. Shall I fix that method to return the base table's UUID?

  public UUID uuid() {
    return UUID.randomUUID();
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think while adding UUID interface we concluded that we should not use base table's UUID
#8800 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the argument there is that the metadata table can be considered as a separate table and should therefore have it's own unique identifier compared to the base table.

But I think @nastra point still stands, even if it's different then the base table UUID, why does that matter here? I think we just want the table.uuid() right? or do we need the metadata table's underlying table's UUID?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using table.uuid(), many testcase failed as the scan task of metadata table expects UUID of the base table not the metadata table.

java.lang.IllegalArgumentException: No scan tasks found for 2c44000a-aa24-479a-8666-292cee70b95f

Does it make sense to return base table's UUID for the metadata table? (That is change the behaviour from #8800?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Rebased.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks like SparkStagedScan is expecting base table's uuid for cache for metadata tables.

Either we need to change that logic or return base table uuid. I will dig deeper next week.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'd check out that logic further and we can see what the right behavior is here. I still think the change that was made in #9310 is definitely the right fix from an API perspective (even if we decide not to use that API here). The main issue that was solved there was semantically the metadata table UUID should be the same for the same reference.

In other words, imo I would not change the UUID API semantics to fit whatever the caching logic relies on.

If we need the base table UUID for the caching logic, then maybe MetadataTable specifically can expose another API for exposing the underlying base table's UUID. Or alternatively keep it as is, and just expose the underlying Table (but that seems to expose too much imo).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is a tight correlation between metadata table scan tasks and main table UUID from multiple classes.
If we need to change it, it can be handled in a separate PR (Issue) as it is nothing to do with this deprecated method removal.

Hence, I went back to reverting using table.uuid()

So, this PR can go ahead.
cc: @nastra, @amogh-jahagirdar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra: Thoughts?

+      TableOperations ops = ((HasTableOperations) table).operations();
+      return ops.current().uuid();
+    } else if (table instanceof BaseMetadataTable) {
+      return ((BaseMetadataTable) table).table().operations().current().uuid();

Contributor: Calling table() on a table looks strange. It would be better to have a method baseTable().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be called for main table or metadata table. So, the varibale name is table.

Agree that BaseMetadataTable can have a public interface as baseTable(). But current interface table() is a public interface, we can't rename directly. It has to be deprecated first and new interface.

So, I think we can leave it as it is as of now (out of scope for this PR).

+    } else {
+      throw new UnsupportedOperationException("Cannot retrieve UUID for table " + table.name());
+    }
+  }

private static class DescribeSortOrderVisitor implements SortOrderVisitor<String> {
private static final DescribeSortOrderVisitor INSTANCE = new DescribeSortOrderVisitor();
