Reading as of Snapshot ID fails on Metadata Tables after Iceberg Table Schema Update #6978

@sungwy

Apache Iceberg version

1.1.0 (latest release)

Query engine

Spark

Please describe the bug 🐞

Time travel (reading as of a specific snapshot ID) fails on metadata tables if the Iceberg table has ever gone through a schema evolution. This looks like an unintended side effect of #3722, the PR that makes a snapshot read use that snapshot's schema.

Since schema evolution is not supported on metadata tables, we could patch this bug by checking whether the Iceberg table is an instance of BaseMetadataTable before making the snapshotSchema call.
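To make the proposed guard concrete, here is a minimal Python model of the lookup. It is not Iceberg code: `Snapshot`, `Table`, `MetadataTable`, `schema_for`, and `snapshot_schema` are hypothetical stand-ins named after the frames in the stack trace, and the model assumes that a metadata table exposes only its own fixed schema while its snapshots still carry the data table's schema ID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Snapshot:
    snapshot_id: int
    schema_id: Optional[int]  # schema ID of the data table at commit time


@dataclass
class Table:
    """Hypothetical stand-in for a data table that keeps every schema version."""
    schemas: Dict[int, str]
    current_schema_id: int
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)

    def schema(self) -> str:
        return self.schemas[self.current_schema_id]


class MetadataTable(Table):
    """Hypothetical stand-in for a metadata table such as db.table.files.

    Assumption: it knows a single fixed schema but shares the data table's
    snapshots, so a snapshot may point at a schema ID it does not know.
    """


def schema_for(table: Table, snapshot_id: int) -> str:
    # Models the lookup that fails in the stack trace.
    schema_id = table.snapshots[snapshot_id].schema_id
    if schema_id is None:
        return table.schema()
    if schema_id not in table.schemas:
        raise RuntimeError(f"Cannot find schema with schema id {schema_id}")
    return table.schemas[schema_id]


def snapshot_schema(table: Table, snapshot_id: int, guard_metadata_tables: bool) -> str:
    # Proposed guard: metadata tables keep their own schema and skip the lookup.
    if guard_metadata_tables and isinstance(table, MetadataTable):
        return table.schema()
    return schema_for(table, snapshot_id)


# A snapshot written after one schema evolution (schema ID 1).
snap = Snapshot(snapshot_id=10963874102873, schema_id=1)
files = MetadataTable(
    schemas={0: "files-table schema"},
    current_schema_id=0,
    snapshots={snap.snapshot_id: snap},
)

try:
    snapshot_schema(files, snap.snapshot_id, guard_metadata_tables=False)
except RuntimeError as err:
    print("without guard:", err)

print("with guard:", snapshot_schema(files, snap.snapshot_id, guard_metadata_tables=True))
```

In this model the unguarded call raises the same kind of error as in the report, while the guarded call returns the metadata table's own schema. Whether an `instanceof` check belongs in SparkTable or deeper in the lookup is a design choice left to the actual patch.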

Example query:

spark.read.format("iceberg").option("snapshot-id", 10963874102873).load("db.table.files")

Example error after schema evolution:

Py4JJavaError: An error occurred while calling o373.load.
: java.lang.IllegalStateException: Cannot find schema with schema id 1
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:590)
	at org.apache.iceberg.util.SnapshotUtil.schemaFor(SnapshotUtil.java:363)
	at org.apache.iceberg.util.SnapshotUtil.schemaFor(SnapshotUtil.java:388)
	at org.apache.iceberg.spark.source.SparkTable.snapshotSchema(SparkTable.java:127)
	at org.apache.iceberg.spark.source.SparkTable.schema(SparkTable.java:133)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:176)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:303)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:265)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at jdk.internal.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)
