Skip to content

Conversation

@sririshindra
Copy link
Contributor

While testing rollback_to_snaphot feature at Cloudera, we found that this is feature is failing in some cases.

In the setSetCurrentSnapshot method 'null' is being passed for timestamp which eventually causes the failure of the following precondition check. The timeStamp should also be passed along with the Snapshot just like here

This PR fixes this bug. I tested this with our internal tests and also by running the unit tests in Iceberg.

Steps to reproduce the rollback_to_snapshot bug in spark.

Each insert operation below is done separately in its own spark app.

sql("INSERT INTO %s VALUES (3, 'zzz', 1)" % table1)    
sql("INSERT INTO %s VALUES (6, 'www', 1)" % table1)
sql("INSERT INTO %s VALUES (9, 'zyz', 1)" % table1)
sql("INSERT INTO %s VALUES (2, 'yy', 2)" % table1)
sql("INSERT INTO %s VALUES (4, 'xx', 2)" % table1)
sql("INSERT INTO %s VALUES (1, 'x', 3)" % table1)`

launch a new spark shell and follow the following steps

results, _ = sql("select id, data, id2 from %s order by id2, id" % table1) # some internal code to get the results.
results, _ = sql("select snapshot_id from %s.%s.snapshots" % (DATABASE, table1))
snapshots = self.get_rows(results) # some internal code to get snapshots in a python list.
sql("CALL spark_catalog.system.rollback_to_snapshot('%s', %s)" % (table1, str(snapshots[2][0])))`

@github-actions github-actions bot added the core label Feb 11, 2022
@wypoon
Copy link
Contributor

wypoon commented Feb 11, 2022

@rdblue this will be superseded by #4089, but this seems to be a regression which should be fixed in 0.13.1?

@rdblue
Copy link
Contributor

rdblue commented Feb 12, 2022

@wypoon are you sure that's the cause? If the timestamp is set to null then the timestamp used for the log is the timestamp from the snapshot itself. That corresponds to the original behavior and is correct. When the current snapshot is set to a newly created snapshot, we set the log timestamp to the snapshot timestamp.

I think the problem is that the call to roll back is using setCurrentSnapshot(Snapshot) rather than setCurrentSnapshot(long). The latter assumes that the snapshot is existing and sets the timestamp to System.currentTimeMillis(). I think the right fix here is to make sure that SnapshotManager ends up calling the right variant of setCurrentSnapshot.

@wypoon
Copy link
Contributor

wypoon commented Feb 12, 2022

@rdblue I have not investigated this issue and do not know the cause. Thanks for your explanation.
@sririshindra reported the issue. It just sounds to me like a regression that we should try to fix in 0.13.1. (But it could also be fixed in a 0.13.2.)

@rdblue
Copy link
Contributor

rdblue commented Feb 14, 2022

@wypoon, please have a look at #4089, I think that fixes this problem in a better way.

@wypoon
Copy link
Contributor

wypoon commented Feb 15, 2022

@rdblue we're testing Iceberg 0.13.0. You're right that the rollback is using setCurrentSnapshot(Snapshot) rather than setCurrentSnapshot(long):

Exception in thread "main" java.lang.IllegalArgumentException: [BUG] Expected sorted snapshot log entries.
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:142)
	at org.apache.iceberg.TableMetadata.<init>(TableMetadata.java:306)
	at org.apache.iceberg.TableMetadata$Builder.build(TableMetadata.java:1040)
	at org.apache.iceberg.TableMetadata.replaceCurrentSnapshot(TableMetadata.java:501)
	at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:289)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:282)
	at org.apache.iceberg.SnapshotManager.commit(SnapshotManager.java:32)
	at org.apache.iceberg.spark.procedures.RollbackToSnapshotProcedure.lambda$call$0(RollbackToSnapshotProcedure.java:85)

I know that you're working on #4089, but that depends on the new work adding SnapshotRef and so on for the branching/tagging feature. Would a more limited fix be appropriate for a 0.13.x bug fix release? I still think this bug should be fixed in 0.13.x.

@rdblue
Copy link
Contributor

rdblue commented Feb 15, 2022

@wypoon, I'd support making similar changes to what's in #4089 here.

@rdblue
Copy link
Contributor

rdblue commented Feb 15, 2022

I just merged #4089, so I think this is fixed. If you want, feel free to port the fix over to the 0.13.x branch and I'll review it.

@sririshindra
Copy link
Contributor Author

I just merged #4089, so I think this is fixed. If you want, feel free to port the fix over to the 0.13.x branch and I'll review it.

Thanks @rdblue, I am testing the fix based on your #4089 . I will publish a PR gaianist the 0.13.x branch as soon as the tests pass.

@rdblue
Copy link
Contributor

rdblue commented Feb 15, 2022

Thanks, @sririshindra!

@sririshindra
Copy link
Contributor Author

@rdblue I opened #4135 to port the fix on top of the 0.13.x branch. As such I am closing this PR. Thanks for your reviews in this PR. Please review #4135 at your convenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants