
Conversation

Contributor

@rionmonster rionmonster commented Dec 27, 2025

Purpose

Linked issue: close #2262

Per issue #2262, this pull request addresses a race condition that could intermittently cause the IcebergRewriteITCase.testLogTableCompaction test case to fail (particularly during CI builds).

Brief change log

This change updates the LakeTableTieringManager to address potential race conditions surrounding state transitions. In some cases, delayed callbacks or retries could act on tables before their new state was recorded, or stale entries in the pending queue could be handed out even though the table was no longer eligible for tiering. Additional changes harden the table request (requestTable()) and removal (removeLakeTable()) paths as well.

Tests

These changes were initially verified after successfully reproducing the issue (via iterating over the IcebergRewriteITCase.testLogTableCompaction test). After the root cause was identified, new tests were introduced to help verify these issues:

  • LakeTableTieringManagerTest.testRequestTableSkipsStaleQueueEntries - Verifies the changes made to LakeTableTieringManager.requestTable to ensure that invalid or stale state transitions do not negatively affect the tiering process.
  • LakeTableTieringManagerTest.testScheduledStateRecordedBeforeTimerCallbackRuns - Verifies that the scheduled state is recorded before the timer callback runs, so an immediately firing callback cannot create side effects (see the ordering sketch after this list).
  • LakeTableTieringManagerTest.testRemoveLakeTableRemovesAllPendingQueueOccurrences - Verifies that any duplicate entries within the internal pendingTieringTables queue are removed during table removal (via removeLakeTable()).
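
For context, the ordering the second test verifies can be illustrated with a minimal, self-contained sketch. The class, field, and method names below are hypothetical and not taken from the Fluss codebase; the point is simply that the new state is recorded before the delayed callback is scheduled, so even a callback that fires immediately observes the recorded state.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TieringSchedulerSketch {
    enum TieringState { NEW, SCHEDULED, PENDING }

    private final Map<Long, TieringState> tieringStates = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    void scheduleTiering(long tableId, long delayMs) {
        // Record the new state *before* scheduling the callback, so a callback
        // that fires immediately (delayMs == 0) still observes SCHEDULED.
        tieringStates.put(tableId, TieringState.SCHEDULED);
        timer.schedule(() -> onTieringDue(tableId), delayMs, TimeUnit.MILLISECONDS);
    }

    private void onTieringDue(long tableId) {
        // Only advance to PENDING if the table is still in the recorded state.
        tieringStates.replace(tableId, TieringState.SCHEDULED, TieringState.PENDING);
    }
}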

API and Format

N/A

Documentation

N/A

Reviewer(s) Requested

@swuferhong (as original reporter), @beryllw (as original author)

checkFileStatusInIcebergTable(t1, 3, false);

// Ensure tiering job has fully processed the previous writes
assertReplicaStatus(t1Bucket, i);
Contributor


I'm curious about this. If the file count in Iceberg is already 3, shouldn't that also mean that tiering has already tiered all records, since we only write 3 records?

Contributor Author

@rionmonster rionmonster Dec 29, 2025


@luoyuxia

I agree that it's a bit odd. The additional replica assertion seems to alleviate the race condition we were seeing in this specific test case in the failing CI runs (at least based on my local attempts to reproduce the issue).

It may simply be a testing artifact rather than a legitimate issue, but if this doesn't seem like a proper fix, we can explore other avenues. It looked like a fairly common kind of race condition, so leveraging an existing assertion seemed like a decent approach. Happy to dig further, though, if we feel we need more exhaustive checks.

Contributor


@rionmonster I tried to reproduce it locally, but couldn't. I think it would be better to explore the root cause, as I'm afraid it is caused by another, more critical issue.

Contributor Author


@luoyuxia

Good to know! It seemed to be pretty consistent regarding pass/fail on my end, but clearly there’s something else at play.

I’ll do some more exploration and see what I find. Thanks for the feedback!

Contributor Author

@rionmonster rionmonster Dec 30, 2025


@luoyuxia

I've been digging into this a bit further, and it seems there's some disparity between the files actually being written and the latest offsets retrieved for them (specifically after asynchronous operations such as compaction). I think we may need some mechanism to improve consistency, at least within the bounds of the tests.

I wrote a little monitor to run through several iterations of the tests to see what the state of various bits looked like during each iteration. Here's a sample of those:

Successful Test (Successfully Performs Compaction as Expected)
[MONITOR] log_table_33 - TableBucket{tableId=32, bucket=0}

IcebergFiles IcebergSnapshotId LakeSnapshotId ReplicaLakeSnapId ReplicaLakeLogEndOff Timestamp
0 -1 -1 -1 -1 1767113560800
0 -1 -1 -1 -1 1767113561046
... ... ... ... ... ...
3 5182733673261799288 5182733673261799288 5443797100773076340 3 1767113615059
3 5182733673261799288 5182733673261799288 5443797100773076340 3 1767113615307
2 2575057976625237982 2575057976625237982 5443797100773076340 4 1767113615557
2 2575057976625237982 2575057976625237982 5443797100773076340 4 1767113615808

Failing Test (File appeared to never be properly written before expected offset)
[MONITOR] log_table_34 - TableBucket{tableId=33, bucket=0}

IcebergFiles IcebergSnapshotId LakeSnapshotId ReplicaLakeSnapId ReplicaLakeLogEndOff Timestamp
0 -1 -1 -1 -1 1767113616327
... ... ... ... ... ...
2 7273969972093574431 7273969972093574431 7273969972093574431 2 1767113861627
2 7273969972093574431 7273969972093574431 7273969972093574431 2 1767113861882
2 7273969972093574431 7273969972093574431 7273969972093574431 2 1767113862135
2 7273969972093574431 7273969972093574431 7273969972093574431 2 1767113862381
2 7273969972093574431 7273969972093574431 7273969972093574431 2 1767113862633
[ASSERTION FAILURE] Expected offset 3 but got 2 for bucket TableBucket{tableId=33, bucket=0}
  Replica Lake Snapshot ID: 7273969972093574431
  Current State:
    Iceberg Files: 2
    Iceberg Snapshot ID: 7273969972093574431
    Lake Snapshot ID (from admin): 7273969972093574431
    Replica Lake Snapshot ID: 7273969972093574431
    Replica Lake Log End Offset: 2

I'm not sure if this is more an artifact of the tests themselves or a legitimate issue. Any thoughts? Happy to continue digging. Given the inconsistency, it feels like a race condition, either in writing the files or in reading stale offsets from the data lake directly. In a real-world environment this might be tolerable (I suspect it's just a minor latency spike that would eventually resolve), but in the confines of a test it's flaky.
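
As a rough illustration of the kind of consistency wait I have in mind (a purely hypothetical helper, not an existing Fluss test utility), the test could poll until the replica log end offset reaches the expected value before asserting on the Iceberg files:

import java.time.Duration;
import java.util.function.BooleanSupplier;

final class TestWaitUtil {
    // Polls the condition until it holds or the timeout elapses.
    static void waitUntil(BooleanSupplier condition, Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() > deadline) {
                throw new AssertionError("Condition not met within " + timeout);
            }
            Thread.sleep(100); // re-check periodically instead of asserting on a single snapshot
        }
    }
}

// Usage idea (fetchReplicaLogEndOffset is a stand-in for however the test reads the offset):
// TestWaitUtil.waitUntil(() -> fetchReplicaLogEndOffset(t1Bucket) >= expectedOffset, Duration.ofMinutes(2));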

Contributor

@luoyuxia luoyuxia Jan 10, 2026


> it was possible to request a table from the queue that was no longer actually in the "Pending" state.

@rionmonster
Could you please explain a bit about the case in which a table in the queue would no longer be in the "Pending" state? Even if it's not in the "Pending" state, the tiering service can still request the table and do the tiering. Also, from the log, I can't see a line like "Fail to change state for table ".

To me, it looks like pendingTieringTables is empty, but I still can't figure out why.

Contributor


Also, could you please share your reproduction branch so that I can reproduce it in my local environment?

Contributor Author


@luoyuxia

Sure! I'm still cleaning up some of my notes from the troubleshooting session, but hopefully this workflow makes sense:

  1. A table T becomes eligible and is assigned normally (Pending to Tiering).
  2. Later, a stale duplicate entry for T ends up in the pending queue (e.g., due to retries, late timer callbacks, or test churn).
  3. Before the fix, requestTable() would poll T from the pending queue and attempt to assign it again, even though its actual state was no longer Pending.
  4. That resulted in an invalid state transition attempt (non-Pending/Tiering to Tiering), but the table was still returned to the tiering service, causing things to become inconsistent between the pendingTieringTables and tieringStates.

The key difference with this change is that requestTable() now verifies the table is actually in the Pending state before assigning it, instead of relying solely on its presence in the pending queue.
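
As a rough sketch of that check (the class name, field names, and types below are simplified guesses, not the actual LakeTableTieringManager internals):

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class PendingQueueSketch {
    enum TieringState { PENDING, TIERING, TIERED }

    private final Queue<Long> pendingTieringTables = new ArrayDeque<>();
    private final Map<Long, TieringState> tieringStates = new HashMap<>();

    /** Returns the next genuinely pending table id, skipping stale queue entries, or null. */
    Long requestTable() {
        Long tableId;
        while ((tableId = pendingTieringTables.poll()) != null) {
            // Skip entries whose table was removed or has already left the Pending state.
            if (tieringStates.get(tableId) != TieringState.PENDING) {
                continue;
            }
            tieringStates.put(tableId, TieringState.TIERING);
            return tableId;
        }
        return null;
    }
}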

Contributor


@rionmonster

> Later, a stale duplicate entry for T ends up in the pending queue (e.g., due to retries, late timer callbacks, or test churn).

As for the failing test, which case caused a stale duplicate entry for T to enter the pending queue? Did you find any logs for it?

> would poll T from the pending queue and attempt to assign it again, even though its actual state was no longer Pending

If it's an invalid state change, we'll print

Fail to change state for table xxx

But I can't see such a log in the failing CI run.

> but the table was still returned to the tiering service, causing things to become inconsistent between the pendingTieringTables and tieringStates.

To me, from the log it looks like the table isn't returned to the tiering service anymore.

So I'm a little doubtful about this case.

Could you please share your reproduction branch? I want to reproduce it in my local environment to find the root cause.

Contributor Author

@rionmonster rionmonster Jan 10, 2026


@luoyuxia

Based on our private conversations exploring this, I'm including the following summary of those findings here for visibility:

Okay, I was able to reproduce the failure with some extended debugging logs added throughout the LakeTableTieringManager. I've created a gist here with the logs. Reviewing them, it looks like we have the following chain of events:

  • At 411079 we see a series of "polled tableId=x but tablePath is null (state=null, epoch=null)" log entries, which repeat 10+ times
  • After this burst we can see that the test fails (since rewrite/compaction didn't complete in time)

I think we can interpret this as:

  • Tiering service requested work (via requestTable())
  • Manager pulled table ids from pendingTieringTables but one of the following was true:
    • tablePaths didn't contain the id
    • tieringStates didn't contain the id
    • tableTierEpoch didn't contain the id
  • This means that the table ids were no longer registered (e.g., previously dropped/removed) but were still present in pendingTieringTables
  • This causes requestTable() calls to continually drain/recurse over these stale entries, which could delay processing of the actual pending tables
  • This delay could be enough to cause the failed assertion

I think the proposed fixes would help with this, although we may also need to adjust removeLakeTable() to ensure the requested table is removed from the pending queue, which I'll add to the PR:

public void removeLakeTable(long tableId) {
    inLock(lock, () -> {
        // Omitted for brevity

        // Drop any (possibly duplicate) occurrences of the table from the pending queue
        pendingTieringTables.removeIf(id -> id == tableId);
    });
}

I've created a branch with these additional logs if you would like to explore it yourself at https://github.com/rionmonster/fluss/tree/for-yuxia-with-logs

Contributor

@luoyuxia luoyuxia left a comment


@rionmonster Thanks for the PR. Only one question.


[FLUSS-2262][lake][server] Address lake table tiering stability during state handling
