Support State TTL #148
Why is this needed?
How can we know that a key belongs to an unused state?
If we know a table row will expire on a given date, how can we notify the related executors to delete this row from their internal states?
I haven't thought too deeply about this topic, but my gut feeling is that 1) is a special case of 2). We will need a careful design to support this without breaking consistency. For non-overlapping windows, we can just inject a barrier to reset and delete operator states when the window ends. For overlapping windows, we may need to introduce a timestamp and store it with the states, which may require TTL support in Hummock.
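To make the "store a timestamp with the states" idea concrete, here is a minimal sketch. It is not RisingWave or Hummock code: the `TtlState` class, its methods, and the choice of seconds as the time unit are all hypothetical, and the current time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class TtlState:
    """Hypothetical keyed operator state; each value carries its write timestamp."""

    ttl: int  # retention, in the same unit as the timestamps (assumed: seconds)
    _entries: Dict[Any, Tuple[Any, int]] = field(default_factory=dict)

    def put(self, key: Any, value: Any, now: int) -> None:
        # Store the value together with the time it was written.
        self._entries[key] = (value, now)

    def get(self, key: Any, now: int) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at >= self.ttl:
            # The entry outlived its TTL: drop it lazily on read.
            del self._entries[key]
            return None
        return value

    def __len__(self) -> int:
        return len(self._entries)
```

With `ttl=10`, a value written at time 0 is still readable at time 5 and is treated as absent from time 10 onwards. Whether an operator may safely "forget" such an entry is exactly the consistency question raised in this thread; the sketch only shows the bookkeeping.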
In our current design, the time window is determined by some time column, and the time column is nothing special but just a regular column, so nobody can guarantee that the time column stores real-world time. For example, table
Generally, a table must not forget its data; otherwise, data consistency is broken. The only correct way to 'clean' a row is to delete it from the source. As a result, for now, I would suggest treating time windows and TTL as two different, unrelated things, even though they are connected under some conditions.
State TTL's primary purpose is to reduce the storage size of long-running streaming jobs, with a tradeoff of:
twocode@twocode-pc:~/others/flink$ grep -rsni "getIdleStateRetention" . | grep "plan/node" | awk -F '[/:]' '{print $12"/"$13"/"$14"/"$15"/"$16}'
plan/nodes/exec/stream/StreamExecGlobalGroupAggregate.java
plan/nodes/exec/stream/StreamExecChangelogNormalize.java
plan/nodes/exec/stream/StreamExecIncrementalGroupAggregate.java
plan/nodes/exec/stream/StreamExecGroupTableAggregate.java
plan/nodes/exec/stream/StreamExecGroupAggregate.java
plan/nodes/exec/stream/StreamExecGroupAggregate.java
plan/nodes/exec/stream/StreamExecRank.java
plan/nodes/exec/common/CommonExecSink.java
Implementation-wise, there are at least three options:
Flink used to use timer-based TTL for table states, hit many issues, and migrated to storage TTL instead. Apparently, option 1 is preferred.
Do we also want to support TTL for non-windowed states? Consider a non-windowed
IIUC, this also requires a timestamp column? The difference between 1) and 3) is that 1) lets storage expire states on its own, without cooperating with streaming.
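As a rough illustration of storage expiring states on its own, the sketch below drops expired entries during a compaction pass, with no involvement from the streaming operators. The `compact_with_ttl` function and the entry layout are hypothetical and do not reflect Hummock's actual interface.

```python
from typing import Dict, Tuple

# Hypothetical storage layout: key -> (value, write timestamp in seconds).
Entries = Dict[str, Tuple[bytes, int]]


def compact_with_ttl(entries: Entries, ttl: int, now: int) -> Entries:
    """Return the entries that survive compaction; expired ones are dropped."""
    return {
        key: (value, written_at)
        for key, (value, written_at) in entries.items()
        if now - written_at < ttl
    }
```

For example, with `ttl=60` and `now=100`, an entry written at time 0 is dropped while one written at time 95 is kept. Because expiry happens whenever compaction runs, operators cannot rely on an expired key being either present or absent at a given moment.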
Just to remind that we don't have a "windowed state" now. The reasons include:
Duplicate of #3298
State TTL is needed:
Things to consider: