-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[prism] Support OnWindowExpiry #32211
Comments
The initial re-write attempt required so many adjustments to how processing was happening. (~1000 lines of changes). I wonder if there isn't a simpler way? What if we simply add a "GC timer" field to the timer element, and always sort a GC timer after non-GC timers, with the timer adding an output hold for end of window. This should be fine until we add AllowedLateness. It seems intuitive this isn't affected by AllowedLateness as that would only affect the inputs arriving from upstream and not within the window, which always closes the window at the end of the window from the transform's perspective. The trick remains that we want these timers to only fire at the end of the window, right before the window state is GC'd. We don't want to leave a "gap" where we have prematurely fired the OnWindowExpiry timer, and then late data comes in setting new timers, and state. |
New idea: Add a side channel into the bundle generation loop, and have the garbage collection mechanism push the expiry bundles timers directly into the loop. State is move to be owned by that bundle, and there's a marker saying that this bundle is an expiry bundle, so the state doesn't need to be returned to the main store. Avoids a gap. Does have special case plumbing. Keeps the state clean, and avoids a 2nd check through the GC handler. |
The callback for on window expiry is a special timer based callback that the runner triggers prior to garbage collection of a window's state and side inputs. This allows user code to clean up use or clean up state instead of losing it.
The feature is notably used in the GroupIntoBatches transforms for the Java and Python SDKs.
Garbage collection currently happens in a single phase, when watermarks are being updated.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/engine/elementmanager.go#L1560
We need 1: to indicate that a WindowExpiry timer needs to be sent for the stage. That'll probably be similar to the existing stateful and aggregate bools on stageState, and setter methods for the
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/engine/elementmanager.go#L1052 Though we'll need the SDK given "name" for the callback, so it'll be a string field in stageState.
In principle this means that should be in startEventTimeBundle, after indicating the state should be garbage Collected. But we need to indicate windows which are ready for
So sort of like a mark and sweep algorithm for general garbage collection.
The current place we garbage collect is probably fine for both. We must "mark" the windows that have expired if they have OnWindowExpiry, and then afterwards do the current garbage collection "sweep" with the set of collectable windows. But then we'd need the sweeps to be aware of both the window and the key which is a bit heavier on memory.
Ideally we want to be able to clean up in PersistBundle, where the cleanup is just done by not storing the state back for the window of the stage in question.
But then how do we communicate that intent back and avoid too much lock contention?
What I've got so far:
That's not too bad.
This ties nicely to the cleanup of state to be directly part of the bundle instead of being requested after the fact. That avoids certain read/write contentions too.
The text was updated successfully, but these errors were encountered: