chore: load CDK clears partial aggregates at cadence #48551
…ariants don't wrap records. Additional cleanup.
 *
 * In a future where we deserialize only the info necessary for routing, this could include a dumb
 * container for the serialized, and deserialization could be deferred until the spooled records
 * were recovered from disk.
 */
-sealed class DestinationRecordWrapped : Sized
+sealed class DestinationStreamEvent : Sized
+1 s/wrapped/event
@@ -163,6 +165,10 @@ class DefaultDestinationTaskLauncher(
}
}

// Start flush task
log.info { "Starting flush tick task" }
Nit: can we change these log lines to call out the different types of tasks? For example, "Starting flush timer for state staged files" or "Starting flush timer for stale checkpoints", or something.
And maybe include $task, since that will appear in the logs when the task is launched/completed/canceled.
let me see what I can do
cfa272d to 1a5cf93
I agree that this works as advertised but I'm not sure that it solves the problem. Let's talk through my comment before going forward
airbyte-cdk/bulk/core/load/src/main/kotlin/io/airbyte/cdk/load/task/internal/SpillToDiskTask.kt
1a5cf93 to c8ee5c7
What
Stream aggregates (spilled files) will be closed and flushed at a cadence of 15 minutes, so each aggregate is flushed roughly 15-30 minutes after it is opened. This prevents backing up indefinitely on a large number of partial aggregates (chunks of work below the size threshold) in high-cardinality syncs.
How
FlushTickTask, which publishes a flush event on each stream's input channel at a cadence (default is 15 minutes, but this can be tuned)
TimeWindowTrigger, which will respond to flush events and trigger a window publish if the aggregate has been open at least as long as the configured window width (default 15 minutes, but this can be tuned)
DestinationRecordWrapped -> DestinationStreamEvent, since 2/3 of its variants do not wrap a record.
Not in this PR
Un-wiring the checkpoint time-based flush task.
It does not currently work and may not be relevant with this broader time-based mechanism in place.
Reading Order
FlushTickTask
SpillToDiskTask
TimeWindowTrigger
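Illustrative sketch
For reviewers who want the time-window check from the How section in concrete form, here is a minimal sketch. It is not the code in this PR: the class name TimeWindowSketch, its methods, and the injected nowMs clock are all hypothetical, and the real TimeWindowTrigger may be structured differently. It only shows the rule "publish if the aggregate has been open at least as long as the window width".

```kotlin
// Hypothetical sketch of the time-window rule; names and shape are assumptions,
// not the CDK's actual TimeWindowTrigger.
class TimeWindowSketch(
    private val windowWidthMs: Long,
    // Injected clock so the behavior can be exercised without waiting.
    private val nowMs: () -> Long = { System.currentTimeMillis() },
) {
    // Null while no aggregate is open.
    private var openedAtMs: Long? = null

    /** Records the open time on first use; later calls keep the original time. */
    fun open(): Long {
        val opened = openedAtMs ?: nowMs()
        openedAtMs = opened
        return opened
    }

    /** Called on a flush event: true if the aggregate has been open >= windowWidthMs. */
    fun isComplete(): Boolean {
        val opened = openedAtMs ?: return false
        return nowMs() - opened >= windowWidthMs
    }

    /** Resets the window after the aggregate is published. */
    fun close() {
        openedAtMs = null
    }
}
```

Under these assumptions, with both the tick cadence and the window width at 15 minutes, an aggregate opened one minute after a tick is only 14 minutes old at the next tick and is published at the tick after that, at 29 minutes. That matches the 15-30 minute range quoted under What.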