Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk Load CDK: State -> Checkpoint & flush at end #45377

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

johnny-schmidt
Copy link
Contributor

What

  • Renaming DestinationStateMessage / GlobalState / StreamState etc to CheckpointMessage / GlobalCheckpoint / StreamCheckpoint / etc
  • State is now flushed at the end of the job

@johnny-schmidt johnny-schmidt requested a review from a team as a code owner September 10, 2024 22:59
Copy link

vercel bot commented Sep 10, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Sep 10, 2024 10:59pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Sep 10, 2024
@@ -74,6 +78,7 @@ class DestinationTaskLauncher(

suspend fun startTeardownTask() {
log.info { "Starting teardown task" }
checkpointManager.flushReadyCheckpointMessages()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgao here is the actual functional change :)

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm (I only skimmed most of the diff, the actual functional change looks right)

doublechecking: we were already handling all of the records, it's only the state messages that need this explicit handler?

@johnny-schmidt
Copy link
Contributor Author

lgtm (I only skimmed most of the diff, the actual functional change looks right)

doublechecking: we were already handling all of the records, it's only the state messages that need this explicit handler?

@edgao Yes we were already handling all records, and we were flushing the ready state messages each time we received a new state message. What we weren't doing was flushing what was left at the end.

I think in the future I'll add some specific flush points that run async: like a flushCheckpoints task that gets enqueued every time new batches are persisted, etc

@johnny-schmidt johnny-schmidt merged commit 7056428 into master Sep 11, 2024
32 checks passed
@johnny-schmidt johnny-schmidt deleted the issue-9365/state-message-flush branch September 11, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants