Bulk Load CDK: Staging refactor + tests; multi-sync ITs correctly wait for ack #48608

Merged
merged 19 commits into from
Nov 25, 2024

johnny-schmidt
Contributor

@johnny-schmidt johnny-schmidt commented Nov 22, 2024

What

The first three commits modify state so that it answers specific questions rather than being inspected directly.
They also add tests ensuring those questions are answered.
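A minimal sketch of what "answering questions rather than being inspected" can look like, assuming a hypothetical per-stream `StreamState` class (none of these names come from the CDK). Callers ask `isComplete()` or `canAck(...)` instead of reading raw counters and deciding for themselves.

```kotlin
// Hypothetical per-stream bookkeeping; names are illustrative, not the CDK's API.
class StreamState {
    private var recordsRead = 0L
    private var recordsPersisted = 0L
    private var endOfStreamSeen = false

    fun markRead(count: Long) { recordsRead += count }
    fun markPersisted(count: Long) { recordsPersisted += count }
    fun markEndOfStream() { endOfStreamSeen = true }

    // Answers "has every record read from this stream been persisted?"
    // instead of exposing the counters to the caller.
    fun isComplete(): Boolean = endOfStreamSeen && recordsPersisted == recordsRead

    // Answers "is it safe to ack a checkpoint covering `upTo` records?"
    fun canAck(upTo: Long): Boolean = recordsPersisted >= upTo
}
```

Keeping the decision logic inside the state object means tests can target each question directly, which is the kind of coverage the second line above describes.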

The next two move staging from processBatch to close and disable the tests that incorrectly do not await a checkpoint ack.
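A rough sketch of moving the staging step out of the per-batch path, assuming a hypothetical `StagingLoader` whose `processBatch` only buffers and whose `close` performs the staging once per stream. The class and its in-memory lists are illustrative stand-ins, not the CDK's `StreamLoader` interface.

```kotlin
// Hypothetical loader; shapes are illustrative, not the CDK's interfaces.
class StagingLoader {
    private val pendingBatches = mutableListOf<List<String>>()
    val staged = mutableListOf<String>()
    var closed = false
        private set

    // Per-batch work stays cheap: nothing is staged here.
    fun processBatch(records: List<String>) {
        check(!closed) { "loader already closed" }
        pendingBatches.add(records)
    }

    // Staging happens once, after every batch for the stream has been processed.
    fun close() {
        check(!closed) { "close called twice" }
        staged.addAll(pendingBatches.flatten())
        pendingBatches.clear()
        closed = true
    }
}
```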

The next six are @edgao's test fixes to make the multi-sync tests wait on the ack.

The last one tweaks that to run more efficiently without filler records. It also ensures that destination state is persisted after each file is written to staging, so that orphaned staged data is recovered after a failure.
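A sketch of why persisting state after each staged file enables recovery, assuming hypothetical in-memory stand-ins for object storage (`FakeStorage`) and persisted destination state (`DestinationStateStore`). Because each key is recorded immediately after its write, anything still marked as staged after a crash can be located and recovered.

```kotlin
// Hypothetical in-memory stand-ins; a real destination would use object storage
// and durably persisted state.
class FakeStorage { val objects = mutableMapOf<String, String>() }

class DestinationStateStore {
    private val stagedKeys = linkedSetOf<String>()
    fun recordStaged(key: String) { stagedKeys.add(key) }
    fun recordPublished(key: String) { stagedKeys.remove(key) }
    fun stagedSnapshot(): Set<String> = stagedKeys.toSet()
}

class StagingWriter(
    private val storage: FakeStorage,
    private val state: DestinationStateStore,
) {
    fun writeToStaging(key: String, contents: String) {
        storage.objects[key] = contents
        // Record state right after each file lands in staging, so a failure
        // before publishing still leaves a trace of the orphaned object.
        state.recordStaged(key)
    }
}

// On restart: anything still recorded as staged was orphaned by the failure.
fun recoverOrphans(storage: FakeStorage, state: DestinationStateStore): List<String> =
    state.stagedSnapshot().filter { it in storage.objects }
```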

EDIT
Plus

  • one commit I accidentally stranded that makes staging fall back to metadata when the file isn't present
  • a couple of attempts to keep the mock dest IT from breaking
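A sketch of the "fall back to metadata when the file isn't present" idea, assuming a hypothetical `StagedMetadata` that records a staged object's record count. It uses the local file when it exists and otherwise reads the count from metadata; what the CDK actually stores in metadata isn't stated in this thread.

```kotlin
import java.io.File

// Hypothetical metadata recorded alongside a staged object.
data class StagedMetadata(val recordCount: Long)

// Prefer the local file when it still exists; otherwise fall back to the
// metadata captured when the object was staged (null if there is none).
fun stagedRecordCount(localFile: File, metadata: StagedMetadata?): Long? =
    if (localFile.exists()) {
        localFile.useLines { lines -> lines.count().toLong() }
    } else {
        metadata?.recordCount
    }
```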

@johnny-schmidt johnny-schmidt requested a review from a team as a code owner November 22, 2024 00:40

vercel bot commented Nov 22, 2024

The latest updates on your projects.

1 Skipped Deployment
airbyte-docs · Status: ⬜️ Ignored · Updated (UTC): Nov 25, 2024 8:36pm

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues CDK Connector Development Kit labels Nov 22, 2024
@johnny-schmidt johnny-schmidt requested a review from edgao November 22, 2024 00:40
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/s3v2/issue-10732/staging-tests branch 4 times, most recently from 010cab1 to 445ce4b November 22, 2024 16:41
@johnny-schmidt johnny-schmidt requested a review from a team as a code owner November 22, 2024 19:54
@johnny-schmidt johnny-schmidt changed the title Jschmidt/s3v2/issue 10732/staging tests Bulk Load CDK: Staging refactor + tests; multi-sync ITs correctly wait for ack Nov 22, 2024
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/s3v2/issue-10732/staging-tests branch from f574cd5 to 8cd20df November 22, 2024 21:13
@johnny-schmidt
Contributor Author

fixed the failing mock dest IT + rebased

Contributor

@edgao edgao left a comment

tests are failing on timeout; that's probably expected, given that we're pushing a huge amount of data per test. Could try bumping the timeout in destination-s3-v2/gradle.properties; there should be examples in some other destination.

(or reduce the number of messages, if we're flushing on every message)

rate-ms: 900000 # 15 minutes
window-ms: 900000 # 15 minutes
destination:
  record-batch-size: 1 # 1 byte for testing; 1 record => 1 upload
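A sketch of the size-triggered flush that the `record-batch-size: 1` comment implies, assuming a hypothetical `SizeBoundedBatcher` that uploads whenever buffered bytes reach the limit. With a 1-byte limit, every non-empty record crosses the threshold on its own, so 1 record => 1 upload.

```kotlin
// Hypothetical accumulator; not the CDK's actual batching code.
class SizeBoundedBatcher(private val recordBatchSizeBytes: Long) {
    private val buffer = mutableListOf<String>()
    private var bufferedBytes = 0L
    val uploads = mutableListOf<List<String>>()

    fun accept(record: String) {
        buffer.add(record)
        bufferedBytes += record.toByteArray(Charsets.UTF_8).size
        // Flush as soon as the buffered size reaches the configured limit.
        if (bufferedBytes >= recordBatchSizeBytes) flush()
    }

    fun flush() {
        if (buffer.isEmpty()) return
        uploads.add(buffer.toList()) // copy before clearing
        buffer.clear()
        bufferedBytes = 0
    }
}
```

This is also why a tiny limit makes each test sync issue many uploads, which bears on the timeout discussion above.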
Contributor

actually, if we're doing this, do we still need the millions-of-records thing?

Contributor Author

I disabled that?

Contributor

ah derp, I didn't read that file b/c I thought it was just my diff. this makes sense!

(... though I'm not sure why we're hitting test timeouts then)

@@ -79,7 +80,8 @@ class DefaultSyncManager(
        stream: DestinationStream.Descriptor
    ): StreamLoader? {
        val completable = streamLoaders[stream]
        return completable?.let { if (it.isCompleted) it.await() else null }
        // `.isCompleted` does not work as expected here.
Contributor

just curious: what was broken about it?

Contributor Author

It sometimes didn't return true, even when the loader had clearly completed.

@johnny-schmidt
Contributor Author

@edgao The Docker tests were timing out because they needed to receive the batch size limit via an env variable; I hacked that in. They also don't throw an unclean-exit error on kill, so I hacked in an exception.

Very hacky all the way, but they should pass now.
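A sketch of passing the batch size limit into a containerized test through an environment variable, assuming a hypothetical variable name `RECORD_BATCH_SIZE_OVERRIDE` and an illustrative default; the connector's real variable name and default aren't given in this thread. Injecting the `getenv` lookup keeps the resolution logic testable without touching the process environment.

```kotlin
// Hypothetical name and default; not the connector's actual configuration.
val BATCH_SIZE_ENV = "RECORD_BATCH_SIZE_OVERRIDE"
val DEFAULT_BATCH_SIZE_BYTES = 200L * 1024 * 1024

// Use the env override when present; reject values that aren't positive integers
// so a misconfigured test fails loudly instead of silently using the default.
fun resolveBatchSizeBytes(
    getenv: (String) -> String? = { name -> System.getenv(name) },
): Long {
    val raw = getenv(BATCH_SIZE_ENV) ?: return DEFAULT_BATCH_SIZE_BYTES
    return raw.toLongOrNull()?.takeIf { it > 0 }
        ?: throw IllegalArgumentException("$BATCH_SIZE_ENV must be a positive integer, got '$raw'")
}
```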

@johnny-schmidt johnny-schmidt force-pushed the jschmidt/s3v2/issue-10732/staging-tests branch from 8ead702 to 8c05dc8 November 23, 2024 02:07
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/s3v2/issue-10732/staging-tests branch from b65a21f to 88205b7 November 25, 2024 00:19
@johnny-schmidt johnny-schmidt enabled auto-merge (squash) November 25, 2024 20:39
@johnny-schmidt johnny-schmidt merged commit f4cfb4b into master Nov 25, 2024
33 checks passed
@johnny-schmidt johnny-schmidt deleted the jschmidt/s3v2/issue-10732/staging-tests branch November 25, 2024 20:53
3 participants