Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk load CDK: Even more tests #47377

Merged
merged 3 commits into from
Oct 28, 2024
Merged

Bulk load CDK: Even more tests #47377

merged 3 commits into from
Oct 28, 2024

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Oct 25, 2024

(based on https://github.com/airbytehq/airbyte/pull/47359/files, which has a few bugfixes in how we convert JsonNode to AirbyteValue)

as with other test porting PRs, probably best to just review by commit?

  • add newline characters to field values in test (a la DAT.testLineBreakCharacters) - also renamed the test to reflect this
  • delete some unnecessary micronaut annotations (the test framework no longer relies on micronaut at all)
  • add a dedup test (DAT.testIncrementalDedupeSync / BaseTDTest.incrementalDedup - I only took the records from BaseTDTest, because the DAT version was very basic+naive)
    • fixed a bug in the RecordDiffer to properly compare temporal types
    • updates the mock destination to support deduping. This is, somehow, probably the cleanest depiction of upserting that we have anywhere >.>

Copy link

vercel bot commented Oct 25, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Oct 25, 2024 9:50pm

cursor: List<String>,
vararg records: OutputRecord
) {
fun getField(path: List<String>, record: OutputRecord): AirbyteValue? {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most (all?) destinations don't actually support traversing a jsonpath to fetch a cursor/PK, but it's easy enough to implement in java 🤷

@edgao edgao marked this pull request as ready for review October 25, 2024 18:35
@edgao edgao requested a review from a team as a code owner October 25, 2024 18:35
@edgao edgao requested a review from johnny-schmidt October 25, 2024 18:36
@edgao
Copy link
Contributor Author

edgao commented Oct 25, 2024

@benmoriceau if you want to start dipping into some of this test stuff, this PR has a few interesting pieces:

  1. there's an entire test case being ported into the new framework. (my personal philosophy: I prefer long/verbose tests over tests that are short, but require a lot of context/digging to understand what they're doing. So if it takes you a lot of effort to understand these tests, please tell me 🙏 )
  2. The new tests run on docker in CI, but can be run without docker locally (you still have the option of running against docker, it's just not typically useful). Take a look at DestinationProcess.kt + the gradle stuff to see how it works
  3. Identically to real connectors, the CDK itself runs the integration tests on a mock in-memory destination. This (a) makes it easier to write new tests (because testing the tests against real destinations is slower and less convenient), and (b) is an easy way to verify CDK behavior in isolation
  4. This PR adds a test for deduping, so it also adds dedup support to that mock destination.
  5. There's also a bugfix in RecordDiffer, which is the core of our integration tests. It generates a pretty-printed diff between a list of expected records, and the actual records in the destination. (take a look at DestinationDataDumper / IntegrationTest.dumpAndDiffRecords if you want to see how it's used)
  6. (there's also a previous test case being updated to cover more stuff, in a way that's kind of gross - IMO this test case isn't super valuable, but it's not obviously useless enough to just delete it)

@edgao edgao force-pushed the edgao/even_even_more_tests branch from be29e9e to f4a21ac Compare October 25, 2024 21:42
Base automatically changed from edgao/support_missing_field to master October 25, 2024 21:49
@edgao edgao force-pushed the edgao/even_even_more_tests branch from f4a21ac to a77a153 Compare October 25, 2024 21:50
@benmoriceau
Copy link
Contributor

@benmoriceau if you want to start dipping into some of this test stuff, this PR has a few interesting pieces:

  1. there's an entire test case being ported into the new framework. (my personal philosophy: I prefer long/verbose tests over tests that are short, but require a lot of context/digging to understand what they're doing. So if it takes you a lot of effort to understand these tests, please tell me 🙏 )
  2. The new tests run on docker in CI, but can be run without docker locally (you still have the option of running against docker, it's just not typically useful). Take a look at DestinationProcess.kt + the gradle stuff to see how it works
  3. Identically to real connectors, the CDK itself runs the integration tests on a mock in-memory destination. This (a) makes it easier to write new tests (because testing the tests against real destinations is slower and less convenient), and (b) is an easy way to verify CDK behavior in isolation
  4. This PR adds a test for deduping, so it also adds dedup support to that mock destination.
  5. There's also a bugfix in RecordDiffer, which is the core of our integration tests. It generates a pretty-printed diff between a list of expected records, and the actual records in the destination. (take a look at DestinationDataDumper / IntegrationTest.dumpAndDiffRecords if you want to see how it's used)
  6. (there's also a previous test case being updated to cover more stuff, in a way that's kind of gross - IMO this test case isn't super valuable, but it's not obviously useless enough to just delete it)

Thanks, I'll look at it when I'm done with the file transfer

// factory into our tests, not a pre-instantiated destination, because we want
// to run multiple destination processes per test.
@Singleton
@Requires(notEnv = [DOCKERIZED_TEST_ENV])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because docker-v-nondocker is now controlled by an explicit env variable check in the process factory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, exactly. I just forgot to remove these annotations in #47006

@edgao edgao merged commit 9dcb7e7 into master Oct 28, 2024
41 checks passed
@edgao edgao deleted the edgao/even_even_more_tests branch October 28, 2024 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants