Bulk Load CDK: Add integration test using in-memory mock destination #45634

Merged: 5 commits, Oct 3, 2024

edgao (Contributor) commented Sep 17, 2024

Add "integration tests" for the CDK, which test full syncs/etc. against an in-memory destination. Very rudimentary for now. Add a new integrationTest task+classpath, to avoid accidentally breaking any micronaut stuff in unit tests. Also updates our github actions (I think?) to call that task as needed.

Also: OutputRecord.airbyteMeta is now a proper struct instead of a JsonNode; updated RecordDifferTest accordingly. (IntNode / LongNode equality is annoying, in that IntNode(42) != LongNode(42).)

closes https://github.com/airbytehq/airbyte-internal-issues/issues/9960

@Requires(env = ["test"])
//@Factory
//@Replaces(factory = DestinationCatalogFactory::class)
//@Requires(env = ["test"])
Contributor Author

@johnny-schmidt I did this yesterday as a quick hack just to get the test to run (otherwise micronaut threw a duplicate bean error). I think the easiest solution is to add another env to the Requires clause? @Requires(env=["test", "mock_catalog_factory"]) and then add that new env to all the tests that rely on this bean?

Contributor Author

I could also add some Requires thing to the MockDestinationWrite, but then I'd need to plumb it down to the CliRunner

(... I could also move this stuff to testIntegration and wire up a proper integration test task 🤷 no strong opinion. But I do think it would be nice for the MockCatalogFactory to be an explicit opt-in)

@@ -0,0 +1,4 @@
---
data:
  dockerRepository: "airbyte/fake-source"
Contributor Author

I'll update this file; it's needed to keep something in the connector runner happy.

edgao force-pushed the edgao/in_memory_test branch from 2e326f0 to f377e2a on September 19, 2024 21:49
edgao force-pushed the edgao/in_memory_test branch from f377e2a to 5fd6001 on September 19, 2024 21:54
edgao force-pushed the edgao/in_memory_test branch from 5fd6001 to 371f6c7 on September 19, 2024 21:56
edgao force-pushed the edgao/in_memory_test branch from 371f6c7 to f576bce on September 19, 2024 22:00
edgao force-pushed the edgao/in_memory_test branch from f576bce to 40e7de2 on September 20, 2024 15:24
edgao force-pushed the edgao/in_memory_test branch from 40e7de2 to 3b39969 on September 20, 2024 16:42
edgao force-pushed the edgao/in_memory_test branch 2 times, most recently from 5da57c2 to 9d49f7d, on September 20, 2024 18:38
Base automatically changed from edgao/spec_test to issue-9361/load-cdk-with-e2e-dest-post-refactor on September 20, 2024 20:02
edgao force-pushed the issue-9361/load-cdk-with-e2e-dest-post-refactor branch from 9511fda to 391756a on September 20, 2024 21:04
edgao force-pushed the edgao/in_memory_test branch from 9d49f7d to 7c5df8d on September 20, 2024 21:04
johnny-schmidt force-pushed the issue-9361/load-cdk-with-e2e-dest-post-refactor branch 2 times, most recently from cfb0cc1 to 91e6e18, on September 26, 2024 17:56
edgao force-pushed the edgao/in_memory_test branch from e3a99c2 to 9e8312a on September 26, 2024 18:38
johnny-schmidt force-pushed the issue-9361/load-cdk-with-e2e-dest-post-refactor branch 3 times, most recently from 3c6895c to 096dc33, on September 26, 2024 20:57
Base automatically changed from issue-9361/load-cdk-with-e2e-dest-post-refactor to master on September 26, 2024 21:24
edgao force-pushed the edgao/in_memory_test branch from 9e8312a to 9797a8a on September 26, 2024 22:01

sourceSets {
  integrationTest {
  }
Contributor

Do these require paths?

Contributor

I guess it defaults?

Contributor

Is the point to support this: "Add a new integrationTest task+classpath, to avoid accidentally breaking any micronaut stuff in unit tests. Also updates our github actions (I think?) to call that task as needed."?

Like the point is to prevent connector integration tests from pulling this in? Maybe a clearer/pattern-following name like test-integration-cdk?

Contributor Author

welp, my comment on this got hidden #45634 (comment) (tl;dr yes, it populates the paths by default. I should probably just make that an in-code comment 🤔 )

it's more that I'm declaring Singletons without @Requires (b/c I didn't want to add more plumbing into CliRunner/AirbyteConnectorRunner). So putting this stuff into a separate task+classpath (instead of just test) hopefully avoids annoying duplicate bean errors in the cdk unit tests. (connector integration tests wouldn't be affected either way, since they only depend on testFixtures)

so I think integrationTest is the right name? (but also, I don't really feel strongly :P )

  • test -> cdk unit tests
  • testFixtures -> cdk tooling for connector tests
  • integrationTest -> cdk integration tests
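
For reference, a minimal sketch of a source set that relies on Gradle's default paths. It assumes the `java` plugin is applied (with a Kotlin plugin the default source directory would additionally include `src/integrationTest/kotlin`). The classpath wiring and the `extendsFrom` lines are illustrative assumptions and not necessarily what this PR does.

```groovy
plugins {
    id 'java'
}

sourceSets {
    integrationTest {
        // No srcDirs declared: Gradle defaults to src/integrationTest/java
        // and src/integrationTest/resources.
        // Assumption: the integration tests need the production classes.
        compileClasspath += sourceSets.main.output
        runtimeClasspath += sourceSets.main.output
    }
}

configurations {
    // Declaring the source set creates integrationTestImplementation and
    // integrationTestRuntimeOnly; here they reuse the unit test dependencies.
    integrationTestImplementation.extendsFrom testImplementation
    integrationTestRuntimeOnly.extendsFrom testRuntimeOnly
}
```

Because the source set has its own classpath, beans declared under `src/integrationTest` are not visible to the `test` task, which is the isolation described above.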

Contributor

johnny-schmidt left a comment

The mock destination looks great, I like how the test world is shaping up.

I'm not sure about the gradle stuff. Makes sense to run it with the bulk publish but maybe not with the check, in case I'm missing something. All the other ITs explicitly run after check.

Also definitely get eyes on it from sources.

edgao force-pushed the edgao/in_memory_test branch from 9797a8a to 8635353 on September 27, 2024 22:07
edgao force-pushed the edgao/in_memory_test branch from e5ddcf7 to b8aa176 on October 1, 2024 19:05
Contributor Author

edgao commented Oct 3, 2024

talked with @rodireich offline a few days ago and we feel ok with this - @johnny-schmidt can you take another look + approve? (the diff since your last review is just https://github.com/airbytehq/airbyte/pull/45634/files/a718581aaf73fd9d8d5730cdd11988c19a1af413..1454562e7efec83333849e43c539de815e3f7827, i.e. incorporating your idea about ConcurrentHashMap, having check depend on the new task, and getting the build back to green)

edgao requested a review from johnny-schmidt on October 3, 2024 19:31
Contributor

johnny-schmidt left a comment

LGTM!

edgao merged commit 4c680b4 into master on Oct 3, 2024
30 checks passed
edgao deleted the edgao/in_memory_test branch on October 3, 2024 23:58
Contributor

postamar left a comment

Sorry for the late review. I only looked at the gradle stuff. I'm just confused by the task dependencies because the intent behind them is not clear to me; other than that, everything seems fine.

edit: the mustRunAfter check really tripped me up, but it's fine

@@ -61,6 +61,12 @@ allprojects {
}
}

tasks.register('bulkCdkIntegrationTest').configure {
    // findByName returns the task, or null if no such task exists.
    // we need this because not all submodules have an integrationTest task.
Contributor

FYI, alternatively, tasks.matching is also useful for this purpose, perhaps even preferred
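
A rough sketch of the `tasks.matching` variant. It assumes the aggregate task lives in the root build script and should depend on every subproject's `integrationTest` task wherever one exists; the actual scoping in this PR may differ.

```groovy
tasks.register('bulkCdkIntegrationTest') {
    // matching() returns a live TaskCollection that is simply empty for
    // subprojects without an integrationTest task, so no null check is needed.
    dependsOn subprojects.collect { p ->
        p.tasks.matching { it.name == 'integrationTest' }
    }
}
```

Compared with `findByName`, the collection is evaluated when the task graph is built, so it also picks up tasks that are registered after this block runs.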

CI: true
with:
  job-id: bulk-cdk-publish
  concurrent: true
Contributor

there are some read-only flags that you can set here, not sure if they're actually helpful

in any case none of this seems wrong

testClassesDirs = sourceSets.integrationTest.output.classesDirs
classpath = sourceSets.integrationTest.runtimeClasspath
useJUnitPlatform()
mustRunAfter tasks.check
Contributor

what's this for? why not have this run as part of check? is this because of the dependency on assemble for the docker image? if it's the latter, it's better to declare that dependency explicitly with dependsOn assemble. These mustRunAfter constraints are typically not what you really want for these kinds of tasks.

Contributor

this really tripped me up; did you in fact mean to have check depend on integrationTest? if so please let me know @edgao

Contributor Author

iirc I copypasted this from stackoverflow without reading it :P I did in fact want to have check depend on integrationTest

(which I think is what I did later on, with the rootProject.check.dependsOn thing? This mustRunAfter thing probably isn't really needed - probably the assumption on SO was that integrationTest is slow, and therefore only worth running if check succeeded. Which isn't true here.)
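
To make the intent explicit, here is a minimal sketch of having `check` depend on `integrationTest` within a single project. It assumes an `integrationTest` source set already exists; the `shouldRunAfter` line is an optional ordering hint taken from the common Gradle pattern, not something this PR is known to use.

```groovy
tasks.register('integrationTest', Test) {
    description = 'Runs the CDK integration tests.'
    group = 'verification'
    testClassesDirs = sourceSets.integrationTest.output.classesDirs
    classpath = sourceSets.integrationTest.runtimeClasspath
    useJUnitPlatform()
    // Ordering only: if both tasks are scheduled, run unit tests first.
    // This does not make either task depend on the other.
    shouldRunAfter tasks.named('test')
}

tasks.named('check') {
    // Real dependency: running check now always runs integrationTest.
    dependsOn tasks.named('integrationTest')
}
```

The distinction is that `dependsOn` adds the task to the graph, while `mustRunAfter` and `shouldRunAfter` only constrain the order of tasks that are already in it.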

Contributor

Thanks!

FYI chatgpt is great at generating gradle scripts

airbyte-cdk/bulk/core/load/build.gradle (resolved thread)
edgao mentioned this pull request on Oct 7, 2024