-
Notifications
You must be signed in to change notification settings - Fork 14.9k
KAFKA-10000: Integration tests (KIP-618) #11782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7dfae7c to
196608f
Compare
|
Converting to draft until upstream PRs are reviewed. |
acb8fb3 to
3d65e79
Compare
9e017b8 to
8756ee2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this line to hide the newly-introduced zombie fencing API from the OpenAPI spec that we started generating with #12067.
c520f26 to
26c6879
Compare
0b0d4c2 to
a751225
Compare
|
Given that all merge conflicts have been resolved and #11781 has already been approved, marking this as ready for review. |
4efd374 to
d322e99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this check to clean up the flood of ERROR-level log messages that occurs when the task's producer is closed while there are still in-flight messages.
This issue was not specific to these integration tests or to KIP-618, but it clogged up the logs for these tests badly enough that a small tweak in the code base to address it seemed warranted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was way too noisy at INFO-level.
|
@showuon @tombentley @mimaison anybody got time for this one? Just this and #11783 left before everything for KIP-618 is merged! 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the missing Javadoc on this one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, fair. Added a few paragraphs; hope it's enough
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really explain why this second call is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, updated to something less snarky.
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
8ed7592 to
2f4b1cc
Compare
mimaison
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @C0urante for the PR. I left a few comments. My main concern is about checking the exact equality between what the connector produces and what ends up in the topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a bit confused looking at this. I wonder if we should make these custom config names as constants and maybe also name them similarly. I think custom.exactly.once.support may be clearer than exactly.once.support.level as I initially did not notice it was not the real configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, did both (constants and renaming property).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm boring but I'm not quite sure if we want to keep the emojis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can live with that 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be fail instead of null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch--yes, it should
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be better to produce a precise number of records and check we get that exact number. Here, if I understand correctly, the connector could have created duplicates and we would not notice it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true; we don't check for duplicates here. However, the test cases don't exactly do a lot to induce them, either. Even if exactly-once support were disabled for these, the conditions that they're run under are still green-path and shouldn't result in any duplicates.
We perform hard and soft rolling bounces in the connect_distributed_test.py::test_exactly_once_source system test introduced in #11783 and check for duplicates and record content there, which seems appropriate if we're aiming to test how resilient this feature is against real-world and sometimes-suboptimal conditions.
We could possibly add a utility method to assert that a collection of records produced by a source connector with n tasks has consecutive, non-duplicated sequence numbers for each unique task, but I'm not sure it's really worth the effort considering how much coverage we already get from the system test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree with @mimaison. It shouldn't be that much effort to verify the sequence numbers, and because most people are not in the habit of running the system tests having the assertion would provide much earlier warning of regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually is a surprising amount of work to verify sequence numbers since records can be spread across partitions and, as a result, polled out of order by a consumer. We could reduce the number of topic partitions that we write to in these cases to 1, or place an arbitrary limitation on the number of records produced by each MonitorableSourceTask instance before it stops producing, or shut down the connector and do a poll for the end offsets of the topic and read from beginning up to that point to get absolutely everything that's in the topic. But the first option would not be worth the decrease in coverage IMO and the last two are pretty complex and come with their own edge cases.
I do realize that system tests are pretty uncommon (I hardly run them myself), but do we really think there's risk of a regression that would be caught by these integration tests that wouldn't be caught by all the accompanying unit tests that we have?
If there's a simpler way to do things let me know, too; I could be missing an easy win here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, one long weekend later, I've taken a shot at this. I don't know if I love how much complexity it's introduced into the tests here since it's unfortunately non-trivial to do a "read all" for a Kafka topic, but at least if the newly-introduced EmbeddedKafkaCluster::consumeAll method is accurate, we can reuse that logic elsewhere in our tests and not have to worry about solving this problem more than once.
There's still no assertions for in-order delivery, but we now have checks to make sure that no records are dropped or duplicated. LMKWYT
connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
533eae1 to
bba0d8e
Compare
tombentley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple more comments, but otherwise this LGTM. Thanks!
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree with @mimaison. It shouldn't be that much effort to verify the sequence numbers, and because most people are not in the habit of running the system tests having the assertion would provide much earlier warning of regressions.
bba0d8e to
a6eb895
Compare
showuon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a pass. LGTM. Left some minor comments. Thanks.
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention something about Fibonacci numbers in comment here? I think it's not obvious what we're trying to achieve here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, done 👍
a6eb895 to
c267490
Compare
c267490 to
54e443e
Compare
showuon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the update.
|
Failed tests are unrelated: |
Implements embedded end-to-end integration tests for KIP-618, and brings together previously-decoupled logic from upstream PRs.
Relies on changes from: