KAFKA-3854: Fix issues with new consumer's subsequent regex (pattern) subscriptions #1572
vahidhashemian wants to merge 1 commit into apache:trunk
Nitpick: the use of 'confirm' in the name makes it sound like this is just a check, but it actually mutates internal state. Maybe setSubscriptionType would be a little more accurate?
I was having some doubts when picking this name. The method both sets the state (if not set yet) and verifies it (if set). I'll change it to setSubscriptionType. Thanks.
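The method discussed above either records the subscription type (when none is set yet) or verifies that a later call uses the same type. Below is a minimal sketch of that set-or-verify pattern. `SubscriptionStateSketch`, the `SubscriptionType` values, the reset in `unsubscribe()`, and the exception message are hypothetical illustrations, not code taken from the patch. The sketch shows that a repeated pattern subscription passes the check while switching to a different mode fails.

```java
// Hypothetical sketch of the set-or-verify idea; names are illustrative only.
class SubscriptionStateSketch {
    private SubscriptionType subscriptionType = SubscriptionType.NONE;

    // Records the type on the first call; on later calls only verifies that the
    // caller is not switching to a different subscription mode.
    void setSubscriptionType(SubscriptionType type) {
        if (subscriptionType == SubscriptionType.NONE) {
            subscriptionType = type;
        } else if (subscriptionType != type) {
            throw new IllegalStateException(
                    "Cannot switch from " + subscriptionType + " to " + type);
        }
        // Same type again (e.g. a second pattern subscription): nothing to do.
    }

    // Assumed behavior: clearing the subscription resets the type.
    void unsubscribe() {
        subscriptionType = SubscriptionType.NONE;
    }

    SubscriptionType subscriptionType() {
        return subscriptionType;
    }

    public static void main(String[] args) {
        SubscriptionStateSketch state = new SubscriptionStateSketch();
        state.setSubscriptionType(SubscriptionType.AUTO_PATTERN);
        state.setSubscriptionType(SubscriptionType.AUTO_PATTERN); // subsequent regex subscription is allowed
        try {
            state.setSubscriptionType(SubscriptionType.AUTO_TOPICS);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Cannot switch from AUTO_PATTERN to AUTO_TOPICS
        }
    }
}

// Hypothetical set of subscription modes used by the sketch above.
enum SubscriptionType { NONE, AUTO_TOPICS, AUTO_PATTERN, USER_ASSIGNED }
```

Checking equality against the stored type, instead of checking whether a pattern is already present, is what lets a second call with the same mode succeed.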
7503abb to 7446115
@vahidhashemian Thanks for the patch! Would it be much trouble to add an integration test (e.g. in
@hachikuji Sure. I can do that. I had opened KAFKA-3897 to improve regex subscription testing in a separate PR.
That works for me. Another small thing: could we change the title of the PR to reflect the problem you found (i.e. metadata should be refreshed immediately upon changing the subscription regex).
Will do that too. Thanks.
@hachikuji Actually, the PR fixes two issues. Metadata refresh fixes one (item 2 in description), and the SubscriptionType added fixes the other (item 1 in description). Do you think the title you suggested still works?
Good point. In that case, the current title probably works.
7446115 to 463345b
@hachikuji I added an integration test that covers both reported issues. Please advise if you see an issue. Thanks.
463345b to 5fb472a
Weird, some tests are randomly failing (first, second, last try), and all of the failures are from (…). I see that some other builds, like this and this, also failed similarly.
5fb472a to e617d02
@vahidhashemian LGTM overall. I ran the test a few times locally and couldn't reproduce the failure, but Jenkins tends to be more erratic. The only thought that comes to mind is that we somehow don't have up-to-date metadata when the consumer joins the group the first time.
@hachikuji Could you please elaborate? Do you see that as a side effect of this update, or something that still hasn't been addressed? Thanks.
@vahidhashemian I don't see an actual problem; I'm just speculating about what could cause the test to time out. If you can reproduce the failure and see the consumer logs, we should be able to tell what's going on.
Aah! Thanks for clarifying. I'll take a look at the logs. One thing I noticed happening occasionally for me locally, when I was testing my sample Java consumer, was a delay of over 20 seconds between the onPartitionsRevoked() and onPartitionsAssigned() calls. Not sure if that's related or not.
@vahidhashemian The patch looks good, I just want to call out one thing that this test doesn't actually cover. We have other tests that validate that when we add topics, after a metadata refresh the new topic is included in the subscription when it matches the regex. This test, by checking for the removal of some topics, already validates that when metadata is updated we'll see the subscription updated to reflect the new regex. However, the cause of the metadata refresh may be either a) the resubscription or b) the normal metadata refresh interval. For (b), to make some other tests fast, the base class sets it to only 100ms. If we want this test to also validate that the resubscription forces an immediate metadata refresh and the subscription change is reflected promptly, I think we'd need to use a consumer with the metadata refresh interval overridden (see a few other tests in here that use custom settings for an example), add another topic before the resubscription (the only thing that will validate metadata changes), make sure it is included in the new regex, and then validate as we are already.
I think this is just some minor tweaking of the test, so hopefully straightforward to validate. A comment explaining what we're trying to make sure is happening would also be helpful as @hachikuji and I just had to spend some time thinking through what one of the other tests was actually evaluating.
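The test strategy above hinges on *which* event refreshes metadata. Here is a hypothetical, stdlib-only model of that reasoning; it is not Kafka code. The `PatternSubscriptionModel` class, its `refreshOnResubscribe` flag, and the topic names are illustrative assumptions. In the model, the subscription is recomputed only when metadata is refreshed. With no periodic refresh between the resubscription and the check, the newly created topic can show up only if resubscribing forces a refresh.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical model, not Kafka's API: the consumer sees the cluster only through
// a metadata snapshot, and the pattern subscription is recomputed only when that
// snapshot is refreshed.
class PatternSubscriptionModel {
    private final Set<String> clusterTopics = new TreeSet<>();    // topics that exist
    private final Set<String> metadataSnapshot = new TreeSet<>(); // topics the consumer knows
    private final Set<String> subscription = new TreeSet<>();
    private final boolean refreshOnResubscribe;
    private Pattern pattern;

    PatternSubscriptionModel(boolean refreshOnResubscribe) {
        this.refreshOnResubscribe = refreshOnResubscribe;
    }

    void createTopic(String topic) {
        clusterTopics.add(topic);
    }

    // Stands in for either a forced refresh or the periodic one.
    void refreshMetadata() {
        metadataSnapshot.clear();
        metadataSnapshot.addAll(clusterTopics);
        subscription.clear();
        if (pattern != null) {
            for (String topic : metadataSnapshot) {
                if (pattern.matcher(topic).matches()) {
                    subscription.add(topic);
                }
            }
        }
    }

    void subscribe(Pattern newPattern) {
        pattern = newPattern;
        if (refreshOnResubscribe) {
            refreshMetadata(); // the behavior the test is meant to verify
        }
    }

    Set<String> subscription() {
        return new TreeSet<>(subscription);
    }

    // Follows the outline from the review: subscribe, add a topic that the new
    // regex matches, resubscribe, and check immediately with no periodic refresh.
    static Set<String> scenario(boolean refreshOnResubscribe) {
        PatternSubscriptionModel consumer = new PatternSubscriptionModel(refreshOnResubscribe);
        consumer.createTopic("topic-a");
        consumer.createTopic("topic-b");
        consumer.createTopic("other");
        consumer.subscribe(Pattern.compile("topic-.*"));
        consumer.refreshMetadata(); // both variants now start from [topic-a, topic-b]
        consumer.createTopic("topic-c");
        consumer.subscribe(Pattern.compile("topic-[bc]"));
        return consumer.subscription();
    }

    public static void main(String[] args) {
        System.out.println(scenario(true));  // [topic-b, topic-c]
        System.out.println(scenario(false)); // [topic-a, topic-b]
    }
}
```

Checking only that `topic-a` disappears would pass in both variants as soon as *any* refresh runs, which is why the extra topic matters. In the real integration test, the periodic refresh would presumably be pushed out by raising the consumer's `metadata.max.age.ms` well beyond the test's wait time.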
@ewencp Thanks for your feedback and catching the missing test. I'll add the test and the comments as you suggested.
e617d02 to 0860bd7
@ewencp I updated the test to cover what you mentioned. Also added some comments for the three pattern subscription tests. I realized that unsubscription is actually being tested in all three, so perhaps
… subscriptions

This patch fixes two issues:
1. Subsequent regex subscriptions fail with the new consumer.
2. Subsequent regex subscriptions would not actually refresh metadata and change the subscription of the new consumer, nor would they trigger a rebalance.

The final note on the JIRA stating that a later-created topic that matches a consumer's subscription pattern would not be assigned to the consumer upon creation seems to be as designed. A repeat subscribe() to the same pattern would be needed to handle that case.

Unit tests for regex subscriptions will be handled in KAFKA-3897.
0860bd7 to 55423ce
LGTM, thanks @vahidhashemian for working through all the comments and improving docs on the existing tests!
@ewencp, seems like we should double-commit this to
@ijuma I'm open to it, but skeptical -- would you consider this a critical fix that must go into 0.10.0.1? It's an unfortunate misbehavior, but seems like an edge case I wouldn't consider critical. Updating regex subscriptions dynamically is not a particularly broad use case and this was only found through testing of a KIP as I understand it, not someone hitting a production issue with the new consumer. All that said, if you want to cherry-pick, I will not object :)
@ewencp It's probably not critical, but it's a clear bug fix, it seemed low risk and it could surprise users in production. I wouldn't call it an edge case because it's so easy to trigger (it's not like it requires a complex sequence of operations), even if it may be a bit rare in practice. Do you agree that it's low risk? Happy to cherry-pick, if so. Otherwise, better to leave it indeed.
… subscriptions

This patch fixes two issues:
1. Subsequent regex subscriptions fail with the new consumer.
2. Subsequent regex subscriptions would not immediately refresh metadata to change the subscription of the new consumer and trigger a rebalance.

The final note on the JIRA stating that a later created topic that matches a consumer's subscription pattern would not be assigned to the consumer upon creation seems to be as designed. A repeat `subscribe()` to the same pattern or some wait time until the next automatic metadata refresh would handle that case.

An integration test was also added to verify these issues are fixed with this PR.

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #1572 from vahidhashemian/KAFKA-3854
@ijuma Pushed to 0.10.0, and updated JIRA fix version. We should probably figure out how to distinguish between critical, low-risk, and not, but in this case, agreed I'm not too worried about existing functionality or compatibility.
Thanks!
PlaintextConsumerTest.testPatternUnsubscription and PlaintextConsumerTest.testSubsequentPatternSubscription fail for me on commit 87b3ce with JDK 1.8.0_31. The failure messages are:
All other test cases pass.
@MagnusR Looking at the recent failed builds, it seems the issue is intermittent: one or more pattern subscription tests fail, but no single test always fails. I assume all tests pass locally for you?
@MagnusR Yeah, it would be good to know if this is consistent and only this test fails. Because we basically have system integration tests as unit tests, they end up causing transient failures. We've noticed some other tests failing under Jenkins and many seem to be relying on the default timeout from
@vahidhashemian: This was a local run on my laptop. 804 tests run, 2 tests failed. Only tried once, so I don't know if it's intermittent or not.
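This patch fixes two issues:
1. Subsequent regex subscriptions fail with the new consumer.
2. Subsequent regex subscriptions would not immediately refresh metadata to change the subscription of the new consumer and trigger a rebalance.

The final note on the JIRA stating that a later created topic that matches a consumer's subscription pattern would not be assigned to the consumer upon creation seems to be as designed. A repeat `subscribe()` to the same pattern or some wait time until the next automatic metadata refresh would handle that case.

An integration test was also added to verify these issues are fixed with this PR.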