Prefetching topic metadata on newly created topic interferes with consumer (?) #194
Comments
Hi @apeloquin-agilysys, could you tell me the cause of the failure (was it a timeout at the
I'm getting that error when I use mocha to run this test on a 3-broker local cluster. The cause is that, after the topic is created, it takes some time for the topic to propagate into the metadata on all brokers. So by the time
Since we check for metadata on a 1s interval timer, the test times out (the default timeout for mocha seems to be 2s).
The second test doesn't face the same issue because the topic is already created and propagated by this time. I confirmed this by changing the order of the tests, and now it's the first one which fails.
To fix this you can add a small sleep after topic creation, or use something like
await until(async () => {
  /* If we get anything other than an error, we're good */
  return admin.fetchTopicMetadata({topics: [TOPIC]})
    .then(() => true)
    .catch(e => {
      if (e.code === RdKafka.CODES.ERRORS.ERR_UNKNOWN_TOPIC_OR_PART) {
        return false;
      }
      throw e;
    });
});
just after topic creation (I've changed the until function to support async conditions).
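The until helper itself is not shown in this thread. A minimal sketch of what such a polling helper might look like, assuming it accepts a sync or async condition and hypothetical intervalMs and timeoutMs options (neither the implementation nor the option names come from the original test suite):

```javascript
// Hypothetical polling helper. The name `until` mirrors the call in the comment,
// but this implementation and its options are assumptions.
async function until(condition, { intervalMs = 100, timeoutMs = 10000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // Awaiting the result lets both sync and async conditions work.
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Wait before polling again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

An error thrown by the condition propagates to the caller, which matches the snippet above: only the unknown-topic error is turned into a retry, anything else fails the test immediately.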
@milindl In our case, the failure is not waiting for the consumer to be ready, but rather waiting for the message to be received, i.e. We are using a 15-second timeout when running these tests. ( We're running this test against a single broker local cluster.
The consumer starting is not the issue. The consumer starts, but never receives the message, even though the message is clearly added to the topic.
Reordering the tests as you suggested does not result in the first test failing, so I'm pretty confident it's the timeout that is preventing you from reproducing the same behavior we see.
Okay, with that timeout increase, I was able to reproduce this. I enabled debug logs to check the reason. A consumer does the following:
Because the prefetch speeds up the produce quite a bit, step 2 is not completed by the time the message is produced. So when we actually resolve the offsets and decide to consume from the "next possible message", the consumer doesn't consume the message we've produced.
The test in step 3 doesn't fail similarly because it doesn't matter when exactly we produce the message, since we're going to consume from the offsets stored for the consumer group by test 2.
To make this particular test suite work, you could add this to the consumer config:
consumer = kafka.consumer({
  kafkaJS: {
    groupId: GROUP_ID,
+   fromBeginning: true,
(For our tests, too, we generally set it to true, as that prevents these sorts of cases where the producer gets ahead of the consumer.)
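The race described above can be illustrated with a toy model. This is only a sketch of the reasoning, not the client library's code: the functions resolveStartOffset and consumeFrom are hypothetical, and a partition is modelled as a plain array whose indices stand in for offsets.

```javascript
// Toy model only: this is not how the client library is implemented.
// A partition log is an array; an offset is an index into it.
function resolveStartOffset(log, committedOffset, fromBeginning) {
  // A committed offset for the consumer group always wins.
  if (committedOffset !== null) return committedOffset;
  // Otherwise start at the beginning, or at the "next possible message".
  return fromBeginning ? 0 : log.length;
}

function consumeFrom(log, startOffset) {
  return log.slice(startOffset);
}

// The producer wins the race: the message is in the log before offsets are resolved.
const log = ['hello'];

// New group, fromBeginning not set: starts after the existing message and sees nothing.
const missed = consumeFrom(log, resolveStartOffset(log, null, false));

// New group with fromBeginning: true: starts at offset 0 and sees the message.
const received = consumeFrom(log, resolveStartOffset(log, null, true));

// Group with a committed offset (as in the later test): timing does not matter.
const resumed = consumeFrom(log, resolveStartOffset(log, 0, false));

console.log({ missed, received, resumed });
```

In this model, the failing test corresponds to the first case, and the suggested config change moves it to the second.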
Environment Information
Steps to Reproduce
While incorporating use of the producer dependentAdmin() and prefetching topic metadata, we ran into an issue where a simple produce/consume test was consistently failing in our build pipeline but was not reproducible locally on our dev machines.
Our build pipeline always starts with a fresh Kafka docker instance when running integration tests, and this was the key difference that allowed us to narrow down the issue.
The test below uses a topic name with a timestamp component to ensure that each run is for a new topic, as this issue does not reproduce with multiple runs using the same topic.
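As a rough illustration of the timestamped topic name, something along these lines would give each run a fresh topic. The helper name and the prefix are hypothetical and are not taken from the original test:

```javascript
// Hypothetical naming scheme; the prefix is an assumption, not from the original test.
function uniqueTopicName(prefix = 'test-topic') {
  // Date.now() returns milliseconds since the epoch, so each run gets a new name.
  return `${prefix}-${Date.now()}`;
}

const TOPIC = uniqueTopicName();
```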
There are three tests:
Test results:
Note that in the failing test, the call to fetchTopicMetadata does not result in an error, but the consumer appears to never receive the message, although we do see in Kafka that the message was sent.