Stop receiving records with error "Last request was dispatched at...but no response as of ...Cancelling subscription, and restarting." (KCL 2.0) #448
I have seen this behavior as well, and I was able to reproduce the issue consistently. To work around it, I configured the pre-fetcher with a much larger max queue size and a slightly higher idle time between calls:

// Example in Groovy
def pollingConfig = new PollingConfig(streamName, kinesisClient)
pollingConfig.recordsFetcherFactory().maxPendingProcessRecordsInput(MAX_PENDING_PROCESS_RECORDS_INPUT)
pollingConfig.recordsFetcherFactory().idleMillisBetweenCalls(WAIT_TIME_BETWEEN_RECORD_POLLS.toMillis())

This workaround should make the issue much less likely when record processing is only occasionally slow; it would probably only delay the issue if record processing is consistently slow.
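For reference, here is a rough Java sketch of the same workaround. The import paths follow KCL 2.x and may differ by version; the class name, helper method, and constant values are placeholders for illustration, not recommendations.

import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.retrieval.polling.PollingConfig;

public class PollingWorkaroundSketch {

    // Placeholder tuning values; choose sizes that match your processing latency.
    private static final int MAX_PENDING_PROCESS_RECORDS_INPUT = 10;
    private static final long IDLE_MILLIS_BETWEEN_CALLS = 1500L;

    public static PollingConfig buildPollingConfig(String streamName, KinesisAsyncClient kinesisClient) {
        PollingConfig pollingConfig = new PollingConfig(streamName, kinesisClient);
        // Enlarge the pre-fetch queue and add a little idle time between GetRecords
        // calls, so slow record processing is less likely to stall the publisher.
        pollingConfig.recordsFetcherFactory().maxPendingProcessRecordsInput(MAX_PENDING_PROCESS_RECORDS_INPUT);
        pollingConfig.recordsFetcherFactory().idleMillisBetweenCalls(IDLE_MILLIS_BETWEEN_CALLS);
        return pollingConfig;
    }
}

The resulting PollingConfig is then passed to retrievalConfig.retrievalSpecificConfig, as in the original report below.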
Sorry for the delayed response. This is a bug that we are investigating a fix for.
Adding new items to the receive queue for the PrefetchRecordsPublisher when it is at capacity would deadlock retrievals, because the publisher was already holding a lock on this. The method addArrivedRecordsInput did not need to be synchronized on this, as it does not change any of the protected state (requestedResponses). There is a call to drainQueueForRequests immediately after addArrivedRecordsInput that ensures newly arrived data is dispatched. This fixes awslabs#448
* Remove a possible deadlock on polling queue fill. Adding new items to the receive queue for the PrefetchRecordsPublisher when it is at capacity would deadlock retrievals, because the publisher was already holding a lock on this. The method addArrivedRecordsInput did not need to be synchronized on this, as it does not change any of the protected state (requestedResponses). There is a call to drainQueueForRequests immediately after addArrivedRecordsInput that ensures newly arrived data is dispatched. This fixes #448
* Small fix to the reasoning comment.
* Adjust the test to act more like the ShardConsumer. The ShardConsumer, which is the principal user of the PrefetchRecordsPublisher, uses RxJava to consume from the publisher. The test now uses RxJava to consume, and notifies the test thread once MAX_ITEMS * 3 items have been received. This ensures that we cycle through the queue at least 3 times.
* Remove the upper limit on the retrievals. RxJava's request management makes it possible for more requests than we might expect to occur.
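To make the deadlock easier to see, here is a minimal, self-contained Java sketch of the pattern described above. The class and method names mirror the description, but this is only an illustration of the lock/queue interaction, not the actual PrefetchRecordsPublisher code.

import java.util.concurrent.LinkedBlockingQueue;

public class PrefetchDeadlockSketch {
    // Bounded queue standing in for the publisher's receive queue.
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1);

    // Before the fix: the method is synchronized, so the lock on 'this' is held
    // while put() blocks waiting for space in a full queue.
    public synchronized void addArrivedRecordsInput(String input) throws InterruptedException {
        queue.put(input);
    }

    // The drain path also needs the lock on 'this'. With the queue full and the
    // producer thread parked inside addArrivedRecordsInput, this method can never
    // run, so the queue is never emptied: a deadlock.
    public synchronized void drainQueueForRequests() {
        String next = queue.poll();
        if (next != null) {
            // dispatch to the subscriber...
        }
    }
}

Removing synchronized from addArrivedRecordsInput, as the fix does, lets the draining thread acquire the lock and empty the queue, which in turn unblocks the put().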
Any idea when 2.0.5 will be released? Running into this issue with 2.0.4.
It will be released soon.
We are using 2.0.5 and still seeing these error messages in the logs.
I have a Kinesis stream with 2 shards and data published to it continuously. I use the KCL 2.0.1 Java library to connect to the stream with polling (by populating retrievalConfig.retrievalSpecificConfig with a PollingConfig object). It works completely fine and keeps receiving messages from both shards for the first 10 minutes. After that, it stops receiving any messages, even though data is still being published to the stream continuously. I leave the process running for 5 more minutes and the issue persists. I then restart the process, and it starts receiving messages from both shards again, but stops receiving messages again after running for 10 minutes. The issue happens repeatedly.
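For context, the polling setup described above looks roughly like the sketch below. The ConfigsBuilder wiring, the class name, and the helper method are assumptions based on the usual KCL 2.x configuration pattern; only the retrievalSpecificConfig/PollingConfig part comes from the report itself.

import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.retrieval.RetrievalConfig;
import software.amazon.kinesis.retrieval.polling.PollingConfig;

public class PollingRetrievalSetup {
    // Switch retrieval from the default (enhanced fan-out) to GetRecords polling.
    public static RetrievalConfig pollingRetrievalConfig(ConfigsBuilder configsBuilder,
                                                         String streamName,
                                                         KinesisAsyncClient kinesisClient) {
        return configsBuilder.retrievalConfig()
                .retrievalSpecificConfig(new PollingConfig(streamName, kinesisClient));
    }
}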
No throttling errors are seen in the logs. Instead, the following errors appear:
2018-10-19 14:12:43,531 ERROR [main] shardId-000000000000: Last request was dispatched at 2018-10-19T03:12:07.772Z, but no response as of 2018-10-19T03:12:43.531Z (PT35.759S). Cancelling subscription, and restarting.
This kind of log line appears roughly once every 35 seconds for each shard. It first appeared for shardId-000000000000, after which no more messages were received from that shard. It then appeared for shardId-000000000001 as well, and no more messages were received from that shard either.
To rule out the publishing side as a factor, I ran another test where I first published a large amount of data to the stream without consuming it, then stopped publishing and started the consumer application. The same behaviour is observed.
The same behaviour is also observed with KCL 2.0.3.
I've extracted and attached the relevant application logs and error logs for reference.
app.log
error.log
Any idea?