
KinesisClientLibIOException: Incomplete shard list: Closed shard X has no children #164

Closed
phutwo opened this issue May 4, 2017 · 13 comments


@phutwo

phutwo commented May 4, 2017

We're encountering this error on multiple Kinesis streams and DynamoDB streams, and we believe it stops our workers. We are using the 1.7.4 libraries.

ERROR [2017-05-04 00:03:39,903] [RecordProcessor-0013] c.a.s.k.c.lib.worker.ShutdownTask: Caught exception: 
com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Incomplete shard list: Closed shard shardId-00000001493840079292-f265b851 has no children.This can happen if we constructed the list of shards  while a reshard operation was in progress.
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.assertClosedShardsAreCoveredOrAbsent(ShardSyncer.java:212)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.cleanupLeasesOfFinishedShards(ShardSyncer.java:652)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.syncShardLeases(ShardSyncer.java:141)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.checkAndCreateLeasesForNewShards(ShardSyncer.java:88)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownTask.call(ShutdownTask.java:122)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
@ieiayaobb

Same issue here over the past few days.

@pfifer
Contributor

pfifer commented May 16, 2017

Does this occur only on DynamoDB Streams, or are you seeing it on Kinesis streams as well?

Also, to assist in investigating this, can you open a thread on the AWS Kinesis Forums?

Thanks

@cschellenger

We only see it on DynamoDB streams, not manually created streams.

@pfifer
Contributor

pfifer commented May 18, 2017

I've asked someone from the DynamoDB team to investigate this. To help us investigate, can you start a thread on the AWS Forums and link it back here? Our forums give us some additional tools that will help us investigate.

@cschellenger

See this forum thread: https://forums.aws.amazon.com/thread.jspa?threadID=256122

@joelittlejohn

Looks like the forum post has gone cold. @cschellenger @phutwo Did you ever get to the bottom of this and/or hear more from Amazon?

@cschellenger

No, we've reverted to the 1.6.x release.

@AadithyaU

Is there any solution for this?

@MohamedFaramawi

I'm facing the same issue.

@MarekMalevic

We are still facing this issue. Is there a solution yet?

@bhanup

bhanup commented May 7, 2018

We are also facing the same issue. Is there any solution for this? I don't see any update on the forum either.

@pfifer
Contributor

pfifer commented May 7, 2018

The DynamoDB team is working on a solution for their plugin to the KCL, but I don't have an ETA on that. A somewhat recent PR #240 added support for ignoring these child shards when reading the shard list in. Ignoring these shards should generally be OK, since the shard that triggered the scan isn't the one that is closed without child shards.
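A simplified, hypothetical sketch of the filtering idea described above (this is not the actual KCL or PR #240 code; `ShardInfo` and `dropClosedChildless` are illustrative names, and the real KCL works with `com.amazonaws.services.kinesis.model.Shard` and its lease table):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ShardFilter {
    // Hypothetical, simplified model of a shard listing entry.
    public record ShardInfo(String shardId, String parentShardId, boolean closed) {}

    // Drop closed shards that have no children from the sync list, instead of
    // failing the whole shard sync with a KinesisClientLibIOException.
    public static List<ShardInfo> dropClosedChildless(List<ShardInfo> shards) {
        // Collect every shard id that appears as some shard's parent.
        Set<String> parents = new HashSet<>();
        for (ShardInfo s : shards) {
            if (s.parentShardId() != null) {
                parents.add(s.parentShardId());
            }
        }
        List<ShardInfo> result = new ArrayList<>();
        for (ShardInfo s : shards) {
            // A closed shard with no known children is the inconsistent case;
            // skip it rather than aborting the sync.
            if (s.closed() && !parents.contains(s.shardId())) {
                continue;
            }
            result.add(s);
        }
        return result;
    }
}
```

The trade-off is that a skipped shard is simply revisited on a later sync, by which time the stream's shard graph has usually become consistent again.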

@Mentis

Mentis commented May 16, 2018

@bhanup @pfifer I've been fighting exactly the same issue recently and managed to "workaround/fix" it by forking the KCL and adding a 1-second sleep after reaching SHARD_END in the ProcessTask (line 155 in KCL 1.9.2). It seems that it takes time for DynamoDB to create a new child shard.

Since then there was no occurrence of this issue in production.
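The workaround described above could be sketched roughly as follows (a hypothetical illustration, not the actual forked ProcessTask code; the class and method names are made up):

```java
// Sketch of the workaround: after a record processor reaches SHARD_END,
// pause briefly before the shutdown/shard-sync path runs, giving
// DynamoDB Streams time to publish the child shard.
public class ShardEndBackoff {
    // 1 second, the value the comment above reports using.
    static final long SHARD_END_SLEEP_MILLIS = 1000L;

    public static void sleepAfterShardEnd() {
        try {
            Thread.sleep(SHARD_END_SLEEP_MILLIS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status so the worker can still shut down.
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that a fixed sleep is a race-condition band-aid rather than a guarantee; if child-shard creation ever takes longer than the sleep, the same exception can still occur.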

ebartkus pushed a commit to trustpilot/kafka-connect-dynamodb that referenced this issue Jul 2, 2019
IgorKowalczyk28xf added a commit to IgorKowalczyk28xf/trustpilotk that referenced this issue Dec 12, 2021