Missing shards after Kinesis stream resharding #339
Comments
The missing shards are not the ones reported by the KCL in the log messages.
This sounds very similar to what we see from time to time. We've seen it correlate with resharding, but a random instance will also stop processing all of the shards it holds leases for until a restart. Check out this open issue for more: #185
Are you using the Java KCL or the Multilang KCL? The messages you are seeing in the logs are warnings coming from ProcessTask and KinesisProxy. They wouldn't cause the ShardConsumer to be blocked.
@sahilpalvia We are using the Java KCL. We don't see any other messages that indicate the ShardConsumer is stuck, other than those two messages flooding the log files.
The message you're seeing is from code that handles KPL messages. When you redeployed, the KCL ran ListShards again, which fixed the KPL messages. The fact that the KPL messages were not clearing up is somewhat worrying; that should resolve once the KCL gets a full shard map. To add to #185, there is one thing to remember: the lease renewer doesn't check that the record processor is working or making progress. This is partly because the lease renewer doesn't know how long the record processor could block.
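Since the lease renewer will keep renewing a lease for a processor that has stopped making progress, one workaround is to track progress in the application itself. The following is only a minimal sketch of that idea, not part of the KCL: `ProgressWatchdog`, `record_progress`, `release`, and `stalled_shards` are hypothetical names, and the stall threshold is an assumption you would have to tune to how long your record processor can legitimately block.

```python
import time


class ProgressWatchdog:
    """Tracks when each shard's record processor last reported progress.

    Hypothetical helper, not a KCL API: the application must call
    record_progress() itself from its record processor.
    """

    def __init__(self, stall_after_seconds, clock=time.monotonic):
        self.stall_after_seconds = stall_after_seconds
        self.clock = clock
        self.last_progress = {}  # shard_id -> time of last reported progress

    def record_progress(self, shard_id):
        # Call after every batch handed to the processor, including empty
        # ones, so that an idle shard is not mistaken for a stuck one.
        self.last_progress[shard_id] = self.clock()

    def release(self, shard_id):
        # Call when the lease is lost or the shard has been fully consumed.
        self.last_progress.pop(shard_id, None)

    def stalled_shards(self):
        # Shards that have been silent for longer than the threshold.
        now = self.clock()
        return sorted(
            shard_id
            for shard_id, seen in self.last_progress.items()
            if now - seen > self.stall_after_seconds
        )
```

A periodic task could log or alert on `stalled_shards()` so that a stuck worker is noticed without waiting for someone to spot the lag and restart the service.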
We see the same effect here. Some of the shards will stop processing at what feels like random times, but also when we reshard. The KCL seems unwilling to process certain shards after it mistakenly loses a lease. We have a 64-shard stream
I'm seeing this with the KCL on DynamoDB Streams as well, version 1.13.0. The logs I have come in 3 flavors:
When it happens, we have to restart the services using the KCL on the DynamoDB stream to make the errors stop. Since it generates a ton of log traffic, it's a fairly expensive error.
We scaled up our Kinesis stream in us-east-1 from 340 shards to 435 shards yesterday. We kept our KCL service running throughout the resharding. After the previously open shards were closed, we found that the KCL service was not processing 5 of the new shards.
In the logs we can see a lot of messages like:
"Cannot get the shard for this ProcessTask, so duplicate KPL user records in the event of resharding will not be dropped during deaggregation of Amazon Kinesis records"
as well as
"Cannot find the shard given the shardId shardId-000000003769"
The shards it reports it cannot find are some of the new shards, not the old, closed ones. I've redeployed the KCL service and now it can find all the shards.
We are running KCL version 1.9.1.
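For anyone trying to confirm which shards are being skipped after a reshard, a rough approach is to compare the stream's open shards against the lease table. The sketch below works on plain dictionaries and makes two assumptions: shards are shaped like Kinesis ListShards entries (a closed shard has an `EndingSequenceNumber` in its `SequenceNumberRange`), and lease rows use the `leaseKey` and `leaseOwner` attribute names, which is my understanding of the KCL 1.x lease table and should be checked against your own table. The function names are hypothetical.

```python
def open_shard_ids(shards):
    """Return the IDs of shards that are still open.

    Assumes ListShards-style dicts: a shard is closed once its
    SequenceNumberRange contains an EndingSequenceNumber.
    """
    return {
        shard["ShardId"]
        for shard in shards
        if "EndingSequenceNumber" not in shard.get("SequenceNumberRange", {})
    }


def unleased_open_shards(shards, leases):
    """Return open shards that have no lease row, or a lease with no owner.

    Assumes lease rows carry 'leaseKey' (the shard ID) and 'leaseOwner'.
    """
    owned = {lease["leaseKey"] for lease in leases if lease.get("leaseOwner")}
    return sorted(open_shard_ids(shards) - owned)
```

The inputs could come from a ListShards call on the stream and a scan of the lease table. A non-empty result after the parent shards have been fully consumed would point at shards that no worker is currently processing.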