assertAllParentShardsAreClosed check interacts poorly with high shard count dynamodb streams #210

Closed
zerth opened this issue Sep 14, 2017 · 1 comment

Comments

@zerth
Contributor

zerth commented Sep 14, 2017

The following assertion, made during shard sync, can prevent initialization of workers that process DynamoDB streams for tables with a large number of partitions:

assertAllParentShardsAreClosed(shardIdToChildShardIdsMap, shardIdToShardMap);

Example error message:

Sep 13, 2017 4:40:31 PM com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask call
SEVERE: Caught exception while sync'ing Kinesis shards and leases
com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Parent shardId shardId-00000001505307450567-xxxxxxxx is not closed. This can happen due to a race condition between describeStream and a reshard operation.
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.assertAllParentShardsAreClosed(ShardSyncer.java:161)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.syncShardLeases(ShardSyncer.java:117)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.checkAndCreateLeasesForNewShards(ShardSyncer.java:88)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask.call(ShardSyncTask.java:68)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.initialize(Worker.java:427)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.run(Worker.java:356)
    at com.amazonaws.services.kinesis.multilang.MultiLangDaemon.call(MultiLangDaemon.java:111)
    at com.amazonaws.services.kinesis.multilang.MultiLangDaemon.call(MultiLangDaemon.java:58)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This is with KCL 1.7.5 and dynamodb-streams-kinesis-adapter 1.2.1.

I believe the cause is that paginating the DynamoDB stream description for a very large table takes more than several seconds. By that time a new child shard is likely to have been created. The child references a parent that was still open when it appeared earlier in the paginated response, and that triggers the assertion.
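
To make the suspected race concrete, here is a minimal sketch of it. It is not KCL or adapter code: `SketchShard`, `PaginationRaceSketch`, and the shard ids are hypothetical names made up for illustration, and the two "pages" are simulated in memory rather than fetched from a stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shard record; not the KCL or AWS SDK Shard type.
final class SketchShard {
    final String id;
    final String parentId; // null for a shard with no parent
    final boolean open;    // state as observed when its page was fetched

    SketchShard(String id, String parentId, boolean open) {
        this.id = id;
        this.parentId = parentId;
        this.open = open;
    }
}

final class PaginationRaceSketch {
    // Builds the merged listing a client would hold when the stream changes between two page fetches.
    static List<SketchShard> listingAcrossPages() {
        List<SketchShard> listing = new ArrayList<>();
        // Page 1, fetched first: shard-A is still open at this moment.
        listing.add(new SketchShard("shard-A", null, true));
        // Before page 2 is fetched, shard-A closes and shard-B is created as its child.
        // Page 2 reports shard-B, but page 1 is never re-read, so shard-A stays recorded as open.
        listing.add(new SketchShard("shard-B", "shard-A", true));
        return listing;
    }

    // Returns the ids of parents recorded as open even though a child in the listing references them.
    static List<String> openParentsWithChildren(List<SketchShard> listing) {
        Map<String, SketchShard> byId = new HashMap<>();
        for (SketchShard shard : listing) {
            byId.put(shard.id, shard);
        }
        List<String> offenders = new ArrayList<>();
        for (SketchShard shard : listing) {
            SketchShard parent = shard.parentId == null ? null : byId.get(shard.parentId);
            if (parent != null && parent.open && !offenders.contains(parent.id)) {
                offenders.add(parent.id);
            }
        }
        return offenders;
    }
}
```

In this sketch the merged listing contains a child whose parent was recorded as open, which is the state the assertion rejects. The longer pagination takes, the more likely such a mismatch becomes.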

This commit attempts to fix this against 1.7.5: AdRoll@09dcc99

The attempted fix adds a configuration parameter controlling whether the assertion is made, and also prevents the creation of new leases for such children with still-open parents during shard sync.
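
The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `OpenParentCheckSketch`, `shardsToLease`, and the map/set representation of the shard listing are hypothetical, and `IllegalStateException` stands in for the `KinesisClientLibIOException` shown in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class OpenParentCheckSketch {
    /**
     * Returns the shard ids that would be eligible for new leases.
     *
     * @param shardIds     all shard ids seen in the stream description
     * @param parentOf     child shard id -> parent shard id (roots are absent)
     * @param openShardIds shard ids that were recorded as still open
     * @param ignoreUnexpectedChildShards hypothetical flag: when false, an open parent
     *        with a child fails the sync; when true, such children are skipped instead
     */
    static List<String> shardsToLease(List<String> shardIds,
                                      Map<String, String> parentOf,
                                      Set<String> openShardIds,
                                      boolean ignoreUnexpectedChildShards) {
        List<String> eligible = new ArrayList<>();
        for (String shardId : shardIds) {
            String parentId = parentOf.get(shardId);
            boolean parentStillOpen = parentId != null && openShardIds.contains(parentId);
            if (parentStillOpen) {
                if (!ignoreUnexpectedChildShards) {
                    // Mirrors the original assertion: abort the whole sync.
                    throw new IllegalStateException("Parent shardId " + parentId + " is not closed.");
                }
                // Check disabled: do not create a lease for this child in this sync.
                continue;
            }
            eligible.add(shardId);
        }
        return eligible;
    }
}
```

With the flag enabled, a child whose parent still looks open is simply left out of this sync round. The assumption is that a later sync would see the parent as closed and create the lease then.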

@zerth
Contributor Author

zerth commented Jan 5, 2018

PR was merged.

@zerth zerth closed this as completed Jan 5, 2018
pfifer added a commit to pfifer/amazon-kinesis-client that referenced this issue Jan 15, 2018
* Allow disabling check for the case where a child shard has an open parent shard.
  There is a race condition where it's possible for a parent shard
  to appear open, while having child shards. This check can now be
  disabled by setting ignoreUnexpectedChildShards in the
  KinesisClientLibConfiguration to true.
  * PR awslabs#240
  * Issue awslabs#210
* Upgraded the AWS SDK for Java to 1.11.261
  * PR awslabs#281
pfifer added a commit that referenced this issue Jan 15, 2018