This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Scaling Kinesis Shards Yields Errors #37

Closed
etspaceman opened this issue Nov 20, 2017 · 4 comments · Fixed by #79

Comments

@etspaceman
Contributor

See this thread on the AWS Developer forums:

https://forums.aws.amazon.com/thread.jspa?threadID=245127

To recreate this, I bump up the shard count on the Kinesis stream. With a single node, I see the error from the thread above and the consumer dies. We bounce our application and scale to 2 nodes, but one of the nodes tries to connect to the initial shard (which is now in a "CLOSED" state) and throws the same error. We've also tried removing the checkpoint data and then restarting our applications, with the same result. The only way we're able to consume all shards is to run N+1 consumer instances against the stream's shards.

@etspaceman
Contributor Author

Any thoughts on this one @markglh? It effectively forced us to drop our consumer from our application. :/

@markglh
Contributor

markglh commented Dec 21, 2017

I need to recreate this in an integration test @etspaceman - it looks like a KCL bug, but it would be interesting to see if we can work around it.

@etspaceman
Contributor Author

I talked to @markglh about this today. I think we've found the source of the issue. See:

awslabs/amazon-kinesis-client#211

According to this, the exception can be thrown if we are checkpointing exclusively with sequence numbers. The exception is also specific to shard-end events, which are indicated by the TERMINATE shutdown reason (the input to `shutdownRequested` will have this value).

https://github.com/WW-Digital/reactive-kinesis/blob/master/src/main/scala/com/weightwatchers/reactive/kinesis/consumer/ConsumerProcessingManager.scala#L143

If we change the above to include a call to `checkpointer.checkpoint()` on shard-end events, we should be able to avoid the error.
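The proposed fix can be sketched roughly as follows. This is a minimal, self-contained model: the `ShutdownReason` enum and `Checkpointer` class here are hypothetical stubs standing in for the KCL's `ShutdownReason` and `IRecordProcessorCheckpointer`, not the real library classes.

```java
// Stub for the KCL's shutdown reasons: TERMINATE means the shard was closed
// by a reshard; ZOMBIE means this worker lost its lease.
enum ShutdownReason { TERMINATE, ZOMBIE }

// Stub for the KCL's checkpointer.
class Checkpointer {
    boolean checkpointed = false;

    void checkpoint() {
        checkpointed = true;
    }
}

public class ShardEndHandler {
    // Returns true if a shard-end checkpoint was written.
    static boolean onShutdown(ShutdownReason reason, Checkpointer checkpointer) {
        if (reason == ShutdownReason.TERMINATE) {
            // The shard has been closed by a reshard: checkpoint here so the
            // parent shard is marked fully consumed and its children can be
            // picked up, instead of failing on the CLOSED parent.
            checkpointer.checkpoint();
            return true;
        }
        // Lease lost (ZOMBIE): another worker owns the shard now, so we
        // deliberately do not checkpoint.
        return false;
    }

    public static void main(String[] args) {
        Checkpointer cp = new Checkpointer();
        System.out.println(onShutdown(ShutdownReason.TERMINATE, cp)); // true
        System.out.println(onShutdown(ShutdownReason.ZOMBIE, new Checkpointer())); // false
    }
}
```

The point of the branch is that checkpointing is only safe (and only required) on TERMINATE; checkpointing after losing the lease would itself throw in the real KCL.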

@agaro1121
Contributor

That makes sense. Scaling Kinesis involves closing a shard and replacing it with two others. If we don't checkpoint at the end of the old shard, Kinesis flags it as being in an inconsistent state.
