Skip to content

Conversation

@srdo
Copy link
Contributor

@srdo srdo commented Aug 15, 2017

…that were already committed. Expand tests, add some runtime validation, minor refactoring to increase code readability. Ensure OffsetManager commits as many offsets as possible when an offset void (deleted offsets) occurs, rather than just up to the gap.

See https://issues.apache.org/jira/browse/STORM-2666. I suspect this issue was also the root cause of the double acks we saw in #1679.

The following changes are made here:

  • Make OffsetManager commit as many offsets as possible when there is a gap in emitted offsets due to offsets being deleted from Kafka. We used to have to do two commits to get past a gap, because the OffsetManager would stop in findNextCommitOffset at the last offset before the gap, commit up to the gap, and then have to do another round to commit the acked tuples past the gap. It should now just pick the highest acked tuple to commit immediately, as long as the previous unacked tuples were not emitted.
  • Fix case where the KafkaConsumer position could fall behind the committed offset. This can happen in some cases where there are a lot of acked uncommitted tuples, and an older tuple is preventing commit because it needs to be retried. Once the failed tuple is retried, all the acked tuples are committed, but the consumer position isn't necessarily caught up.
  • Don't seek the consumer on partition reassignment for partitions that were previously assigned. Since the spout keeps the emitted/acked state for those partitions, we shouldn't be moving the consumer offset.
  • Minor changes to the retry service, mainly stopping iterations once it has found what it was looking for.
  • Add in some Validate calls to ensure that the spout state is good, e.g. check that the spout doesn't try to emit tuples that are already committed.
  • Add more tests

@srdo
Copy link
Contributor Author

srdo commented Aug 15, 2017

Test failure is the integration test and windowing, looks unrelated.

@hmcl
Copy link
Contributor

hmcl commented Aug 15, 2017

@srdo can you please update the JIRA ticket, thanks.

@srdo
Copy link
Contributor Author

srdo commented Aug 16, 2017

@hmcl Updated the ticket (I hope this was what you meant?)

Copy link
Contributor

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me. +1

@revans2
Copy link
Contributor

revans2 commented Aug 29, 2017

@hmcl do you have any concerns about the patch?

…that were already committed. Expand tests, add some runtime validation, minor refactoring to increase code readability. Ensure OffsetManager commits as many offsets as possible when an offset void (deleted offsets) occurs, rather than just up to the gap.
@srdo
Copy link
Contributor Author

srdo commented Oct 2, 2017

@hmcl Will merge this in the next few days unless you have concerns.

@srdo srdo mentioned this pull request Oct 4, 2017
@asfgit asfgit merged commit cef7c92 into apache:master Oct 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants