Support for retrying messages within replicator processor #827


samarabbas
Contributor

The Kafka consumer for replication events now relies only on the topic
it consumes from, plus a DLQ for messages that cannot be processed due
to bugs in the replication stack. We no longer use a retry queue for the
configured consumer.

Replication message processing will now retry indefinitely when it hits
transient errors while processing a message. When the processing logic
returns a RetryTaskError, the message is retried a few times before it
is moved to the DLQ.

Also added a number of new metrics to help debug replication-related
issues.

@samarabbas samarabbas requested a review from wxing1292 June 7, 2018 23:20
	case *shared.RetryTaskError:
		p.metricsClient.IncCounter(scope, metrics.CadenceErrRetryTaskCounter)
	case *yarpcerrors.Status:
		if err.Code() == yarpcerrors.CodeDeadlineExceeded {

context deadline?

OK, that is actually correct.

@samarabbas samarabbas force-pushed the retry-queue-support-for-replicator branch from 2fa2455 to cf9fdc2 June 9, 2018 01:15
@samarabbas samarabbas merged commit 0fecb80 into cadence-workflow:master Jun 10, 2018
@samarabbas samarabbas deleted the retry-queue-support-for-replicator branch June 10, 2018 01:02