
discovery: fix syncqueue retries #141

Merged (3 commits) Jun 7, 2018

Conversation

alexbrand (Contributor)

Updates #134

I propose we open another issue for spec'ing a new metric that will keep track of the number of retries.

The client-go workqueue expects the user to mark any item that needs to
be retried as "dirty". That is, on failure, the item must be re-added to
the queue for reprocessing at a later time.

The item is added to the queue via AddRateLimited, which will add the
item once the rate limiter says it's okay.

Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
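For illustration, here is a minimal sketch of the retry pattern described above, using the client-go workqueue API. It is not the code from this PR; the worker function and the always-failing handler are hypothetical placeholders.

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

const maxRetries = 3

// worker drains the queue, retrying failed items via AddRateLimited until
// maxRetries is reached, then drops them with Forget.
func worker(queue workqueue.RateLimitingInterface, process func(interface{}) error) {
	for {
		obj, shutdown := queue.Get()
		if shutdown {
			return
		}

		err := process(obj)
		switch {
		case err == nil:
			// Success: tell the rate limiter to stop tracking this item.
			queue.Forget(obj)
		case queue.NumRequeues(obj) < maxRetries:
			// Failure: mark the item "dirty" by re-adding it with
			// AddRateLimited; it is re-queued once the rate limiter says it's okay.
			fmt.Printf("error handling item, requeuing: %v\n", err)
			queue.AddRateLimited(obj)
		default:
			// Too many failures: give up and stop tracking the item.
			fmt.Printf("dropping item after %d retries: %v\n", maxRetries, err)
			queue.Forget(obj)
		}

		// Done tells the queue we finished processing this item.
		queue.Done(obj)
	}
}

func main() {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	queue.Add("service/foo")

	// A handler that always fails, to exercise the retry path.
	go worker(queue, func(obj interface{}) error {
		return fmt.Errorf("simulated failure for %v", obj)
	})

	time.Sleep(2 * time.Second) // allow a few retries to happen
	queue.ShutDown()
}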

@alexbrand alexbrand requested a review from stevesloka June 5, 2018 15:05

// If there was an error handling the item, we will retry up to
// queueMaxRetries times.
if sq.Workqueue.NumRequeues(obj) < queueMaxRetries {
Member

I think we determined that Forget() didn't do much, yet we still call it in the error-retry case but not in the happy path.

Contributor Author

Forget() does have a function after all. The default rate limiter that we are using has two rate limiters. One of them is the BucketRateLimiter, for which Forget() has an empty implementation. The second is the ItemExponentialFailureRateLimiter, which does perform an operation when calling Forget().

My understanding is that Forget() does not have much to do with the queue itself, but with the rate limiters which are tracking each object in the queue, and determining when they are ready to be retried. By calling Forget(), we tell the rate limiter to stop tracking the object.

Thus, we call Forget() when we no longer want to retry an object. Or, on the flip side, the only place that we don't call Forget() is when we want the rate limiter to continue tracking the object because we are going to retry.

The updated (?) example in client-go talks a bit more about this: https://github.com/kubernetes/client-go/blob/master/examples/workqueue/main.go
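For reference, a sketch of how client-go composes that default rate limiter (constructor parameters may differ between versions), plus a tiny demonstration of how Forget() resets only the per-item backoff:

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

// Mirrors client-go's default controller rate limiter: a per-item
// exponential-backoff limiter combined with an overall token bucket.
// Forget() is a no-op on the BucketRateLimiter but clears the failure
// count in the ItemExponentialFailureRateLimiter.
func defaultRateLimiter() workqueue.RateLimiter {
	return workqueue.NewMaxOfRateLimiter(
		workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
}

func main() {
	rl := defaultRateLimiter()
	item := "service/foo"
	for i := 0; i < 4; i++ {
		// Each call to When counts as a failure, so the per-item delay grows.
		fmt.Println("next retry after:", rl.When(item))
	}
	rl.Forget(item) // stop tracking the item; its backoff resets
	fmt.Println("after Forget:", rl.When(item))
}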

Member

👍

actionAdd = "add"
actionUpdate = "update"
actionDelete = "delete"
queueMaxRetries = 5
Member
5 might be a bit high of a retry count. Maybe split this out and allow it to be configurable?

Contributor Author

Making it configurable is probably best. Any suggestions for default num retries? 3?

Member

Yeah maybe 3 is better. @rosskukulinski any thoughts?

Contributor

I've read through this PR and #134 and I'm not sure I understand this well enough to explain the impact this value might have on a user. What are the tradeoffs of making this 4 or 6?

Is this something we expect users to have to configure frequently? Does it depend on the size of their backend clusters (services)?

Member

The question is: if a service or endpoint fails to insert/update/delete, how many times should the discoverer retry the operation before giving up? In larger, busier clusters, I could see setting this number lower, since retries add to the size of the queue.

I think 3 is a good default, then LGTM.

Contributor

Ok! I don't see a reason to make this configurable for general users at this time, but if it's helpful for @alexbrand's performance testing/tuning, then I'm +1 for adding configuration now.

Otherwise, LGTM for 3 based on @stevesloka's input.

alexbrand added 2 commits June 7, 2018 08:58
@alexbrand (Contributor Author)

Updated the retry count to 3 instead of 5. I think we can make it configurable in the future if someone has a need. I personally don't have a need right now for perf testing. Thoughts?

@stevesloka (Member) left a comment

Small nit, otherwise LGTM.

// If there was an error handling the item, we will retry up to
// queueMaxRetries times.
if sq.Workqueue.NumRequeues(obj) < queueMaxRetries {
sq.Logger.Infof("Error handling %s: %v. Requeuing.", action, err)
Member

nit: Should this be an Error-level log?

Member

Also might be nice to log the retry count.
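A hypothetical version of that branch applying both suggestions (Error-level logging, assuming the logger exposes Errorf, plus the retry count) might look like this; it is a sketch, not the change that was merged:

// Sketch only: log at Error level and include the retry count before requeuing.
if retries := sq.Workqueue.NumRequeues(obj); retries < queueMaxRetries {
	sq.Logger.Errorf("Error handling %s (retry %d of %d): %v. Requeuing.",
		action, retries+1, queueMaxRetries, err)
	sq.Workqueue.AddRateLimited(obj)
}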

@alexbrand (Contributor Author)

Thanks @stevesloka! Fixed the log call. PTAL

@stevesloka stevesloka merged commit cc9ce58 into projectcontour:master Jun 7, 2018
@alexbrand alexbrand added this to the v0.3 milestone Jun 8, 2018