Conversation
discovery/pkg/sync/queue.go
Outdated
// If there was an error handling the item, we will retry up to
// queueMaxRetries times.
if sq.Workqueue.NumRequeues(obj) < queueMaxRetries {
I think we determined that Forget() didn't do much, yet we still call it in the error-retry case but not the happy path.
Forget() does have a function after all. The default rate limiter that we are using has two rate limiters. One of them is the BucketRateLimiter, for which Forget() has an empty implementation. The second is the ItemExponentialFailureRateLimiter, which does perform an operation when calling Forget().
My understanding is that Forget() does not have much to do with the queue itself, but with the rate limiters which are tracking each object in the queue and determining when they are ready to be retried. By calling Forget(), we tell the rate limiter to stop tracking the object.
Thus, we call Forget() when we no longer want to retry an object. Or, on the flip side, the only place that we don't call Forget() is when we want the rate limiter to continue tracking the object because we are going to retry.
The updated (?) example in client-go talks a bit more about this: https://github.com/kubernetes/client-go/blob/master/examples/workqueue/main.go
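For illustration, a minimal sketch of how that default rate limiter is composed, mirroring what workqueue.DefaultControllerRateLimiter() builds in client-go (the constants are client-go's defaults, not code from this PR):

package sync

import (
    "time"

    "golang.org/x/time/rate"
    "k8s.io/client-go/util/workqueue"
)

func newRateLimitingQueue() workqueue.RateLimitingInterface {
    rl := workqueue.NewMaxOfRateLimiter(
        // Per-item exponential backoff (5ms base, 1000s cap). Forget(obj)
        // clears the failure count for obj, resetting its backoff.
        workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
        // Overall token bucket shared by all items (10 qps, burst of 100).
        // Its Forget() is an empty implementation.
        &workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
    )
    return workqueue.NewRateLimitingQueue(rl)
}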
👍
discovery/pkg/sync/queue.go
Outdated
actionAdd       = "add"
actionUpdate    = "update"
actionDelete    = "delete"
queueMaxRetries = 5
5 might be a bit high of a retry count. Maybe split this out and allow it to be configurable?
Making it configurable is probably best. Any suggestions for default num retries? 3?
Yeah maybe 3 is better. @rosskukulinski any thoughts?
I've read through this PR and #134, and I'm not sure I understand this well enough to explain the impact this value might have for a user. What are the tradeoffs of making this 4 or 6?
Is this something we expect users to have to configure frequently? Does it depend on the size of their backend clusters (services)?
The question is: if a service or endpoint fails to insert/update/delete, how many times should the discoverer retry the operation before giving up? In larger, busier clusters, I could see setting this number lower, since retries add to the size of the queue.
I think 3 is a good default, then LGTM.
Ok! I don't see a reason to make this configurable for general users at this time, but if it's helpful for @alexbrand's performance testing/tuning, then I'm +1 for adding configuration now.
Otherwise, LGTM for 3 based on @stevesloka's input.
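If configurability is ever wanted, a minimal sketch of what it could look like (the flag name, default, and wiring are purely hypothetical, not code from this PR):

package sync

import "flag"

// Hypothetical flag: lets operators override the retry count, defaulting to
// the value agreed on in this thread.
var maxRetries = flag.Int("num-retries", 3,
    "number of times to retry a failed service/endpoint sync before giving up")

The retry check would then compare NumRequeues(obj) against *maxRetries instead of the queueMaxRetries constant.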
The client-go workqueue expects the user to mark any item that needs to be retried as "dirty". That is, on failure, the item must be re-added to the queue for reprocessing at a later time. The item is added to the queue via AddRateLimited, which will add the item once the rate limiter says it's okay.
Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
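As context, a minimal sketch of the flow the commit message describes, assuming a simplified queue type and a hypothetical per-item handler (the real code in discovery/pkg/sync/queue.go may differ):

package sync

import (
    "github.com/sirupsen/logrus"
    "k8s.io/client-go/util/workqueue"
)

// Simplified stand-in for the queue type in this PR; field types are assumptions.
type syncQueue struct {
    Workqueue workqueue.RateLimitingInterface
    Logger    *logrus.Logger
}

const queueMaxRetries = 3

// processItem sketches the retry flow: on failure the item is re-added via
// AddRateLimited until queueMaxRetries is reached, after which the rate
// limiter is told to Forget it. handle is a hypothetical per-item handler.
func (sq *syncQueue) processItem(obj interface{}, action string, handle func(interface{}, string) error) {
    // Always mark this processing round as done; a failed item is re-added
    // below rather than left outstanding.
    defer sq.Workqueue.Done(obj)

    if err := handle(obj, action); err != nil {
        if sq.Workqueue.NumRequeues(obj) < queueMaxRetries {
            // Still within the retry budget: re-add once the rate limiter allows it.
            sq.Logger.Errorf("Error handling %s: %v. Requeuing.", action, err)
            sq.Workqueue.AddRateLimited(obj)
            return
        }
        // Out of retries: give up and clear the item's backoff tracking.
        sq.Logger.Errorf("Dropping %s after %d retries: %v", action, queueMaxRetries, err)
        sq.Workqueue.Forget(obj)
        return
    }

    // Happy path: handled successfully, so stop tracking the item.
    sq.Workqueue.Forget(obj)
}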
Updated the retry count to 3 instead of 5. I think we can make it configurable in the future if someone has a need. I personally don't have a need right now for perf testing. Thoughts?
Small nit, otherwise LGTM.
discovery/pkg/sync/queue.go
Outdated
// If there was an error handling the item, we will retry up to
// queueMaxRetries times.
if sq.Workqueue.NumRequeues(obj) < queueMaxRetries {
    sq.Logger.Infof("Error handling %s: %v. Requeuing.", action, err)
nit: Should this be the Error log type?
Also might be nice to log the retry count.
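Perhaps something like this for the requeue branch (assuming the logger exposes an Errorf method; the exact message is only a suggestion):

retries := sq.Workqueue.NumRequeues(obj)
sq.Logger.Errorf("Error handling %s (retry %d of %d): %v. Requeuing.", action, retries, queueMaxRetries, err)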
Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
Thanks @stevesloka! Fixed the log call. PTAL
Updates #134
I propose we open another issue for spec'ing a new metric that will keep track of the number of retries.