Reduce raciness in test #1996
For a bit more context: this is the result of investigating a CI failure on an unrelated change.
CI seems to be heavily resource-constrained, so race conditions show up there more often.
func (h *baseHandler) fetchSegsFromDBRetry(ctx context.Context,
	params *query.Params) ([]*seg.PathSegment, error) {
	for {
		upSegs, err := h.fetchSegsFromDB(ctx, params)
		if err != nil || len(upSegs) > 0 {
			return upSegs, err
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(h.retryInt):
			// retry
		}
	}
}
We reasoned that on each evaluation of the select, time.After creates a new timer and channel, and that timer is driven by the Go runtime independently of the current goroutine. So it is basically a race between the timer and the current goroutine. If the current goroutine is not scheduled soon enough to evaluate the select cases, it is theoretically possible that the timer has already expired, in which case both cases are ready and one of them is chosen at random.
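To illustrate the last point, here is a minimal sketch (not code from this PR) of what select does when both cases are ready. The names countChoices and fired are made up for the example, and a closed channel stands in for a timer that has already expired. The Go spec says that if several cases can proceed, one is picked by uniform pseudo-random selection, so both counters should end up non-zero.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// countChoices (hypothetical helper) runs a select n times over two channels
// that are both already ready and reports how often each case was taken.
func countChoices(n int) (ctxHits, timerHits int) {
	// An already-cancelled context: ctx.Done() is closed, so it is always ready.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	// Assumption for the example: a closed channel stands in for a
	// time.After channel whose timer has already fired.
	fired := make(chan time.Time)
	close(fired)

	for i := 0; i < n; i++ {
		select {
		case <-ctx.Done():
			ctxHits++
		case <-fired:
			timerHits++
		}
	}
	return ctxHits, timerHits
}

func main() {
	ctxHits, timerHits := countChoices(1000)
	// The exact split varies from run to run.
	fmt.Printf("ctx.Done(): %d, timer: %d\n", ctxHits, timerHits)
}
```

A test that expects exactly one of the two branches is therefore only reliable if the other channel cannot be ready yet when the select is evaluated.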
Reviewed 1 of 1 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @kormat)
fetchSegsFromDBRetry selects on ctx.Done() and time.After(). If the setup/calling of ctx.Done() takes longer than the duration we pass to time.After, both channels can be ready at the same time and the test might fail. With a larger timeout in the test, this should no longer be a problem.