fix(image): resolve scan deadlock when error occurs in slow mode #4336
Hello @mpoindexter. As you said in #4335, I reproduced this case, but I can't find information why
It's not that the semaphore blocks a channel send directly. The channel send blocks because the channel is unbuffered and nothing is reading from it yet. In turn, the goroutine that will eventually read from the channel is blocked because the goroutine trying to send holds the semaphore, which allows at most one holder in slow mode. Using a buffered channel sort of fixes the problem, but for a full fix the buffer size must equal the maximum number of errors that could happen, not 1. Otherwise, if errors occur on more than one layer, the problem can still occur.
Thanks for the more detailed explanation and the fast answer!
This approach looks good to me.
Thanks for your work @mpoindexter .
@knqyf263 I approved this PR.
What if putting
@knqyf263 I moved the
We may want to use errgroup in this case.
OK, updated to use errgroup
if ctx.Err() != nil {
	break
}
I think it is acceptable to run the goroutines even after an error. What about removing this error check?
I think it would be worse to remove it. There is no correctness problem with removing the check, but inspecting a layer can be quite expensive, so it seems like we should stop doing it once we already know that group.Wait() will return an error.
}()

layerKey := k
ctx := groupCtx
groupCtx is not updated in the loop. Is there any specific reason to overwrite ctx every time here?
I thought it looked cleaner to pass ctx in the body of the goroutine. The code previously took ctx as an argument to the goroutine, but with errgroup we can't pass arguments to the goroutine, so just binding some variables was the replacement. We could just change to use groupCtx within the goroutine, let me know.
Description
See #4335
Related issues
Checklist