-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix possible race condition on request ctx done #1662
Fix possible race condition on request ctx done #1662
Conversation
…ot assign it to nil in order to avoid a race condition
The data race isn't with closing the channel. IIRC, it's with a struct field that holds a reference to a custom IMO storing a context in a struct field is a design smell and discouraged. From https://pkg.go.dev/context, they explicitly warn against it:
|
No but after closing the channel, it was set to nil to prevent closing it again I think. This is one possible source of race conditions. if you look at the screenshot, there is a write on server.go line 1916 and read on server.go line 2728 by adding an once function, we can stop the original writing by never set this to nil. however, this is not the end of story, and this is more deeply, linked to design itself, etc. Perhaps the adaptation of |
Maybe this Fix #1663 |
close(s.done) | ||
s.closeDone.Do(func() { | ||
close(s.done) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to reset s.closeDone
when s.done
is reset. This change now makes it impossible to reuse the Server struct after closing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see, and since you want to reuse the Server struct, you may create the s.done again
I must think about this, because if I reset s.closeDone we may ended with the same code and same behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mean here exactly. You have to reset it when a Serve is called I think. That way you don't have the same behaviour anymore, only when you reuse the Server struct and then it's intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erikdubbelboer @peczenyj I think it can be modified like this
func (ctx *RequestCtx) Done() <-chan struct{} {
//fix Use locks to prevent concurrent modifications,
//and use new variables to prevent panic caused by modifying the original done chan to nil.
ctx.s.mu.Lock()
defer ctx.s.mu.Unlock()
if ctx.s.done == nil {
tmp := make(chan struct{}, 1)
tmp <- struct{}{}
return tmp
}
doneChan := ctx.s.done
return doneChan
}
func (ctx *RequestCtx) Err() error {
select {
case <-ctx.Done(): // //fix Use unified functions instead of reference variables to converge fetching into one place
return context.Canceled
default:
return nil
}
}
Hey, we also receive sometimes a panic in our services with the following stacktrace, could this be related to this issue?
|
I understand that the channel may be set to nil to prevent a panic, but is it really valid to call Shutdown twice in the first place? Couldn't it instead be stated that users shall not call Shutdown more than once? Then the race could be avoided. |
server.go
Outdated
done chan struct{} | ||
|
||
done chan struct{} | ||
closeDone sync.Once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think atomic
is better than sync.Once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I use atomic in this context? Like to store the state (open / closed)?
Hello
Few days ago I find this tweet about fasthttp and go-fiber:
https://twitter.com/davidnix_/status/1720454052973044188?s=48
I try to contact davidnix_ to find more information about this, since I use fasthttp on my projects and race condtion is something that I'd like to avoid in a production environment.
He did not provide me any runnable example, I think he delete or lost the original code, but I ask him to add an issue anyway - since I try to reproduce this issue without success.
Based on the screenshots, I think I find the root cause: we assign the done channel to nil and if we are using it in some other goroutine such as using
Done()
method, this may trigger the race condition, but I can't write a unit test that fails with or without the race condition detector.If @DavidNix himself wants to add some contribution I think it will be helpful
Regards