
Fix possible race condition on request ctx done #1662

Closed

Conversation

@peczenyj (Contributor) commented Nov 16, 2023

Hello

A few days ago I found this tweet about fasthttp and go-fiber:

https://twitter.com/davidnix_/status/1720454052973044188?s=48

I tried to contact davidnix_ to find more information about this, since I use fasthttp in my projects and a race condition is something I'd like to avoid in a production environment.

He did not provide a runnable example (I think he deleted or lost the original code), but I asked him to open an issue anyway, since I tried to reproduce the problem without success.

Based on the screenshots, I think I found the root cause: we assign the done channel to nil, and if another goroutine is using it at the same time, for example via the Done() method, this may trigger the race condition. However, I could not write a unit test that fails, with or without the race detector.

If @DavidNix himself wants to contribute here, I think it would be helpful.

Regards
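
For illustration, a minimal, self-contained sketch of the suspected pattern (not fasthttp code; all names here are invented): one goroutine closes the done channel and then sets the field to nil, while another goroutine reads the same field through a Done()-style accessor. Running it with go run -race can report a data race on the field, although, as noted above, whether it fires depends on scheduling.

package main

import "time"

// server stands in for the real struct; done mirrors the channel that is
// closed and then set to nil on shutdown.
type server struct {
	done chan struct{}
}

// Done reads the field without any synchronization, like a Done() accessor.
func (s *server) Done() <-chan struct{} {
	return s.done
}

func main() {
	s := &server{done: make(chan struct{})}

	go func() {
		<-s.Done() // reader goroutine: unsynchronized read of s.done
	}()

	close(s.done) // signal shutdown
	s.done = nil  // unsynchronized write: the racy part
	time.Sleep(100 * time.Millisecond)
}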

@peczenyj peczenyj changed the title Fix race condition on request ctx done Fix possible race condition on request ctx done Nov 16, 2023
@DavidNix commented Nov 16, 2023

The data race isn't with closing the channel. IIRC, it's with a struct field that holds a reference to a custom context.Context. I'll try to reproduce it in a few days.

IMO storing a context in a struct field is a design smell and discouraged.

From https://pkg.go.dev/context, they explicitly warn against it:

Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it.

@peczenyj (Contributor, Author)

The data race isn't with closing the channel.

No, but after closing the channel it is set to nil to prevent closing it again, I think.

This is one possible source of race conditions.

If you look at the screenshot, there is a write at server.go line 1916 and a read at server.go line 2728.

By adding a sync.Once, we can stop the offending write by never setting the channel to nil.

However, this is not the end of the story; the issue is tied more deeply to the design itself. Perhaps adapting *fasthttp.RequestCtx to behave as a context.Context had some unexpected side effects.

@peczenyj (Contributor, Author)

Maybe this fixes #1663.

server.go (outdated)
-	close(s.done)
+	s.closeDone.Do(func() {
+		close(s.done)
+	})
Collaborator

You have to reset s.closeDone when s.done is reset. This change now makes it impossible to reuse the Server struct after closing it.

@peczenyj (Contributor, Author)

Oh, I see. Since you want to reuse the Server struct, you may create s.done again.

I must think about this, because if I reset s.closeDone we may end up with the same code and the same behaviour.

Collaborator

I don't mean here exactly. You have to reset it when Serve is called, I think. That way you don't have the same behaviour anymore, only when you reuse the Server struct, and then it's intended.
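
To make that concrete, here is a rough sketch of the reset idea, assuming the done and closeDone fields from this PR's diff; it is only an illustration, not the actual fasthttp code:

package sketch

import "sync"

type Server struct {
	done      chan struct{}
	closeDone sync.Once
}

func (s *Server) Serve() error {
	s.done = make(chan struct{})
	s.closeDone = sync.Once{} // reset the Once so a reused Server can be shut down again
	// ... accept loop ...
	return nil
}

func (s *Server) Shutdown() error {
	s.closeDone.Do(func() {
		close(s.done) // a second Shutdown becomes a no-op instead of a double close
	})
	return nil
}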

@byte0o (Contributor) commented Jun 22, 2024

@erikdubbelboer @peczenyj I think it can be modified like this:

func (ctx *RequestCtx) Done() <-chan struct{} {
	// fix: take the lock to prevent concurrent modifications, and return a new
	// channel so callers are not affected when the original done chan has been set to nil.
	ctx.s.mu.Lock()
	defer ctx.s.mu.Unlock()
	if ctx.s.done == nil {
		tmp := make(chan struct{}, 1)
		tmp <- struct{}{}
		return tmp
	}
	doneChan := ctx.s.done
	return doneChan
}

func (ctx *RequestCtx) Err() error {
	select {
	case <-ctx.Done(): // fix: go through Done() so fetching the channel converges in one place
		return context.Canceled
	default:
		return nil
	}
}
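
A hedged aside on the snippet above (not something proposed in this thread): the buffered tmp channel satisfies only a single receive, so another common Go idiom for "already done" is to return a channel that is closed once up front, since every receive on a closed channel returns immediately:

package sketch

// alreadyDone is created and closed once; returning it from a Done()-style
// method lets any number of callers receive from it without blocking.
var alreadyDone = func() chan struct{} {
	ch := make(chan struct{})
	close(ch)
	return ch
}()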

@MaxBreida

Hey, we also sometimes see a panic in our services with the following stack trace; could this be related to this issue?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x560 pc=0xa5d08d]
                                                                                                                                                                                                                                                                                                                                                                      
goroutine 5095 [running]:
github.com/valyala/fasthttp.(*RequestCtx).Done(...)
    /home/vsts/go/pkg/mod/github.com/valyala/fasthttp@v1.50.0/server.go:2728
context.(*cancelCtx).propagateCancel.func2()
    /opt/hostedtoolcache/go/1.21.4/x64/src/context/context.go:506 +0x44
created by context.(*cancelCtx).propagateCancel in goroutine 3836
    /opt/hostedtoolcache/go/1.21.4/x64/src/context/context.go:504 +0x395
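
For context on why the context package calls Done() here at all, a hedged sketch of handler code that produces that call path (handler and doWork are made-up names, and this is only a plausible reproduction, not a confirmed root cause): deriving a cancellable context from the *fasthttp.RequestCtx makes the standard library start a watcher goroutine (the propagateCancel frames above) that waits on the parent's Done(), and if that goroutine is still running after the handler returns, it touches a RequestCtx that fasthttp may already have recycled.

package main

import (
	"context"
	"time"

	"github.com/valyala/fasthttp"
)

func doWork(ctx context.Context) {
	_ = ctx // placeholder for real work
}

// handler derives a child context from the RequestCtx. Because RequestCtx is
// not one of the context package's own types, the stdlib spawns a goroutine
// that waits on ctx.Done() (the propagateCancel frames in the trace above).
func handler(ctx *fasthttp.RequestCtx) {
	child, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel() // without this, the watcher goroutine can outlive the handler

	doWork(child)
}

func main() {
	_ = fasthttp.ListenAndServe(":8080", handler)
}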

@nickajacks1
Contributor

I understand that the channel may be set to nil to prevent a panic, but is it really valid to call Shutdown twice in the first place? Couldn't it instead be stated that users shall not call Shutdown more than once? Then the race could be avoided.

server.go (outdated)

 	done      chan struct{}
+	closeDone sync.Once
Contributor

I think atomic is better than sync.Once

@peczenyj (Contributor, Author)

How should I use atomic in this context? Like to store the state (open / closed)?
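
One way the atomic suggestion could look (only a sketch; the closed field is made up and stands in for closeDone): a compare-and-swap flag guards the close, and resetting it when Serve starts again is a plain Store:

package sketch

import "sync/atomic"

type Server struct {
	done   chan struct{}
	closed atomic.Bool
}

func (s *Server) Shutdown() {
	if s.closed.CompareAndSwap(false, true) {
		close(s.done) // only the first Shutdown closes the channel
	}
}

func (s *Server) Serve() {
	s.done = make(chan struct{})
	s.closed.Store(false) // resetting a flag avoids replacing a sync.Once value
	// ... accept loop ...
}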

byte0o added a commit to byte0o/fasthttp that referenced this pull request Jul 16, 2024
byte0o added a commit to byte0o/fasthttp that referenced this pull request Jul 16, 2024
erikdubbelboer pushed a commit that referenced this pull request Jul 23, 2024
* Fix possible race condition on request ctx done #1662

* Fix possible race condition on request ctx done #1662

* Fix Comment

* fix remove the use of lock

* fix remove Comment