Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forceLock hanging in v1.8.3 and causing a goroutine leak #205

Closed
jacobalberty opened this issue Feb 27, 2020 · 11 comments
Closed

forceLock hanging in v1.8.3 and causing a goroutine leak #205

jacobalberty opened this issue Feb 27, 2020 · 11 comments
Labels

Comments

@jacobalberty
Copy link

I've only just noticed this and have only been able to confirm it was introduced in v1.8.3 I have not been able to determine if it's a mistake in my using of the library or even make a good test case yet.

upgrading to v1.8.3 has introduced an issue where I end up with a bunch of goroutines that seem to be blocking in forceLock in conn_notjs.go. The following is my pprof output

48 @ 0x436c00 0x4050bd 0x404e85 0x9f2324 0x9f2302 0x9f7ed7 0x466611
#	0x9f2323	nhooyr.io/websocket.(*mu).forceLock+0x43	nhooyr.io/websocket@v1.8.3/conn_notjs.go:234
#	0x9f2301	nhooyr.io/websocket.(*msgReader).close+0x21	nhooyr.io/websocket@v1.8.3/read.go:112
#	0x9f7ed6	nhooyr.io/websocket.(*Conn).close.func1+0x46	nhooyr.io/websocket@v1.8.3/conn_notjs.go:144
@jacobalberty jacobalberty changed the title forceLock hanging in v1.8.3 forceLock hanging in v1.8.3 and causing a goroutine leak Feb 27, 2020
@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Hmm, will take a look.

This function is trying to lock the read channel so that it can add back the bufio.Reader into the pool. It looks like there is another goroutine with the lock that hasn't let it go yet or there's a bug where the lock is left acquired.

@jacobalberty
Copy link
Author

I'm in the middle of refactoring the a bunch of code including the code around my websocket stuff. I planned to do the websocket stuff last and take a closer look then to see if I can make a good test case.

Whole reason I'm refactoring is to make it easy to add test cases and have them get reported further back for watchdogs to report to me and heal the service, so good time to run into an issue like this lol.

@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Sounds good, def lemme know once you can reliably reproduce.

@nhooyr nhooyr added the bug label Feb 27, 2020
@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Does v1.8.2 work fine?

@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Also, how do you know they're blocking via the pprof?

I took a look at the code and every codepath that acquires the lock also defers an unlock it so it shouldn't be possible to lock the read lock but then never unlock it.

@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Found it.

@nhooyr nhooyr closed this as completed in deb14cf Feb 27, 2020
@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Please test master and lmk if it fixes things :)

@jacobalberty
Copy link
Author

Testing now on master, and yes it worked fine on 1.8.2 I rolled back to it to fix initially.

@jacobalberty
Copy link
Author

Master looks good, so far after 30 minutes of testing no issues. It was previously leaking goroutines within 5 minutes of launch.

@nhooyr
Copy link
Contributor

nhooyr commented Feb 27, 2020

Awesome :)

Will tag a release tonight/tomorrow 🎊

@nhooyr
Copy link
Contributor

nhooyr commented Feb 28, 2020

Published v1.8.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants