Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server side connection close panic fix #44

Closed
wants to merge 4 commits into from

Conversation

jezek
Copy link

@jezek jezek commented Sep 30, 2018

Rewrite spawning of go routines upon new connection, so we can signal them and wait for them on connection close.

This fixes e.g. a panic in window example after window close:

$ go run examples/create-window/main.go 
Map window 71303169 successful!
Checked Error for mapping window 0x1: BadWindow {NiceName: Window, Sequence: 5, BadValue: 0, MinorOpcode: 0, MajorOpcode: 8}
Event: ReparentNotify {Sequence: 4, Event: 71303169, Window: 71303169, Parent: 11719003, X: 0, Y: 0, OverrideRedirect: false}
Event: ConfigureNotify {Sequence: 4, Event: 71303169, Window: 71303169, AboveSibling: 0, X: 1, Y: 28, Width: 500, Height: 500, BorderWidth: 0, OverrideRedirect: false}
Event: MapNotify {Sequence: 6, Event: 71303169, Window: 71303169, OverrideRedirect: false}
XGB: xgb.go:403: A read error is unrecoverable: EOF
XGB: xgb.go:403: A read error is unrecoverable: EOF
panic: close of closed channel

goroutine 23 [running]:
github.com/BurntSushi/xgb.(*Conn).Close(...)
	/home/jezek/.go/src/github.com/BurntSushi/xgb/xgb.go:140
github.com/BurntSushi/xgb.(*Conn).readResponses(0xc4200c8000)
	/home/jezek/.go/src/github.com/BurntSushi/xgb/xgb.go:405 +0x25e
created by github.com/BurntSushi/xgb.postNewConn
	/home/jezek/.go/src/github.com/BurntSushi/xgb/xgb.go:133 +0x232
exit status 2

to

$ go run examples/create-window/main.go
Map window 71303169 successful!
Checked Error for mapping window 0x1: BadWindow {NiceName: Window, Sequence: 5, BadValue: 0, MinorOpcode: 0, MajorOpcode: 8}
Event: ReparentNotify {Sequence: 4, Event: 71303169, Window: 71303169, Parent: 11721679, X: 0, Y: 0, OverrideRedirect: false}
Event: ConfigureNotify {Sequence: 4, Event: 71303169, Window: 71303169, AboveSibling: 0, X: 1, Y: 28, Width: 500, Height: 500, BorderWidth: 0, OverrideRedirect: false}
Event: MapNotify {Sequence: 6, Event: 71303169, Window: 71303169, OverrideRedirect: false}
XGB: xgb.go:433: A read error is unrecoverable: EOF
XGB: xgb.go:597: Invalid event/error type: *errors.errorString
Both event and error are nil. Exiting...

@BurntSushi
Copy link
Owner

Thanks for looking into this! I'm a bit hesitant to merge this though. In particular, the create-window example certainly worked at one point. I don't think I have enough confidence in the test suite to catch any new bugs introduced by this change, and I'm not sure I have the maintenance bandwidth to track down and fix them if and when they occur.

@jezek jezek force-pushed the connection_close_fix branch from e072f44 to 942a9d4 Compare October 2, 2018 03:23
Just cosmetic changes to unify self reference variable name
across all *Conn methods + NewConnNet comment fix

This commit makes no changes in logic
@jezek jezek force-pushed the connection_close_fix branch 2 times, most recently from 7dae864 to 238d57f Compare October 2, 2018 17:21
@jezek
Copy link
Author

jezek commented Oct 2, 2018

I made some changes to PR to group changes by type and improve readability for review.

@BurntSushi I have a question. If I provide additional information, why the window-example failed and describe all the changes I have made with explanations, why I made them and why the changes do not break your logic and can not break other projects or introduce new bugs, but even fix some issues (e.g fixes #32) ... is there a possibility you accept this PR?

I don't really care if you accept, I have a fork, that I use & work on/with, but this repo really helped me and I want to contribute as thanks.

@BurntSushi
Copy link
Owner

@jezek I do appreciate the contribution. I'm sorry that the test suite isn't in a place that allows me to be confident enough to merge this PR.

If you could explain the changes and the problem, that would be great. I will then try out your patch as part of my WM as a form of manual quality assurance, and then I think I would be OK merging.

@jezek jezek force-pushed the connection_close_fix branch 2 times, most recently from de0e3c2 to 7ed4845 Compare October 4, 2018 21:38
@jezek jezek force-pushed the connection_close_fix branch 4 times, most recently from 7ddc162 to f2f644f Compare October 25, 2018 16:16
fix double close panic,
fix occaional panic on sudden connection to server close
handle all channel waitings in cookies and requests properly
@jezek jezek force-pushed the connection_close_fix branch from f2f644f to 67a5ab1 Compare October 27, 2018 17:45
@jezek jezek mentioned this pull request Oct 30, 2018
@jezek
Copy link
Author

jezek commented Oct 30, 2018

Sorry, for the long pause. I began to wrote my review and spotted a flaw in my PR, so I ended up fixing it and writing some tests (#45). In following lines I will try to explain the changes and flaws they are fixing.


In Conn struct I removed the closing channel and added doneSend and doneRead channels.

Why? The original meaning of closing channel was to signal the readResponses method that sendRequests is gracefully closing, to finish all unfinished business and than signal back.

The unfinished business part is provided by the noop method, which basically makes a request to X with reply and than waits for the reply. The noop method is invoked only in sendRequests method, so when it is finished, we know, that there will be no other replies from server, cause we made no other requests after noop.

But there is a flaw in the signaling/responding logic. Imagine, the readResponses method is currently handling some response from server (error, event, reply), so it is somewhere after io.ReadFull([...]) on line 402. When you call (*Conn).Close and you signal, that you want to gracefully close through the channel, invoke noop, which writes an request with reply to X and waits for reply on line 360. After the readResponses finishes handling previous request, it continues, the for loop and stumbles upon the closing signal (line 394) and wants to sends response. But will hold forever, cause sendRequests is waiting for cookie reply and not for response reply. This can happen but does not very often. (see tests PR, "close with pending request")

This doesn't mater now, cause the signaling is handled a little bit different. The problem is the readResponses method is most of the time blocking in the io.ReadFull([...]) method. It was a good idea to wait for the noop to finish and than signal closing before another read blocking. But as you see it can backfire. My idea is simple. Wait for the noop to finish, and then just close the (*Conn).conn connection to server. This causes the next io.ReadFull([...]) to result with error and end the readResponses method. To signal, that sendRequests is gracefully closing and don't want to output error, the doneSend channel is closed before the connection to server, which is detected and handled now on line 450. Note: the doneSend channel is closed before connection to server only on graceful closing.


In postNewConn function the c.closing channel is no longer wanted. c.doneSend and c.doneRead don't need to be buffered, the signaling is achieved by closing the channels. c.doneSend is closed only upon (*Conn).sendRequests method exit (line 363) and c.doneRead closes only upon (*Conn).readResponses exit (line 437). This assures no channel close panics.


In (*Conn).Close method, the signaling of graceful close by closing the c.reqChan is replaced by putting a nil request to the channel (line 146). Why? It's annoying to have a panic error on calling (*Conn).Close more times and I think it's a bad habit. Also just to be sure also check, if the (*Conn).sendRequests is running and if not, then there is no need to make an close request, cause it's already closed.


In (*Conn).NewId method a check if c.xidChan is added, to signal that the connection to server is closed, cause this channel is only closed upon (*Conn).generateXids exit, which we want to exit only when the connection to server is closed.


In (*Conn).generateXids method the logic is not touched, just the code is a little bit reshuffled, to not only send next xid struct to the c.xidChan, but also watch for the closing of c.doneSend channel, which signals the exit of (*Conn).sendRequests method. Because the id from xidChan is used only in requests, and it is pointless to keep this method running, when there will be no new requests (since sendRequests is down). I could make an select on both positions where sending to channel occurs (line 245, line 253) and avoid code reshuffling, but I think this way it is more readable (1.prepare id, 2.send/check). (these changes are fixing memory leak of this routine after Conn close)


The changes in (*Conn).generateSeqIds method are of the same nature as in (*Conn).generateXids (except the reshuffling). So it sends the new seqid to the c.seqChan, or exit, if c.doneSend is closed, because (*Conn).newSequenceId method, which is the only reader from c.seqChan, is only used inside (*Conn).sendRequests method, so this generateSeqIds method no longer needs to run, if sendRequests is down. (this change fixes memory leak of this routine after Conn close)


Changes in (*Conn).NewRequest are made, to never block, if connection is closed and are explained via comments. I personally think, that the user should be notified via error, if request ended in error write, but will not fix this now.


In (*Conn).sendRequests it looks, that there are more changes, but the logic is roughly the same.

There are two new defers. The connection close defer (line 362) is for unblocking the readResponses reads and signaling to exit. The channel close defer (line 363) is to signal other running routines, that there will be no request handling more. The close of c.doneSend has to be before, connection close defer, because we are signaling the readResponses method with this, that we are closing gracefully. Note: on any error in the loop, first, the connection is closed an then we wait for readResponses to exit by waiting for c.doneRead close, just to ensure channel is not closed before readResponses exit.

The for loop through incoming request channel c.reqChan is now not only looking for incoming requests, but also for c.doneRead, which signals the closing of (*Conn).readResponses goroutine. If there is no read responses routine and we did not close the connection (cause we are in loop now, so there was no incoming close request), there has to be an read error and we don't longer want such a connection, so exit.

Before changes, the graceful close request was signaled via the c.reqChan closing. Back then, if the channel was closed, all pending requests were processed, then the loop exited and a noop request was filed and waited to be replied to. That ensured, that all previous requests were replied before connection close. Now the graceful close is signaled via nil request. If nil request intercepted (line 368), again the noop request with wait is done (now with error checking) and then method returns (signaling of graceful close is done in defers).

All other changes in loop are error handling changes. Note: in case of unsuccessful write (line 391) we now return before req.seq close, to signal to (*Conn).NewRequest, that there was a write error (which is not handled).


In (*Conn).readResponses there are two new defers too. Like in sendRequests, the closing of c.doneRead (line 437) is to signal other routines, that this routine is finished. The connection close (line 436) is just for sure, I think it is not needed (cause if there is a read error, there has to be a write error too, so the connection is like a closed connection), but it does no harm there.

The checking for closing (line 393) is no more needed, signaling is done in different way.

After first io.ReadFull there was just a graceful closing check added and a return is used instead of continue (line 457). If read returns an error, all next reads will error, so why repeat the loop. By the way this c.Close with continue causes a panic if server suddenly closes (see tests), cause after read returns error, channel is closed, continue is invoked and sometimes the sendRequests routine is 'slow' and there is no close request in c.closing channel, so another read from server is done, returns error, and again calls c.Close, which results in double close panic.

On next io.ReadFull (line 483), just continue was changed to return. Again, if there was an read error, all next reads will error too, so why continue loop? The graceful check is not necessary here, cause if this read errors here, it is certainly not a graceful close.

All other logic is untouched.


To the switch in processEventOrError function, there were 2 more checks added. Both of them just ensure to not log an invalid type error on graceful exit (line 577) or an read error (the read error is allready loged in readResponses). I think, you wanted return the read error to event channel, just forgot to implement, but I did not want to break current logic. So for an read error just return (nil, nil) as before.


All the changes in cookie.go are to ensure that there is no blocking in checks if the Conn is allredy closed.

@jezek jezek force-pushed the connection_close_fix branch from fcb8d7b to 6e911a2 Compare October 8, 2020 14:00
@jezek jezek mentioned this pull request Nov 10, 2020
@jezek jezek mentioned this pull request Jan 5, 2021
@jezek jezek closed this Mar 3, 2021
@jezek jezek deleted the connection_close_fix branch March 3, 2021 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants