Hang after timeout in timeoutWait #24
After the timeout condition in proctl.timeoutWait is triggered, one of the subsequent next/step/continue commands hangs with dlv blocked in a wait syscall. I believe the problem is that in the timeout case timeoutWait leaves a goroutine running that is still executing a wait syscall. We then have two goroutines racing for the next SIGTRAP, and if the abandoned timeoutWait goroutine wins, dlv hangs.

repo:

Comments
Apparently there is a second failure mode where it doesn't hang completely, but every step command takes ~2 seconds (the timeoutWait timeout).
Yeah, that timeoutWait is a bit of a crutch to handle a thread that can go from a trace stop reported by procfs, to a requested single step, to a scheduler-induced sleep, causing Delve to remain blocked in a wait for that thread. I'm thinking through some ideas on how to better handle this situation; that will fix the second failure mode, and should eliminate the race you're describing. There is a small race in timeoutWait when an actual pid is provided: a SIGSTOP and a SIGTRAP race, and the wait executed by the goroutine gets the SIGTRAP and ignores it. There is also a bug there where, if the pid is -1 (to wait on all threads) and the timeout condition is met, a goroutine will be left indefinitely in a blocking wait. That is definitely an issue, but currently no callers of timeoutWait pass -1 as the pid.
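A minimal sketch of the pattern being described, assuming timeoutWait races a blocking wait4 against a timer (the names and structure here are illustrative assumptions, not Delve's actual code):

```go
package waitdemo

import (
	"errors"
	"syscall"
	"time"
)

type waitResult struct {
	pid    int
	status syscall.WaitStatus
	err    error
}

var errWaitTimeout = errors.New("wait timed out")

// timeoutWait races a blocking wait4 against a timer. Hypothetical
// reconstruction for illustration only.
func timeoutWait(pid int, timeout time.Duration) (int, syscall.WaitStatus, error) {
	ch := make(chan waitResult, 1)
	go func() {
		var status syscall.WaitStatus
		wpid, err := syscall.Wait4(pid, &status, 0, nil)
		ch <- waitResult{wpid, status, err}
	}()
	select {
	case r := <-ch:
		return r.pid, r.status, r.err
	case <-time.After(timeout):
		// The bug described above: the goroutine is abandoned here but
		// is still blocked inside Wait4. It will consume the next wait
		// status for pid (e.g. the SIGTRAP stop that a later
		// next/step/continue command is waiting for), leaving that
		// command hung.
		return 0, 0, errWaitTimeout
	}
}
```

With a pid of -1, the abandoned goroutine would swallow a stop from any thread, which is the indefinite blocking wait noted above.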
Remove any assumption that a wait syscall on a thread id after a continue will return. Any time we continue a thread, wait for activity from any thread, because the scheduler may well have switched contexts on us due to syscall entrance, a channel op, etc. There are several more things to be done here, including:

* Potential tracking of goroutine id as we jump around to thread contexts.
* Potential of selectively choosing threads to operate on based on the internal M data structures, ensuring that our M has an active G.

This commit partially fixes #23 and #24, however there are still some random hangs that happen and need to be ironed out.
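A minimal Linux-only sketch of that approach, assuming a ptrace-based tracer: after resuming one thread, wait on pid -1 so a stop from any traced thread is observed. The function and its error handling are illustrative assumptions, not the commit's actual code:

```go
package waitdemo

import (
	"fmt"
	"syscall"
)

// continueAndWait resumes one stopped thread but then waits for a stop
// from *any* traced thread, since the Go scheduler may have migrated
// the goroutine of interest to a different OS thread.
func continueAndWait(tid int) (int, syscall.WaitStatus, error) {
	// Resume the stopped thread with no signal.
	if err := syscall.PtraceCont(tid, 0); err != nil {
		return 0, 0, fmt.Errorf("ptrace cont: %v", err)
	}
	var status syscall.WaitStatus
	// pid -1 waits on any child; syscall.WALL additionally reports
	// clone()d threads, not just full child processes.
	wpid, err := syscall.Wait4(-1, &status, syscall.WALL, nil)
	if err != nil {
		return 0, 0, fmt.Errorf("wait4: %v", err)
	}
	return wpid, status, nil
}
```

The key change is that nothing ever blocks waiting on one specific thread id after a continue.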
Can you test and confirm this is no longer an issue? Just pushed some fixes up to master that should resolve this issue, and possibly #23 as well. If these issues seem to be fixed, please close out the issue.
Seems fixed as of afa3a9c.