
Hang after timeout in timeoutWait #24

Closed
ebfe opened this issue Nov 15, 2014 · 4 comments

@ebfe
Contributor

ebfe commented Nov 15, 2014

After the timeout condition in proctl.timeoutWait is triggered, one of the subsequent
next/step/continue commands hangs with dlv blocked in a wait syscall. I believe the
problem is that in the timeout case timeoutWait leaves a goroutine running that is
still executing a wait syscall. We then have two goroutines racing for the next
SIGTRAP, and if the abandoned timeoutWait goroutine wins, dlv hangs.

repro:

$ cat sleep.go
package main

import (
    "syscall"
)

func main() {
    ts := syscall.Timespec{Sec: 3}
    syscall.Nanosleep(&ts, nil)
}


$ (echo -e "break syscall.Nanosleep\ncontinue"; yes step) | dlv -run
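
A minimal sketch of the timeout-wait pattern described above, assuming a helper along these lines (hypothetical, not Delve's actual timeoutWait code): the blocking wait runs in a goroutine so it can be raced against a timer, and when the timer wins the goroutine is abandoned but stays parked in the wait syscall, free to steal the next SIGTRAP from the debugger's main wait loop.

package main

import (
    "syscall"
    "time"
)

type waitResult struct {
    pid    int
    status syscall.WaitStatus
    err    error
}

// timeoutWaitSketch (hypothetical) races a blocking Wait4 against a
// timer. On timeout the goroutine below is abandoned but remains
// blocked in Wait4, so it may consume the next stop event that another
// waiter was expecting.
func timeoutWaitSketch(pid int, timeout time.Duration) (waitResult, bool) {
    ch := make(chan waitResult, 1)
    go func() {
        var status syscall.WaitStatus
        wpid, err := syscall.Wait4(pid, &status, 0, nil)
        ch <- waitResult{wpid, status, err} // may fire long after the caller gave up
    }()
    select {
    case r := <-ch:
        return r, true
    case <-time.After(timeout):
        return waitResult{}, false // leaked goroutine is still waiting
    }
}
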
@ebfe
Contributor Author

ebfe commented Nov 15, 2014

Apparently there is a second failure mode where it doesn't hang completely, but every step command takes ~2 seconds (the timeoutWait timeout).

@derekparker
Member

Yeah, that timeoutWait is a bit of a crutch. It handles the case where procfs reports a thread as trace-stopped, a single step is requested, and the thread then resumes a scheduler-induced sleep, leaving Delve blocked in a wait for that thread.

I'm thinking through some ideas on how to handle this situation better; that will fix the second failure mode and should eliminate the race you're describing. There is also a small race in timeoutWait when an actual pid is provided: a SIGSTOP and a SIGTRAP can race, and the wait executed by the goroutine may get the SIGTRAP and ignore it.

Also, there is a bug where, if the pid is -1 (to wait on all threads) and the timeout condition is met, a goroutine is left indefinitely in a blocking wait. That is definitely an issue, but currently no callers of timeoutWait pass -1 as the pid.
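
A rough illustration of the SIGSTOP/SIGTRAP race described here, using a hypothetical haltAndWait helper (not Delve's code): the waiter only looks for the SIGSTOP it asked for and throws away anything else, so a SIGTRAP that arrives first is lost.

package main

import (
    "fmt"
    "syscall"
)

// haltAndWait (hypothetical) sends SIGSTOP to a traced thread and waits
// for the corresponding stop. If a breakpoint delivers SIGTRAP at the
// same moment, the wait may return that SIGTRAP first, and it gets
// discarded below, losing a trap the debugger needed to observe.
func haltAndWait(tid int) error {
    if err := syscall.Kill(tid, syscall.SIGSTOP); err != nil {
        return err
    }
    var status syscall.WaitStatus
    for {
        if _, err := syscall.Wait4(tid, &status, 0, nil); err != nil {
            return err
        }
        if status.Stopped() && status.StopSignal() == syscall.SIGSTOP {
            return nil // got the stop we asked for
        }
        // Any other stop (e.g. the racing SIGTRAP) is dropped here.
        fmt.Printf("ignoring stop with signal %v\n", status.StopSignal())
    }
}
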

derekparker added a commit that referenced this issue Nov 25, 2014
Remove any assumption that a wait syscall on a thread id after a
continue will return. Any time we continue a thread, wait for activity
from any thread, because the scheduler may well have switched contexts
on us due to syscall entrance, channel op, etc...

There are several more things to be done here including:

* Potential tracking of goroutine id as we jump around to thread
  contexts.
* Potential of selectively choosing threads to operate on based on the
  internal M data structures, ensuring that our M has an active G.

This commit partially fixes #23 and #24, however there are still some
random hangs that happen and need to be ironed out.
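
A minimal sketch of the approach the commit message describes, assuming a Linux ptrace backend (continueAndWaitAny is illustrative, not the commit's actual code): resume a specific thread, but collect the next stop from whichever thread reports one.

package main

import "syscall"

// continueAndWaitAny resumes one traced thread but then waits for a
// stop from any thread (pid -1), since the Go scheduler may have moved
// the goroutine of interest onto a different OS thread. WALL asks the
// kernel to report events from all of the tracee's clones. Ptrace calls
// must come from the OS thread that attached (runtime.LockOSThread).
func continueAndWaitAny(tid int) (stopped int, status syscall.WaitStatus, err error) {
    if err = syscall.PtraceCont(tid, 0); err != nil {
        return 0, 0, err
    }
    stopped, err = syscall.Wait4(-1, &status, syscall.WALL, nil)
    return stopped, status, err
}
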
@derekparker
Member

Can you test and confirm this is no longer an issue? I just pushed some fixes up to master that should resolve it, and possibly #23 as well. If these issues seem to be fixed, please close this one out.

@ebfe
Contributor Author

ebfe commented Nov 28, 2014

Seems fixed as of afa3a9c

@ebfe ebfe closed this as completed Nov 28, 2014