runtime/trace: flush trace data on non-throw crashes #65319

mknyszek · 2024-01-26T20:35:40Z

If a Go program has tracing enabled and crashes, chances are that the most recent data (the most useful data) won't be properly flushed, and the trace will be broken. We can discard this broken part of the trace in the tooling (#65316), but that doesn't change the fact that we might lose a lot of information.

The thing is, many crashes that only impact user program state (such as nil dereferences and uncaught-but-recoverable panics) can absolutely still go through with a global buffer flush (runtime.traceAdvance) since the runtime state is still OK.

I'd like to suggest explicitly flushing all trace data on an uncaught panic or a crash due to some "easier" case, like nil dereferences, so that as much of the data comes out intact as possible.


mknyszek · 2024-01-26T20:37:29Z

Note: this is related to #63185 (flight recording) as well, since this could make recovering trace data from a crash while flight recording was enabled much more successful in the future. We could consider also adding the ability to install an optional handler to the flight recorder for writing out trace data in these cases, though that should probably go in the flight recording proposal.

mknyszek · 2024-01-26T20:38:46Z

This also goes hand-in-hand with #65316: the tail end of the trace data is still likely to be broken, because the crash still has to happen.

gopherbot · 2024-02-09T22:22:14Z

Change https://go.dev/cl/562616 mentions this issue: runtime: call traceAdvance before exiting

This ensures the trace buffers are as up-to-date as possible right before crashing. It increases the chance of finding the culprit for the crash when looking at core dumps, e.g. if slowness is the cause for the crash (monitor kills process).

Fixes golang#65319.

Change-Id: Iaf5551911b3b3b01ba65cb8749cf62a411e02d9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/562616
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

aktau · 2024-10-22T11:58:22Z

I'm wondering if we could do this in more cases. I think I see three top-level crashing functions (all calling startpanic_m):

gopanic(): user panic calls. This one was already handled by https://go.dev/cl/562616
sighandler, which crashes on certain signals (SIGABRT, SIGQUIT, I assume SIGSEGV as well)
throw() and fatal()

Which of these paths could we conceivably add a traceAdvance to? The most interesting case for us would be sighandler(), but that is annotated //go:nowritebarrierrec, which conflicts with traceAdvance. Is there a way to get around this? Or is there a way to create a traceAdvanceLite which does as much as feasible?

gopherbot added the compiler/runtime label (Issues related to the Go compiler and/or runtime.) Jan 26, 2024

mknyszek added this to the Backlog milestone Jan 26, 2024

cherrymui added the NeedsFix label (The path to resolution is known, but the work has not been done.) Jan 26, 2024

prattmic added this to Go Compiler / Runtime Jan 31, 2024

mknyszek self-assigned this Jan 31, 2024

mknyszek moved this to Todo in Go Compiler / Runtime Jan 31, 2024

gopherbot closed this as completed in 20f4b6d Feb 10, 2024

github-project-automation bot moved this from Todo to Done in Go Compiler / Runtime Feb 10, 2024

felixge mentioned this issue Feb 25, 2024

testing: writeProfiles is not called after panic #65129

Open

mknyszek removed this from Go Compiler / Runtime Feb 28, 2024

nsrip-dd mentioned this issue May 23, 2024

cmd/trace: failed to read event: expected batch event (EventBatch), got Invalid(115) #67602

Closed

dmitshur modified the milestones: Backlog, Go1.23 May 24, 2024
