Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: provide a way to format output in unhandled panics #63455

Closed
mitar opened this issue Oct 9, 2023 · 17 comments
Closed

proposal: runtime: provide a way to format output in unhandled panics #63455

mitar opened this issue Oct 9, 2023 · 17 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal
Milestone

Comments

@mitar
Copy link
Contributor

mitar commented Oct 9, 2023

Background

When code calls panic(err) and panic is not handled (recovered) it aborts the whole program while printing the panic value and goroutine stack traces. If the panic value implements the error interface, code currently calls its Error method and prints it out.

Proposal

I propose that the code there first checks for the following interface:

type panicError interface {
    error
    PanicError() string
}

And if the value implements this interface, PanicError is called instead of Error to obtain the string to print out.

Discussion

I am the author of the gitlab.com/tozd/go/errors, a modern drop-in replacement for github.com/pkg/errors which enables errors to be annotated with stack traces and structured details. I noticed that it is pretty common for people to pass error values to panic. But the issue is that errors with additional optional information do not have any way to output that information when unhanded panic occurs. For example, errors contain a stack trace where they happened but that stack trace is not printed out, nor is available among stack traces printed out by runtime itself (the error might have happened at a completely different place than where panic is called). Errors do print that additional information when using fmt.Printf, but that is not used by the runtime.

So I suggest that runtime checks for the above interface first, and if it exists, calls it, giving errors an option to provide additional useful information for debugging. The cost of doing this should be minimal (unhandled panics are rare, it is one extra type case in the existing type switch).

Alternatives

Errors could add all extra information to output from Error() but generally none of libraries I have seen is doing that. The reason is probably simple: such strings with extra information are hard to combine while Error() strings generally stay oneliners (except when using errors.Join).

@mitar mitar added the Proposal label Oct 9, 2023
@gopherbot gopherbot added this to the Proposal milestone Oct 9, 2023
@seankhliao seankhliao added the error-handling Language & library change proposals that are about error handling. label Oct 9, 2023
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 9, 2023
@mitar
Copy link
Contributor Author

mitar commented Mar 6, 2024

@ianlancetaylor How could this proposal be moved into more active consideration? I think it is relatively simple, I am willing to implement it (if accepted), but having it sooner than later could allow the ecosystem to start using it.

@ianlancetaylor
Copy link
Contributor

CC @golang/runtime

@ianlancetaylor ianlancetaylor added compiler/runtime Issues related to the Go compiler and/or runtime. and removed error-handling Language & library change proposals that are about error handling. labels Mar 7, 2024
@rsc
Copy link
Contributor

rsc commented Mar 13, 2024

What is a situation where PanicError would want to print more information than Error?
Or conversely what information must Error not print that PanicError would print?
Specifically, why is it important to print the stack trace during a program-ending panic but important NOT to print it when logging the error to disk and continuing?

@ianlancetaylor
Copy link
Contributor

Another possible approach to this would be to add a function to the runtime.

// Panicking reports whether the program is crashing due to an unhandle panic.
func Panicking() bool

@prattmic
Copy link
Member

Another possible approach to this would be to add a function to the runtime.

// Panicking reports whether the program is crashing due to an unhandle panic.
func Panicking() bool

I assume that the idea here is that Error methods would check runtime.Panicking() and modify their output?

One downside to this approach is that (currently, at least), the runtime calls the Error method prior to freezing the scheduler (in case those methods depend on scheduling, I suppose), so the modified "panicking" output could leak into logs, etc on non-panicking goroutines, which could get confusing.

(Unless runtime.Panicking() is goroutine-local, I suppose? But that is a whole other can of worms.)

@mitar
Copy link
Contributor Author

mitar commented Mar 13, 2024

What is a situation where PanicError would want to print more information than Error?
Or conversely what information must Error not print that PanicError would print?

So in general I would say that PanicError should print all information available in the error, to maximize debugging potential. That can include a stack trace (where the error happened, not where panic was called), but it can also include various key/value/fields data attached to the error (the errors package I maintain supports for example attaching arbitrary details, and some other packages as well). None of those packages output that information using Error(), generally Error() is single-line and all information above adds additional lines. How packages do that is that they support custom printf formatting flags to enable additional printing. But panic handling does not call that, it calls only Error() or String() and this is reasonable - you do not necessary want to call into printf formatting during panic, nor there is a standard fmt format to really extract all information out.

So I think keeping Error() output de-facto standard single-lines is something packages would not like to break just to handle the case that it might be printed during a runtime panic. Also, having Error() be multi-line makes it hard to combine errors with Errorf and %w - I think this is also another reason why everyone is sticking with single-line Error() output.

Specifically, why is it important to print the stack trace during a program-ending panic but important NOT to print it when logging the error to disk and continuing?

For the latter case the developer is in control and they can use fmt.Printf("%+v", err) or something to get this information out. During runtime panic developer does not have control.

I really think that PanicError is a simple and clear API for error to implement. We could also make it PanicString or RuntimePanicString to not suggest that only errors should be passed to panic, but it is really something which returns all info to output during a runtime panic.

@rsc
Copy link
Contributor

rsc commented Mar 15, 2024

I really think that PanicError is a simple and clear API for error to implement.

It may seem simple, but then it raises the question of best practices for which errors implement PanicError versus not, what extra information you print during panics, whether it is acceptable for logging packages and the like to call PanicError to get extra information even for errors that it doesn't know came from panics, and so on. Any time we change a fundamental interface, that's a major shift in the ecosystem that everyone writing Go programs has to absorb.

So far I don't think I have seen the evidence that this problem is pressing enough to be worth all this conceptual cost. Small changes can have very large consequences.

Also more concretely, if (1) PanicError is used for panics that go all the way to the top level of a goroutine and cause a program crash and (2) the example of what PanicError would do is print a stack trace of the error, then: the panic crash is already printing a stack trace. Why do we need to print a second one too?

@rsc
Copy link
Contributor

rsc commented Mar 15, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Mar 15, 2024
@mitar
Copy link
Contributor Author

mitar commented Mar 15, 2024

It may seem simple, but then it raises the question of best practices for which errors implement PanicError versus not, what extra information you print during panics, whether it is acceptable for logging packages and the like to call PanicError to get extra information even for errors that it doesn't know came from panics, and so on. Any time we change a fundamental interface, that's a major shift in the ecosystem that everyone writing Go programs has to absorb.

You are right. Maybe a better interface could be a FullError() string which returns any additional data it has and is expected to return a multi-line string? Then also other callers might call it and find it useful.

I do find runtime.Panicking() an interesting idea as well, if it could be implemented easily. In a way, Error() today could already do that, inspect the stack trace to see if it is called from the runtime directly. An ugly workaround though in my mind.

Also more concretely, if (1) PanicError is used for panics that go all the way to the top level of a goroutine and cause a program crash and (2) the example of what PanicError would do is print a stack trace of the error, then: the panic crash is already printing a stack trace. Why do we need to print a second one too?

There can be a difference between a stack trace where the error was made and a stack trace where the error was passed to panic! You can call into a library which returns errors with stack traces and then determine that error at that point is unexpected and pass it to panic.

Also there is much more than just stack traces which can be stored in the error. gitlab.com/tozd/go/errors in fact promotes the idea that error strings should be more or less constant and you should not interpolate user provided or dynamic values into error strings, but that they should be attached as additional structured details on the error. This also allows one to add more such structured details as the error is bubbling up. Such separation makes it easier to log such errors (with structured logging) and in general makes one not have to parse such data out of errors. Moreover, such data might be large and could be unwieldy if tried to be interpolated into the error string. And often data is multi-line or potentially multi-line. Examples of such additional details:

  • file system paths
  • SQL queries
  • paths to where in JSON an error is including the unparsed JSON string where the error failed
  • URLs with query strings
  • username

Those are similar benefits to why structured logging can be simpler to work with than textual logging, especially when you collect logs to some central place and want to automatically analyze them. Such errors with structured details allow then integration with structured logging, find patterns, count occurrences, etc. But with unhandled panic error, this pattern of not putting everything into Error has a downside: data is not visible. Having a way to output it in that case is what I am asking for here.

@rsc
Copy link
Contributor

rsc commented Mar 27, 2024

I still fundamentally don't see why we should privilege the act of crashing the program and make that Error string different from when the panic is recovered but logged for debugging separately. This seems fundamentally wrong as a design matter.

If this were needed in a widespread way (I haven't seen evidence of that yet), a more general approach would be to make the runtime look for a Format method and if so call it as if the panic value were being formatted with %+v.

But it's also already possible for anything doing panic(err) to do panic(percentPlusVFormatting(err)) where that wrapper intercepts the Error method and returns the %+v text. So it's unclear that the runtime should change.

@mitar
Copy link
Contributor Author

mitar commented Mar 27, 2024

I still fundamentally don't see why we should privilege the act of crashing the program and make that Error string different from when the panic is recovered but logged for debugging separately. This seems fundamentally wrong as a design matter.

Because if panic is recovered, you have control how you want to log the error. But for unhandled panic you, by definition, do not have control. So the other place with control is where the error is made and error could be providing additional way to format itself in the case of the unhandled panic.

a more general approach would be to make the runtime look for a Format method and if so call it as if the panic value were being formatted with %+v.

Not really, because there is no standard way to really format errors to get different information it might include. github.com/pkg/errors "standardized" that using %+v formats a stack trace and in my errors package I in fact, for backwards compatibility, cannot include more information with %+v because users expect that it formats (only) a stack trace, in specific format. So for example, you can use %#v to get additional structured details in my package (and %+v for stack trace, for compatibility with github.com/pkg/errors so that my package can be used as maintained drop-in replacement for it).

But it's also already possible for anything doing panic(err) to do panic(percentPlusVFormatting(err)) where that wrapper intercepts the Error method and returns the %+v text. So it's unclear that the runtime should change.

If you control panic call, sure. But if I am writer of a library which uses rich errors with additional information, and final app uses different libraries, some using additional information and some not, then it becomes tricky to handle all of that in percentPlusVFormatting because different errors might expose this additional information in different ways. Also, you have to remember to call percentPlusVFormatting for all panic calls.

I have used this pattern for JSON marshaling and for extended Format for errors. But that is easier because it is clearer what you are doing when you do fmt.Printf("%+v", errors.Formatter{err}) or json.Marshal(errors.Formatter{err})). But panic(percentPlusVFormatting(err)) is less clear because you do not really know if panic will be handled or not.

I mean, this argument could be made also for why does unhandled panic even check for error interface, it could check only stringer and then expect caller to do panic(errorToString(err)). But that would be sad because you would have to do it always. In the same way as now I would have to do panic(percentPlusVFormatting(err)) always, just in case if err happens to be a rich error. (And again, percentPlusVFormatting would have to support all possible error packages instead of being able to delegate unhandled panic formatting to those error packages themselves.)

===

I think interface { PanicString() string } is probably the best approach here. It is general and can be used for any value passed to panic() to handle the case when the panic is unhandled.

Alternative might be interface { PanicUnwrap() any } which would be called on the value passed to panic if the panic is unhandled, and only then existing logic of Error() or String() would be called on this unwrapped value.

@rsc
Copy link
Contributor

rsc commented Apr 3, 2024

This still seems too special case. I don't think we are going to convince you, but I also don't see a compelling case for this change today.

@mitar
Copy link
Contributor Author

mitar commented Apr 3, 2024

Yea, I see that we are not getting across to each other. That is fine, no hard feelings here, just so that you know.

But I do think that it is a necessary feature, unhandled panics are generally rare and you really want to maximize debugging information you can get at that point. So if error carries such information, it is sad that it is not possible to expose that information. There are not much alternatives (using Error() to expose it will break regular uses of Error(), and wrapping an error every time you pass to panic might work if you do control when panic is called, but that is not always the case - caller of panic might not know anything about the super informative error it is getting).

@rsc rsc moved this from Active to Likely Decline in Proposals Apr 10, 2024
@rsc
Copy link
Contributor

rsc commented Apr 10, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/581215 mentions this issue: runtime: properly frame panic values in tracebacks

@rsc
Copy link
Contributor

rsc commented Apr 24, 2024

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals Apr 24, 2024
@rsc rsc closed this as completed Apr 24, 2024
@mitar
Copy link
Contributor Author

mitar commented May 3, 2024

I think a workaround I can do is to check in Error() implementation the stack trace and see if Error() is called directly from the runtime, and if so, output a more extensive output.

gopherbot pushed a commit that referenced this issue May 8, 2024
This CL causes the printing of panic values to ensure that all
newlines in the output are immediately followed by a tab, so
that there is no way for a maliciously crafted panic value to
fool a program attempting to parse the traceback into thinking
that the panic value is in fact a goroutine stack.

See #64590 (comment)

+ release note

Updates #64590
Updates #63455

Change-Id: I5142acb777383c0c122779d984e73879567dc627
Reviewed-on: https://go-review.googlesource.com/c/go/+/581215
Auto-Submit: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal
Projects
Status: Declined
Development

No branches or pull requests

6 participants