Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: use WER for GOTRACEBACK=wer on Windows #57441

Closed
qmuntal opened this issue Dec 22, 2022 · 47 comments
Closed

runtime: use WER for GOTRACEBACK=wer on Windows #57441

qmuntal opened this issue Dec 22, 2022 · 47 comments

Comments

@qmuntal
Copy link
Contributor

qmuntal commented Dec 22, 2022

Edit: proposal shifted to enable WER using GOTRACEBACK=wer. See #57441 (comment) and #57441 (comment) for more context.

WER is a Windows built-in infrastructure provided by Microsoft that helps troubleshooting hardware and software problems. Many systems rely on the core dumps generated by WER for post-mortem analyses. These dumps are collected by a dedicated process running on the host and uploaded to the WER server and\or stored locally (see Collecting User-Mode Dumps).

#20498 added support for WER when setting GOTRACEBACK=crash, but was reverted and replaced by #49471. The reason to revert it was because WER uploads dumps to Microsoft by default, and this bit was not discussed in the original issue. See this comment #49471 (comment) to understand why I'm counterproposing #49471 after implementing it.

Here I'm proposing to revert the revert, enabling WER when GOTRACEBACK=crash is set. The arguments come next, but first the definition of the crash mode:

GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump.

I have two simple arguments in favor of WER:

  • WER is the Windows specific manner to crash applications. It is the default experience and many production systems rely on WER to do post-mortem analysis, not only from within Microsoft.

  • WER can be used for collecting user-mode dumps without adding any complexity to the Go runtime.

It's true that some users might not want their diagnosis data to be uploaded to Microsoft servers, but my point is that they already have enough knobs to turn it off or limit what's uploaded:

  • GOTRACEBACK=crash is not the default runtime behavior, one have to manually set via environment variable or programmatically via debug.SetTraceback. We can inform about the WER gotchas in the release notes and in the GOTRACEBACK docs.
  • On Windows, GOTRACEBACK=crash does nothing over GOTRACEBACK=system. If this proposal is approved, the only reason to set GOTRACEBACK=crash would be to opt into WER.
  • Privacy-concerned users will most likely opt-out to uploading diagnosis data when setting up windows for the first time, which disables WER.
  • WER can easily be disabled by running Disable-WindowsErrorReporting on PowerShell. One can also disable just some applications using the system registry.

Side note: #20498 is a bug report, not a proposal, so I think it makes sense to create this as a formal proposal.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 22, 2022
@ianlancetaylor ianlancetaylor changed the title runtime: use WER for GOTRACEBACK=crash on Windows proposal: runtime: use WER for GOTRACEBACK=crash on Windows Dec 22, 2022
@gopherbot gopherbot added this to the Proposal milestone Dec 22, 2022
@ianlancetaylor
Copy link
Contributor

CC @zx2c4 @zhizheng052 @bhiggins @alexbrainman @aarzilli @aclements @golang/windows

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 22, 2022

Wasn't this already discussed and rejected in an earlier proposal? The privacy concerns seem real, especially given how hard opting out has become on Windows, and how little trust exists on the user side these days...

@ianlancetaylor
Copy link
Contributor

This is my personal summary: The earlier attempt was removed due to privacy concerns (which you raised on https://go.dev/cl/307372 -- thanks). Then we decided to adopt minidump in #49471. However, the actual implementation of minidump (https://go.dev/cl/458955) turns out to have some problems (see #49471 (comment)). So this proposal suggests that we reconsider WER despite the privacy issues.

That is, I don't think we made a clear decision that we could not live with the privacy issues, because it seemed that minidump was a viable alternative. Now that we have learned more, it's possible that minidump is not a viable alternative. So let's consider whether we can live with the privacy issues.

I don't have an opinion myself.

@bhiggins
Copy link

IMO, it would be nice to have (and simple to implement) an opt-in only means for Go programs on Windows to work with WER. In addition to being opt-in only, there could be a warning in documentation that states that it's possible for WER to be configured to send crash dumps to Microsoft.

@qmuntal
Copy link
Contributor Author

qmuntal commented Dec 23, 2022

The privacy concerns seem real, especially given how hard opting out has become on Windows, and how little trust exists on the user side these days.

If this proposal is approved, WER would still be opt-in, so only users that need it would have it.

Opting out is just one PS command, Disable-WindowsErrorReporting, so I wouldn't say its hard. Additionally, the previous command only disables uploading dumps to the server, one can still use WER to generate local dumps.

@alexbrainman
Copy link
Member

@qmuntal Thank you for creating this proposal. I agree with your sentiment.

I always supported this functionality #49471 (comment) .

And it looks silly that Delve supports Go Windows minidumps, but Go tools do not provide tools to generate these dumps (see #49471 (comment) ).

Alex

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 25, 2022

My process address space can contain private keys. I can't put those in special memory regions because Go's GC. Ordinary users use my software who can't be fiddling around with powershell. Sometimes my software crashes. When this happens, uploading private keys to some error reporting server is a nonstarter. It must not happen. I can't imagine I'm the only one writing software with this situation, and I wouldn't want to choose the dangerous thing for other developers who might not even be aware that their private keys are sensitive - like casual users of TLS/HTTPS.

@aarzilli
Copy link
Contributor

aarzilli commented Dec 25, 2022

My opinion remains as in #49471 (comment) (i.e. in favor of this).

@bhiggins
Copy link

bhiggins commented Dec 25, 2022 via email

@qmuntal
Copy link
Contributor Author

qmuntal commented Dec 25, 2022

Can we go with an explicit opt-in method to crash in a WER-compatible way
(which really just means not swallowing the error)? This could involve a
new GOTRACEBACK=crash-wer option and/or a runtime function.

We could, but I don't see how crash-wer would bring any benefit over crash, considering that the later is not currently implemented on Windows. It would add confusion to an already overloaded knob.

@zx2c4 ordinary users will not disable WER using a PS command, but neitheir will set GOTRACEBACK=crash unless someone guides them. I completly understand your privacy concerns, but I don't get why WER being opt-in is not enough of a mitigation.

Additionally, by default WER does not upload the full virtual memory, just the data segment, which can contain global variables, but not the heap nor the stack.

@iglendd
Copy link

iglendd commented Dec 27, 2022

Thank you @qmuntal again for bring it forward again. I would actually argue that the proposal title is a bit misnomer and least as far as intent of the proposal is concerned. The proposal effectively intends to implement Go documented behavior “crashes in an operating system-specific manner”. No more nor less (never mind that this is on par with other platforms). On Windows it means either WER, custom crash handling, debugger activation, documented and used report entry in Windows Event log and usage of built-in and 3rd party tools. Another word, it is expected behavior for bazillion of applications running on Windows for many decades. Besides, it is a Go-runtime debugging and troubleshooting feature. If one is concerned with some side-effects for whatever reasons, do not activate it.

@danicat
Copy link
Contributor

danicat commented Jan 3, 2023

I got to this topic from #20498 and I've spend the last hour or so trying to catch up with all the back and forths, so I'm sorry if I'm missing the point, but I just wanted to give my 2 cents.

As @qmuntal expressed in their opening statement, the documentation on GOTRACEBACK=crash is very opinionated:

GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump.

Based on that, IMHO, whether the SO sends or sends not the crash information to Microsoft is out of context from a Go program perspective. Microsoft says in their documentation:

Beginning with Windows Vista, Windows provides crash, non-response, and kernel fault error reporting by default without requiring changes to your application.

Then they further elaborate:

Because Windows automatically reports unhandled exceptions, application should not handle fatal exceptions.

First impression is that the decision to send or not send WER data to MS is in the scope of the system admin, not the developers. It is a bit old, but they document the controls of this feature in this article here.

Also, it is not clear to me if the way runtime is currently treating error is effectively "handling fatal exceptions" from a Windows perspective and that's why we are losing this functionality. If that's the case, GOTRACEBACK=crash should only disable any special treatment we do and let the application... er... crash. :) [By that I mean I wouldn't try to add any side effects like generating different types of dumps other than the default]

Another concern is that we are not generating Event Log ID 1000, as previously mentioned by @iglendd in #49471 (comment). This was the main reason why I got into this rabbit hole.

Finally, I'm more than happy to help with any coding if needed. I'm way too invested in this now :)

@kevpar
Copy link
Contributor

kevpar commented Feb 17, 2023

Wanted to chime in with some additional support for this proposal. I think it is worth considering the value of Windows local crash dump handling with WER, which is somewhat separate from uploading of reports to Microsoft (and the latter can be disabled, as @qmuntal points out).

  • WER allows local crash dumps to be collected and stored in a directory. This is documented at Collecting User-Mode Dumps.
  • WER allows the user to register their own application that is invoked when another application crashes. For instance, you could write a program that collects your own dumps, or hooks up a debugger to the crashing process. This is documented at Enabling Postmortem Debugging.

As mentioned above, the current approach of calling MiniDumpWriteDump from the crashing application can deadlock the process. It can also result in corrupted dumps due to the state of the process changing while the dump is being collected. So I think it is worth supporting a mechanism that uses the system facility for handling crashes.

@rsc
Copy link
Contributor

rsc commented Mar 1, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Mar 8, 2023

GOTRACEBACK=crash does not upload things on any system today. It sounds like we should preserve that property, even if it's not the default privacy profile on Microsoft systems. It's also the case that code might contain debug.SetTraceback("crash") intending core dumps on Unix and then accidentally get reports to Microsoft on Windows.

Is there some way to force the WER code to write the local crash but not upload?

If not, then adding GOTRACEBACK=wer seems OK and very clear. Worth noting that debug.SetTraceback("wer") in a library would be interesting and maybe unexpected, but libraries are unlikely to do that.

@qmuntal
Copy link
Contributor Author

qmuntal commented Mar 8, 2023

Is there some way to force the WER code to write the local crash but not upload?

It is possible with admin permissions either running a PS script or modifying the registry, but AFAIK not possible within a process.

If not, then adding GOTRACEBACK=wer seems OK and very clear. Worth noting that debug.SetTraceback("wer") in a library would be interesting and maybe unexpected, but libraries are unlikely to do that.

I'm not particularly attached to crash, I just wanted to avoid having to set two knobs. GOTRACEBACK=wer looks clean and the intent is 100% clear, so I'm OK with this solution, which ideally should act like crash but enabling WER. Edit: I still think crash is a better fit, but I understand that it can't be used in this context if the not-upload-things-by-default property should be preserved.

@iglendd
Copy link

iglendd commented Mar 8, 2023

Regarding GOTRACEBACK=crash vs GOTRACEBACK=wer I guess whatever works is fine but to me crash is better then wer to be honest. I guess I am with qmuntal on that point.

Regarding privacy and upload concerns I would argue that this is a completely orthogonal issue. Semantically and conceptually we are following exactly the same logic and behavior as on Unix - allowing system to decide what to do. And we would follow Go documentation literally (https://pkg.go.dev/runtime)

GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump.

In addition, it is not something a Go process would normally do. A user (or rather a developer) would set it up only for troubleshooting just to get their hands on a crash dump to find out a root cause. There are, by the way, more than a few ways to capture crash dump on Windows. In addition to WER one can also use external application like procdump or Dr.Watson, one can attach register post-mortem or debugger (and all of them are based on the existence of unhandled exception). And so, IMHO, the term wer is a bit ambiguous and misleading in this context. I think the term crash is more aptly describes intent,. behavior and semantics for both Go and system side.🙏

@aarzilli
Copy link
Contributor

aarzilli commented Mar 8, 2023

It is possible with admin permissions either running a PS script or modifying the registry, but AFAIK not possible within a process.

What about doing the opposite: is it possible to know if WER will upload anything from within the process?

@alexbrainman
Copy link
Member

is it possible to know if WER will upload anything from within the process?

I believe WER behaviour is controlled by these registry settings

https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps

(from @qmuntal message at the top #57441 (comment) ).

So we can read the register settings.

Alex

@qmuntal
Copy link
Contributor Author

qmuntal commented Mar 9, 2023

is it possible to know if WER will upload anything from within the process?

I believe WER behaviour is controlled by these registry settings
https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps

Yep. Specifically, the setting Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\Disabled controls if WER uploading is enabled or not, although there are other settings to control application-specific behavior.

@rsc @aarzilli just making sure we are on the same page: some of your comments go in the line of supporting local crash dumps trying to reuse WER infrastructure, which is a goal of this proposal, but not the only one. The main objective here is to support WER, as it provides other useful features already mentioned by myself and others in this issue, such as upload crash dumps to WER servers and post mortem debugging.

@aarzilli
Copy link
Contributor

aarzilli commented Mar 9, 2023

I found this line persuasive against GOTRACEBACK=crash always using WER:

It's also the case that code might contain debug.SetTraceback("crash") intending core dumps on Unix and then accidentally get reports to Microsoft on Windows.

Maybe it would be better if GOTRACEBACK=crash used WER only if WER was set for local dumps and a new value of GOTRACEBACK=wer used WER unconditionally (what would GOTRACEBACK=wer do on unix?).

@thepudds
Copy link
Contributor

thepudds commented Mar 14, 2023

Hi @iglendd

I want also to add that technically GOTRACEBACK=crash is not safe for non-Windows platforms either. If a host has crash detection tools like RedHat ABRT, IBM Core Dump Handler and number of other tools deployed. If one enables GOTRACEBACL=crash on these hosts, then core dumps for Go applications may end up in other places specific for tools/technologies.

My understanding is that things like RedHat ABRT and IBM Core Dump Handler are not enabled by default in a stock distro, whereas WER is sometimes enabled by default.

I think though the more important consideration is that there is already existing behavior and meaning for these settings for the Go toolchain. In terms of which behavior is mapped to which setting, perhaps the calculus might have been different if this conversation was happening before Go was released or before it was widely adopted, but at this point, there are millions of Go developers across even more machines.

Updating the meaning of GOTRACEBACK=crash on Windows is a material change that might catch some by surprise (including not everyone carefully reads Go release notes).

For example, someone might have previously set that as an env var on their development machine after reading the documentation and deciding multiple years ago that the then current behavior was what they wanted for the Go programs they write, but they might also have WER enabled on their machine for a variety of reasons (e.g., configured by default by the Windows OEM like Dell or Lenovo or whoever, or by corporate IT, or they might not have bothered to opt out for their personal machine because they were not too worried about phoning home crash info for music players and Zoom and whatnot)... but (a) they still might not want WER for the Go programs they write, or perhaps more importantly (b) they might not want the act of upgrading their Go release to cause crash information to be phoned home when it previously wasn’t unless they explicitly opt in to that new behavior.

As Ian mentioned, people do get fairly passionate about the concept of software phoning home. One of the recurring themes in that linked recent discussion was “it’s an important change of behavior, not everyone reads the release notes, it should be opt in”. Of course, it’s not an identical topic, but it illustrates the sensitivity of decisions here.

Also, under the now accepted forward compatibility proposal, after someone does go get github.com/some/repo, in some cases the next go command they run will run a freshly downloaded Go toolchain version, which might be a version they’ve never downloaded before. There is a convenience there, but I think it likely also raises the bar on what types of changes happen in the Go toolchain by default, and probably increases the weight of “not everyone reads the release notes” considerations.

In any event, do you still have specific practical concerns with the proposal at this point (or maybe is it more of a lingering philosophical concern)? You mentioned documentation for example, but I would think any new behavior that comes from this proposal will need to be documented, and I would guess it would be documented with close to the same number of words and in the same places regardless of which setting is used for the new behavior.

@iglendd
Copy link

iglendd commented Mar 14, 2023

Thank you @thepudds

do you still have specific practical concerns with the proposal at this point (or maybe is it more of a lingering philosophical concern)?

This is a very good way to put it. No, I do not have practical concerns. I am actually very happy to have a method to collect Go crash dump on Windows which currently does not even exist. My last few comments are more or less a form of lingering philosophical concern. Regarding documentation, besides describing the new option it may be worth mentioning that in some non-default deployment generated by system core dumps, when GOTRACEBACK=crash is set and a Go application crashed, can be automatically copied by crash tracking software.

@qmuntal
Copy link
Contributor Author

qmuntal commented Mar 14, 2023

It looks like we are converging towards GOTRACEBACK=wer, thanks everyone for the thoughtful discussion. I'll update the issue title accordingly, although the proposal is still not accepted so there is time for rising more opinions.

About what to do with GOTRACEBACK=crash, specially regarding #57441 (comment), we should move that discussion to #49471 to avoid bikesheeding.

@qmuntal qmuntal changed the title proposal: runtime: use WER for GOTRACEBACK=crash on Windows proposal: runtime: use WER for GOTRACEBACK=wer on Windows Mar 14, 2023
@danicat
Copy link
Contributor

danicat commented Mar 14, 2023

Just to register my final thought on this, I'm with @iglendd and others. I still don't think that it's the business of the Go runtime to care about if the OS is uploading something or not to a random server, as that falls within the scope of the SysAdmin to decide. That said, I'm also not prepared to die on this hill and I accept the compromise of GOTRACEBACK=wer as it at least gives us an option to do something, instead of the no-op that we have today.

@rsc
Copy link
Contributor

rsc commented Mar 15, 2023

We can always take additional steps later, but for now it seems like the simplest, most minimal step is to make GOTRACEBACK=wer use WER and GOTRACEBACK=crash keep doing what it does, which is to be the same as GOTRACEBACK=system. We can see how well that works, it will provide WER for people who need WER, and we can fine-tune later if more evidence accumulates.

On non-Windows systems, GOTRACEBACK=wer would be interpreted the same as GOTRACEBACK=somethingunexpected, which is to say it would be ignored.

Do I have that right?

@aclements
Copy link
Member

On non-Windows systems, GOTRACEBACK=wer would be interpreted the same as GOTRACEBACK=somethingunexpected, which is to say it would be ignored.

It turns out GOTRACEBACK=somethingunexpected has the effect of suppressing all tracebacks, but the code that decides that is so labyrinthine that I don't think that behavior is intentional. It certainly doesn't seem like the right behavior to me.

@qmuntal
Copy link
Contributor Author

qmuntal commented Mar 15, 2023

Do I have that right?

Yep.

It turns out GOTRACEBACK=somethingunexpected has the effect of suppressing all tracebacks, but the code that decides that is so labyrinthine that I don't think that behavior is intentional. It certainly doesn't seem like the right behavior to me.

I tried to understand how the effective traceback level is selected, and I was also stunned with its complexity.

@rsc
Copy link
Contributor

rsc commented Mar 29, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Apr 6, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime: use WER for GOTRACEBACK=wer on Windows runtime: use WER for GOTRACEBACK=wer on Windows Apr 6, 2023
@rsc rsc modified the milestones: Proposal, Backlog Apr 6, 2023
@qmuntal
Copy link
Contributor Author

qmuntal commented Apr 6, 2023

I've updated https://go.dev/cl/474915 so it implements final version of this proposal.

@ChristoWolf
Copy link

To clarify, will this be part of Go 1.21?
And I guess that wer can currently (i.e. Go 1.20) also not be used with debug.SetTraceback, right?

@qmuntal qmuntal modified the milestones: Backlog, Go1.21 Apr 14, 2023
@qmuntal
Copy link
Contributor Author

qmuntal commented Apr 14, 2023

To clarify, will this be part of Go 1.21? And I guess that wer can currently (i.e. Go 1.20) also not be used with debug.SetTraceback, right?

Yep, support for wer in GOTRACEBACK and debug.SetTraceback will be available in the next release, go1.21.

@ChristoWolf
Copy link

ChristoWolf commented Apr 14, 2023

@qmuntal: Wow that was a quick response, thanks!
Sounds great, I am excited :)

Sorry for the probably stupid question, I couldn't fully grasp all the information I found regarding this, but is there currently a way to automatically create mini dumps of crashing Go applications on Windows?

@aarzilli
Copy link
Contributor

You can build go 1.21 from source and use it.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/499515 mentions this issue: doc: document WER and SEH changes in Go 1.21

gopherbot pushed a commit that referenced this issue May 31, 2023
While here, I've removed the CL 472195 TODO, which I marked as
RELNOTE=yes by mistake.

For #57441
For #57302

Change-Id: I7563140bf307f8d732f0154d02b8fa0735527323
Reviewed-on: https://go-review.googlesource.com/c/go/+/499515
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
JonasProgrammer pushed a commit to nhochdrei/go that referenced this issue Feb 20, 2024
GOTRACEBACK=wer is a new traceback level that acts as "crash" and
also enables WER. The same effect can be achieved using
debug.SetTraceback("wer").

The Go runtime currently crashes using exit(2), which bypasses WER
even if it is enabled. To best way to trigger WER is calling
RaiseFailFastException [1] instead, which internally launches the
WER machinery.

This CL also changes how GOTRACEBACK=crash crashes, so both "wer" and
"crash" crash using RaiseFailFastException, which simplifies the
implementation and resolves a longstanding TODO.

Fixes golang#57441
Fixes golang#20498

[1] https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception

Change-Id: I45669d619fbbd2f6413ce5e5f08425ed1d9aeb64
Reviewed-on: https://go-review.googlesource.com/c/go/+/474915
Reviewed-by: Davis Goodin <dagood@microsoft.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
@golang golang locked and limited conversation to collaborators May 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests