Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/cgo: longjmp in CGO code causes a crash on Windows #9754

Closed
trink opened this issue Feb 2, 2015 · 21 comments
Closed

runtime/cgo: longjmp in CGO code causes a crash on Windows #9754

trink opened this issue Feb 2, 2015 · 21 comments
Milestone

Comments

@trink
Copy link

trink commented Feb 2, 2015

System Info

Go: 1.4.1
OS: Windows 7 64bit
Compiler gcc (tdm64-1) 4.7.1

This problem is not new to 1.4 but changes to the libraries being linked can cause the problem to appear/disappear.

Steps to reproduce

 git clone https://github.com/mozilla-services/heka
 cd heka
 git checkout --track origin/sandbox_refactor
 build.bat
 ctest -V -R sandbox

Output

2/3 Test #24: sandbox ..........................***Failed    0.84 sec
2015/02/02 07:15:08 payload_type: txt
payload_name:
payload: Hello World!
2015/02/02 07:15:08 payload_type: txt
payload_name:
payload: Hello World!
2015/02/02 07:15:08 payload_type: txt
payload_name:
payload: Hello World!
Exception 0xc0000005 0x1 0x3d1a0c 0x778e5bce
PC=0x778e5bce
signal arrived during cgo execution

github.com/mozilla-services/heka/sandbox/lua._Cfunc_process_message(0x2e6cc0, 0xc000000000)
        github.com/mozilla-services/heka/sandbox/lua/_obj/_cgo_gotypes.go:96 +0x4a
github.com/mozilla-services/heka/sandbox/lua.(*LuaSandbox).ProcessMessage(0xc08200c780, 0xc08200c500, 0x27)
        D:/work/heka/build/heka/src/github.com/mozilla-services/heka/sandbox/lua/lua_sandbox.go:618 +0x64
github.com/mozilla-services/heka/sandbox/lua_test.TestAPIErrors(0xc082058240)
        D:/work/heka/build/heka/src/github.com/mozilla-services/heka/sandbox/lua/lua_sandbox_test.go:330 +0x4e0
testing.tRunner(0xc082058240, 0x8ae9d8)
        d:/go/x64/src/testing/testing.go:447 +0xc6
created by testing.RunTests
        d:/go/x64/src/testing/testing.go:555 +0xa92

The failure occurs on line https://github.com/mozilla-services/heka/blob/sandbox_refactor/sandbox/lua/lua_sandbox_interface.c#L170 where luaL_error long jumps.

Observation: Adding a return statement to the line changes the behavior but it did not prove to be a viable workaround.

@ianlancetaylor
Copy link
Contributor

Can you write a standalone test case?

I can imagine several ways that longjmp will fail horribly in a Go program, but without knowing what Lua is doing here I don't know whether it can be expected to work or not.

@mikioh mikioh changed the title longjmp in CGO code causes a crash on Windows runtime/cgo: longjmp in CGO code causes a crash on Windows Feb 2, 2015
@trink
Copy link
Author

trink commented Feb 2, 2015

Sadly, I haven't had any luck. The unit test I provided is now reproducing the problem consistently but even this code was running successfully with a slightly different verison of the lua.dll (some patches were appiled, but there were no modifications to the error handling routines)

@alexbrainman
Copy link
Member

@trink please try https://go-review.googlesource.com/#/c/5711/ see if it helps your problem. I don't have all mingw tools installed to try it myself.

Alex

@trink
Copy link
Author

trink commented Feb 27, 2015

Still seeing the same issue.

go version devel +ad3a99c Tue Feb 24 15:00:34 2015 +1100 windows/amd64

2/3 Test #24: sandbox ..........................***Failed    1.24 sec
2015/02/27 12:58:05 payload_type: txt
payload_name:
payload: Hello World!
2015/02/27 12:58:05 payload_type: txt
payload_name:
payload: Hello World!
2015/02/27 12:58:05 payload_type: txt
payload_name:
payload: Hello World!
Exception 0xc0000005 0x1 0x49eb2c 0x77a35bce
PC=0x77a35bce
signal arrived during cgo execution

github.com/mozilla-services/heka/sandbox/lua._Cfunc_process_message(0x3f6ce0, 0xc000000000)
        github.com/mozilla-services/heka/sandbox/lua/_obj/_cgo_gotypes.go:133 +0x43
github.com/mozilla-services/heka/sandbox/lua.(*LuaSandbox).ProcessMessage(0xc08200caf0, 0xc08200c870, 0x27)
        d:/work/heka/build/heka/src/github.com/mozilla-services/heka/sandbox/lua/lua_sandbox.go:618 +0x64
github.com/mozilla-services/heka/sandbox/lua_test.TestAPIErrors(0xc082056510)
        d:/work/heka/build/heka/src/github.com/mozilla-services/heka/sandbox/lua/lua_sandbox_test.go:330 +0x4e3
testing.tRunner(0xc082056510, 0x904218)
        d:/go/x64/src/testing/testing.go:448 +0xb4
created by testing.RunTests
        d:/go/x64/src/testing/testing.go:556 +0xa53

@alexbrainman
Copy link
Member

Fair enough. Thanks for trying.

Alex

@rsc
Copy link
Contributor

rsc commented Apr 10, 2015

Does it also crash on say Linux or OS X, or just Windows?
If it's just Windows we should be able to do whatever we do to defend against this on those systems.

@rsc rsc added this to the Go1.5Maybe milestone Apr 10, 2015
@rafrombrc
Copy link

Nope, not an issue on OSX or Linux, only Windows.

@alexbrainman
Copy link
Member

@rafrombrc have you tried current tip? @minux made big changes to windows cgo recently.

Alex

@rafrombrc
Copy link

Last try was using code from here. I'll try tip and see i that makes a difference.

@alexbrainman
Copy link
Member

Please try current tip. Thank you.

Alex

@minux
Copy link
Member

minux commented Apr 10, 2015

If the external linking change fixes the problem then it's great.
However, if it's not, there are possibilities that the usage of
longjmp is just incompatible with cgo.

Without looking at the source code, I can't be sure whether the
longjmp usage is inherently incompatible with cgo.

On the plus side, as Lua's longjmp based error handling is portable,
if the problem doesn't exist on Linux, then I think the longjmp usage
should be fine and the external linking work should fix the windows
issue.

Please try again with the current tip. Thanks.
(the external linking code went in on Mar 23.)

@rafrombrc
Copy link

Okay, I've tried the current tip (go version devel +399b3e3 Fri Apr 10 22:51:46 2015 +0000 windows/amd64). Unfortunately, I can't tell whether or not it resolved the original issue, because now I'm hitting a compiler error, it's panicking while trying to compile the executable. I've verified that it's only happening on Windows; on Linux everything compiles and the tests all pass.

selection_001

@minux
Copy link
Member

minux commented Apr 13, 2015

did you pass some custom ldflags?

@rafrombrc
Copy link

Yes, the general LDFLAGS value used was -s, and in certain of our Go files we're using a cgo LDFLAGS value of cgo LDFLAGS: ${LUA_LIB_PATH}/libluasandbox.dll ${LUA_LIB_PATH}/libluasb.dll -lm, where our build system replaces ${LUA_LIB_PATH} with the full filesystem path where the DLLs can be found before the Go compiler is invoked.

@minux
Copy link
Member

minux commented Apr 13, 2015 via email

@rafrombrc
Copy link

Okay, removed the LDFLAGS -s, was able to compile. It seems like things have improved, because the tests get farther than they did before, but they still crash. Sorry for using image pastes, I'm remotely logging into an AWS Windows box and can't figure out how to copy/paste text out of the MinGW command prompt. Anyway, here's the output:

selection_002

Comparing that with what the original output @trink pasted above, you can see that the crash is happening later in the text execution. All of the go_lua_write_message_* output is expected, seeing that is an indication that things are still working, but eventually we still die.

@johnjelinek
Copy link

this seems to have gone quiet ... any updates?

@rjeczalik
Copy link

@johnjelinek From quick glance your CGO code crashes, uninitialised lua_sandbox handle is passed to process_message, which is the reason of said crash. Looking at the test code, especially ignored error here and here seems to confirm the theory. You may want to rewrite the test to not crash, and then eventually figure out the reason why it fails.

[edit] fixed links

@rsc
Copy link
Contributor

rsc commented Jul 21, 2015

It sounds like the longjmp is fixed, and now there is a problem in the test. Is that right?

@ianlancetaylor
Copy link
Contributor

As Russ says, it sounds like this problem is fixed. I'm going to close it. But please do reopen the issue if you can recreate it.

@trink
Copy link
Author

trink commented Jul 27, 2015

Confirmed, the test no longer cores in Go 1.5 Beta 2

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants