runtime: stalls under Rosetta 2 #42700
Comments
Try |
Indeed, with |
@cherrymui, could you share a gist of any Rosetta error messages you've seen? I'm the main developer of Rosetta 2, and I would be interested in seeing them. |
@zwarich thanks for reaching out! One of the internal errors is
I'll see if I can find out others. Thanks. |
Here is another one
This program includes sending itself a SIGSEGV using kill. Others more likely materialize as program failures, e.g. hanging, or memory corruption. I'll see if I can reproduce them with C code. Thanks. |
@cherrymui, is the program that sends itself a SIGSEGV publicly available? |
Yes, all programs I mentioned are publicly available. They are mostly Go programs. If you're okay with building and running Go programs, I can attach instructions here. Or if you prefer, I can attach binaries. I'm still working on C reproducers. |
@cherrymui, instructions would be greatly appreciated. |
Actually, I think we can now reproduce all of the issues mentioned here. |
@zwarich that's great! I was about to send you the instructions for running Go programs, so I'll do it anyway.
This is the program that sends SIGSEGV to itself. This program is expected to crash due to SIGSEGV with a stack dump (but not an emulator crash :). This does not always fail. In fact, it fails at a fairly low rate. I also see another error for this program:
This is somewhat more likely to happen than the one mentioned previously. I managed to reproduce the last one with a C program:
This program, running under Rosetta 2, is likely to emit the error mentioned above, or hang. The call injection part is a bit hacky under C ABI, but it works okay given what Thanks! |
Seeing something similar as well:
|
Is there a way to change |
No. This is a temporary workaround and not a long-term solution. (You can re-exec the process with the environment variable set, if that counts.) |
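A minimal sketch of the re-exec approach, assuming the variable in question is GODEBUG and the setting is asyncpreemptoff=1 (the value reported elsewhere in this thread). The helper names hasPreemptOff and withPreemptOff are made up for illustration, and a real program would probably only do this when it detects it is running under Rosetta 2:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// preemptOff is the GODEBUG setting assumed to be the workaround.
const preemptOff = "asyncpreemptoff=1"

// hasPreemptOff reports whether a GODEBUG value already contains the setting.
func hasPreemptOff(godebug string) bool {
	for _, kv := range strings.Split(godebug, ",") {
		if strings.TrimSpace(kv) == preemptOff {
			return true
		}
	}
	return false
}

// withPreemptOff returns the GODEBUG value with the setting appended,
// keeping whatever other settings were already present.
func withPreemptOff(godebug string) string {
	if hasPreemptOff(godebug) {
		return godebug
	}
	if godebug == "" {
		return preemptOff
	}
	return godebug + "," + preemptOff
}

func main() {
	cur := os.Getenv("GODEBUG")
	if !hasPreemptOff(cur) {
		exe, err := os.Executable()
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
			os.Exit(1)
		}
		os.Setenv("GODEBUG", withPreemptOff(cur))
		// Replace the current process image; Exec only returns on failure.
		err = syscall.Exec(exe, os.Args, os.Environ())
		fmt.Fprintln(os.Stderr, "re-exec failed:", err)
		os.Exit(1)
	}
	// From here on the runtime was started with the setting in place.
	fmt.Println("running with GODEBUG=" + cur)
}
```

The check on the current value is what keeps this from looping: the second process sees the setting and skips the re-exec. syscall.Exec is Unix-only, which seems acceptable here since the workaround only matters on macOS.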
Thanks for this suggestion. I needed a quick workaround and this fit the bill. The simplest way I've found to check if a process is running under Rosetta 2: if v, _ := syscall.SysctlUint32("sysctl.proc_translated"); v == 1 {
// Rosetta 2
} |
Can you see if this is still happening in beta build 20C5061b, released today? |
I updated the DTK to 20C5061b. I tried the C reproducer above and the Go program that sends itself SIGSEGV, and they no longer fail. And I didn't see the assertion failures I saw before. Thanks! However, when running the Go build and test suite I got a different assertion failure:
Also, I still see hanging processes occasionally. I sent a SIGQUIT to a hanging process (which typically triggers a stack dump for Go), and I got
And
(These are separate cases.) Thanks. |
Agreed, different but potentially related. Investigating, thanks! |
I see hanging processes, and often with a |
I sporadically observe the following in a golang program now running on an M1 Air: assertion failed [*ret_index + 2 == num_insts]: RET should be second last instruction in translation. Subsequently, the program enters a loop, with process CPU at 100%, requiring kill to terminate. Setting GODEBUG=asyncpreemptoff=1 appears to address this, and I have not seen the assertion failure after several days of running the program (it is a server). In addition to running as an HTTP server, the program also issues HTTP calls to another server. I do not know if net connections are subject to SIGURG signals. Also, the program relies on kqueue/kevents for notifications about process changes, as well as an FSEvents callback for file system updates and the local OSLogStore for tracking logging events. Perhaps the async preemption of goroutines is confusing network activity or some of the host C API calls. Perhaps some of the native layers are already ARM, and some transition to/from translation mode is causing an issue? No issues are seen on an Intel MacBook, where this program has been installed and run regularly for several years. Perhaps with ARM support in Go 1.16, Rosetta will not be required? Thank you. |
@directionless @zosmac That issue will be fixed in an upcoming version of Rosetta. I will post here when a beta or release is available. |
@zosmac yes. You can try that with Go tip (the master branch). |
Thank you both. And perhaps another useful datapoint: I run my golang program using sudo to give it access to some root-owned resources. I added an fcntl(0, F_GETOWN) to see where SIGURG handling was assigned and discovered the handler is assigned to the process group established by sudo. I redefined my program as root-owned and enabled setuid so it could run standalone. It has been running now for several hours without encountering the assertion failure. I did not set GODEBUG=asyncpreemptoff=1. I will continue investigating Go with ARM support. |
@zosmac I managed to bootstrap an arm version using the above workarounds with rosetta. It's nothing special, see my notes: https://gist.github.com/cfstras/c7e2d537114eb3f640b4c4c3cd5c0809#go-golang |
@cfstras I looked at your link above. After you get
The bootstrap toolchain, although mostly functional, is special-purpose and meant only for bootstrapping. It is not as complete as a full build.
I updated the DTK to 11.1 20C69, and I still see hanging processes, and still see rosetta errors when sending SIGQUIT to hanging processes
and
|
@zwarich I also got the two error messages below with Java:
rosetta error: no code fragment associated with the given arm pc
assertion failed: unexpectedly received EXC_BAD_INSTRUCTION
I tried different JDK versions, but the issue persists. I'm using an M1 MacBook Pro. The same program runs just fine on Windows. |
There are additional issues under investigation beyond the original issue reported, which should be resolved in macOS 11.1. This particular issue has evolved into various different problems and errors. It is helpful if issues are treated independently with their own Feedback, as their root cause and fixes may be different. #1
#2
#3
#4
#5
#6
#7
Issues #1-#5 should be verified on 20C69 (macOS 11.1). For issues #6 and #7, please confirm whether they are still happening in beta build 20D5029f, available today. |
I believe that 6) should be fixed in 20D5029f. Additionally, 7) is mostly fixed, but still occurs very rarely, which impedes debugging. It is a fairly generic assertion, which can have many causes. If you find a way to reproduce 7) where the elapsed time is measured in minutes rather than hours, please let me know. |
Having updated to 20D5029f, I'm able to run all.bash under Rosetta 10 times in a row without hanging or assertion failure. Thank you for the fix! I'll let you know if I can reproduce 7. |
This should be fixed in macOS 11.2. Thanks @zwarich, @Developer-Ecosystem-Engineering! Closing. |
What version of Go are you using (go version)?
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
While trying to run make.bash under Rosetta 2, I noticed high-CPU endless stalls in link, asm and compile. Without GOMAXPROCS=1, make.bash never gets to finish; with it, it works more often than not, but not always.
Once, I got this printed to my terminal during a stall
I have no reason to think this is specific to make.bash; these are probably just the longest-running things I have run so far.
/cc @cherrymui