Disable W^X in Rosetta emulated x64 containers on macOS #102509
Docker on macOS Arm64 uses Rosetta to run x64 containers, and that affects double mapping. Rosetta is unable to detect when an already executed code page is modified, so we cannot use double mapping in those containers. To detect that case, this change adds a check that verifies that double mapping works even when the code is modified. Close dotnet#102226
@@ -47,6 +47,90 @@ static const off_t MaxDoubleMappedSize = UINT_MAX;

#endif // TARGET_OSX

#if defined(TARGET_AMD64) && !defined(TARGET_OSX)
Is there any technical issue that requires us to restrict this run-time introspection to these platforms? If we don't restrict it, it will cover qemu cases for other archs as well (e.g. #97729 (comment)), and spare users from having to set DOTNET_EnableWriteXorExecute=0 manually to run .NET applications.
I am actually not convinced whether such detection is required? Perhaps we can just document that the config needs to be disabled for emulated cases?
I remember once @jkotas mentioned that we wanted to support rosetta2 on macOS as a first-class platform, and therefore we are handling it explicitly with sysctl.proc_translated in a few places (coreclr, nativelibs and shell scripts and even in official build infra for downloading x64 on arm64 in translated process). I think it might be better to bring this to IsProcessTranslated plan as well without the run-time detection?
bool IsProcessTranslated()
sysctl.proc_translated is not going to help. This is about running Linux docker containers. We do not know about sysctl.proc_translated equivalent for Linux docker containers running under Rosetta.
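For context, the macOS-side check referred to in this thread can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual IsProcessTranslated; the function name `is_process_translated` and its return convention are assumptions made for this example. It also shows why the approach does not carry over to Linux containers: the sysctl key only exists on macOS.

```c
#include <errno.h>
#include <stddef.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper for illustration.
// Returns 1 when the calling process runs under Rosetta translation,
// 0 when it runs natively, and -1 when the answer cannot be determined.
int is_process_translated(void)
{
#if defined(__APPLE__)
    int translated = 0;
    size_t size = sizeof(translated);
    if (sysctlbyname("sysctl.proc_translated", &translated, &size, NULL, 0) == -1)
    {
        // ENOENT means the kernel does not know the key, so Rosetta is not involved.
        return (errno == ENOENT) ? 0 : -1;
    }
    return translated;
#else
    // Not macOS: there is no such key to query, which is the limitation
    // pointed out above for Linux containers running under Rosetta.
    return 0;
#endif
}
```

Inside a Linux container this always reports 0, regardless of whether the container itself is being translated.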
Ah, I see, it makes sense. Could we enable this feature detection for other platforms as well (which will cover qemu scenarios)? (assuming the overhead due to this one-time check is negligible)
@am11 I've tried to look at the discussion you have linked, but I have not seen any evidence of the culprit being inability to double map. But I might have missed that. Can you please point me to such information? The point is, if it is not a problem of double mapping, this check may not be able to check the other issue.
When I started to investigate this issue, I actually still had a docker desktop that used QEMU for x64 emulation on arm64 and even with W^X disabled, attempt to build .NET runtime was failing at some point due to SIGSEGV. Only after that I've realized I need to upgrade docker to get the version that used Rosetta to be able to make it work even with W^X disabled.
@janvorli, I was running into qemu: uncaught target signal 11 (Segmentation fault), which was fixed with DOTNET_EnableWriteXorExecute=0. Later dotnet publish was failing a runtime assertion on qemu arm target. But to get to that late assertion, sigsegv was the roadblock that DOTNET_EnableWriteXorExecute=0 fixed. In the context of this PR, I thought it might be useful to keep the condition a bit relaxed.
    pVerificationFunction = (VerificationFunctionPtr)pExecutablePage;
    testCallResult = pVerificationFunction();
    // Invoke the function via the executable mapping again. It should return 2 now.
    // This doesn't work when running x64 code in docker on macOS Arm64 where the code is not re-translated by Rosetta
I am wondering whether it is possible to make it work by issuing flush instruction cache syscall. Is there a syscall like that on Linux x64?
Similar to what we do for Windows:
runtime/src/coreclr/vm/amd64/cgencpu.h
Lines 588 to 605 in 362a95d
// ClrFlushInstructionCache is used when we want to call FlushInstructionCache
// for a specific architecture in the common code, but not for other architectures.
// We call ClrFlushInstructionCache whenever we create or modify code in the heap.
// Currently ClrFlushInstructionCache has no effect on AMD64
//
inline BOOL ClrFlushInstructionCache(LPCVOID pCodeAddr, size_t sizeOfCode, bool hasCodeExecutedBefore = false)
{
    if (hasCodeExecutedBefore)
    {
        FlushInstructionCache(GetCurrentProcess(), pCodeAddr, sizeOfCode);
    }
    else
    {
        MemoryBarrier();
    }
    return TRUE;
}
Apple folks told me in the past that Rosetta doesn't support this scenario, but I can add the cache flushing. In fact, it should be in this testing method anyways.
To be specific, the problem is that Rosetta JITs the x64 code into arm64 on first execution and uses the result ever after. Since it doesn't get any notification on the executable code modification via a secondary mapping, it doesn't re-JIT it and keeps using the old code.
As for the cache flushing, there is the __builtin___clear_cache, but it doesn't generate any code on x64. And there is no syscall to do that on Linux.
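The check discussed in this thread can be illustrated with a small stand-alone sketch. It is not the code from the PR: the function name `double_mapping_sees_code_changes` is hypothetical, and a temporary file created with mkstemp is assumed as the backing object purely to keep the example short. The idea is the same, though: map one page twice (RW and RX), run a tiny function through the RX view, patch it through the RW view, and see whether the second call observes the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef int (*verification_fn)(void);

// Hypothetical helper: returns true when a write made through the RW view
// is observed by a later call through the RX view of the same page.
bool double_mapping_sees_code_changes(void)
{
#if defined(__linux__) && defined(__x86_64__)
    // x64 machine code for: mov eax, 1 ; ret
    static const unsigned char code[] = { 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3 };
    bool ok = false;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return false;
    size_t size = (size_t)page;

    // Assumption of this sketch: a temp file serves as the shared backing object.
    char path[] = "/tmp/double-mapping-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return false;
    unlink(path);

    if (ftruncate(fd, (off_t)size) == 0)
    {
        unsigned char *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        void *rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
        if (rw != MAP_FAILED && rx != MAP_FAILED)
        {
            memcpy(rw, code, sizeof(code));
            verification_fn fn = (verification_fn)rx;
            int first = fn();   // expected to return 1

            rw[1] = 2;          // patch the immediate: mov eax, 2
            // Emits no instructions on x64, as noted above; kept for completeness.
            __builtin___clear_cache((char *)rx, (char *)rx + sizeof(code));

            int second = fn();  // returns 2 only if the patch is observed
            ok = (first == 1 && second == 2);
        }
        if (rw != MAP_FAILED)
            munmap(rw, size);
        if (rx != MAP_FAILED)
            munmap(rx, size);
    }
    close(fd);
    return ok;
#else
    // Other targets are not checked in this sketch.
    return true;
#endif
}
```

If the translator keeps using its first translation of the page, the second call still returns 1 and the helper reports false. A false result can also come from unrelated causes, such as a temp directory mounted noexec, so this is only a sketch of the idea.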
    if (!VerifyDoubleMapping(fd))
    {
        close(fd);
        return false;
Is this going to result in W^X being disabled?
If I am reading the code correctly, it is going to result in a startup failure.
No, it is going to result in W^X being disabled - I've tested it in a x64 docker container on my M1. See the caller of the VMToOSInterface::CreateDoubleMemoryMapper:
runtime/src/coreclr/utilcode/executableallocator.cpp
Lines 282 to 298 in 7452305
bool ExecutableAllocator::Initialize()
{
    LIMITED_METHOD_CONTRACT;
    if (IsDoubleMappingEnabled())
    {
        if (!VMToOSInterface::CreateDoubleMemoryMapper(&m_doubleMemoryMapperHandle, &m_maxExecutableCodeSize))
        {
            g_isWXorXEnabled = false;
            return true;
        }
        m_CriticalSection = ClrCreateCriticalSection(CrstExecutableAllocatorLock,CrstFlags(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
    }
    return true;
}
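The fallback described here can be reduced to a tiny C model. Everything in it (`allocator_state`, `allocator_initialize`, `failing_mapper`) is a hypothetical stand-in invented for illustration, not runtime code. It only shows the control flow: a mapper that fails to initialize turns the feature off, and initialization still reports success.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*create_mapper_fn)(void **handle, size_t *max_size);

// Hypothetical stand-in for the allocator's global state.
typedef struct
{
    bool wx_enabled;
    void *mapper_handle;
    size_t max_executable_code_size;
} allocator_state;

// Same shape as the caller quoted above: a mapper failure is not a startup failure.
bool allocator_initialize(allocator_state *state, create_mapper_fn create_mapper)
{
    if (state->wx_enabled)
    {
        if (!create_mapper(&state->mapper_handle, &state->max_executable_code_size))
        {
            state->wx_enabled = false; // fall back to running without W^X
            return true;
        }
    }
    return true;
}

// Stand-in for a mapper whose double-mapping verification failed.
bool failing_mapper(void **handle, size_t *max_size)
{
    (void)handle;
    (void)max_size;
    return false;
}
```

Calling `allocator_initialize` with `failing_mapper` returns true and leaves `wx_enabled` cleared, which matches the behavior described for the x64 container on the M1.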
If you just want to detect Rosetta when running on Linux, the best way I know of is to get the "processor brand string" through CPUID, and check for |
With qemu based linux/amd64 container running on osx-arm64:
test:
# on osx-arm64
# docker run --rm --platform linux/amd64 ubuntu
# apt update && apt install -y gcc
$ cc -xc - <<EOF
#include <stdio.h>
#include <string.h>
#include <cpuid.h>
int main()
{
    unsigned int regs[4];
    char vendor[13];
    char brand[49];
    // Get the vendor string
    __cpuid(0, regs[0], regs[1], regs[2], regs[3]);
    memcpy(vendor, &regs[1], 4);
    memcpy(vendor + 4, &regs[3], 4);
    memcpy(vendor + 8, &regs[2], 4);
    vendor[12] = '\0';
    printf("CPU Vendor: %s\n", vendor);
    // Get the brand string if available
    if (regs[0] >= 0x80000004) {
        __cpuid(0x80000002, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand, regs, 16);
        __cpuid(0x80000003, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + 16, regs, 16);
        __cpuid(0x80000004, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + 32, regs, 16);
        brand[48] = '\0';
        printf("CPU Brand: %s\n", brand);
    } else {
        printf("CPU Brand: Not available\n");
    }
    return 0;
}
EOF
$ ./a.out
CPU Vendor: GenuineIntel
CPU Brand: Not available
A less appealing way to detect it is:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
int main() {
// Get the process name of PID 1
pid_t pid = 1;
char proc_path[256];
char proc_name[256];
snprintf(proc_path, sizeof(proc_path), "/proc/%d/cmdline", pid);
FILE *file = fopen(proc_path, "r");
if (file == NULL) {
fprintf(stderr, "Error opening file: %s\n", strerror(errno));
exit(EXIT_FAILURE);
}
if (fgets(proc_name, sizeof(proc_name), file) != NULL) {
printf("Process name of PID %d: %s\n", pid, proc_name);
} else {
fprintf(stderr, "Error reading process name\n");
exit(EXIT_FAILURE);
}
fclose(file);
return 0;
}
(we try to sparingly use procfs, but in this case, maybe not bad after all 😅)
That's not correct:
Maybe a function to get the CPU brand can go into (BTW, I stumbled across this issue because I work on Wine, and found that .NET 7/8 apps were hanging when run on Wine, on Rosetta/macOS. They work again when |
Ah, thanks for the pointer! Yes, having this detection in minipal is a nice idea (assuming we may end up using it in multiple places). e.g.
bool minipal_is_rosetta_based_container(void)
{
unsigned int regs[4];
char brand[49];
// Get the maximum value for extended function CPUID info
__cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
if (regs[0] < 0x80000004) {
return false; // Extended CPUID not supported
}
// Retrieve the CPU brand string
for (unsigned int i = 0x80000002; i <= 0x80000004; ++i) {
__cpuid(i, regs[0], regs[1], regs[2], regs[3]);
memcpy(brand + (i - 0x80000002) * sizeof(regs), regs, sizeof(regs));
}
brand[sizeof(brand) - 1] = '\0';
// Check if CPU brand indicates Rosetta emulation
return (strstr(brand, "VirtualApple") != NULL) ? true : false;
}
Edit: perhaps a more general approach to cover QEMU as well:
bool minipal_detect_emulation(void)
{
unsigned int regs[4];
char brand[49];
// Get the maximum value for extended function CPUID info
__cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
if (regs[0] < 0x80000004) {
return false; // Extended CPUID not supported
}
// Retrieve the CPU brand string
for (unsigned int i = 0x80000002; i <= 0x80000004; ++i) {
__cpuid(i, regs[0], regs[1], regs[2], regs[3]);
memcpy(brand + (i - 0x80000002) * sizeof(regs), regs, sizeof(regs));
}
brand[sizeof(brand) - 1] = '\0';
// Check if CPU brand indicates emulation
return (strstr(brand, "VirtualApple") != NULL || strstr(brand, "QEMU") != NULL) ? true : false;
}
based on:
I agree, I also prefer that. Let me rework the change that way.
@am11 thank you for the suggestion, makes sense to include that as well.
A somewhat broader approach could be this:
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#ifdef TARGET_AMD64
#include <cpuid.h>
#endif
bool minipal_detect_emulation(void)
{
#ifdef TARGET_AMD64
// Check for CPU brand indicating emulation
unsigned int regs[4];
char brand[49];
// Get the maximum value for extended function CPUID info
__cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
if (regs[0] < 0x80000004)
{
return false; // Extended CPUID not supported
}
// Retrieve the CPU brand string
for (unsigned int i = 0x80000002; i <= 0x80000004; ++i)
{
__cpuid(i, regs[0], regs[1], regs[2], regs[3]);
memcpy(brand + (i - 0x80000002) * sizeof(regs), regs, sizeof(regs));
}
brand[sizeof(brand) - 1] = '\0';
// Check if CPU brand indicates emulation
if (strstr(brand, "VirtualApple") != NULL || strstr(brand, "QEMU") != NULL)
{
return true;
}
#endif
// Check for process name of PID 1 indicating emulation
char cmdline[256];
FILE *cmdline_file = fopen("/proc/1/cmdline", "r");
if (cmdline_file != NULL)
{
fgets(cmdline, sizeof(cmdline), cmdline_file);
fclose(cmdline_file);
if (strstr(cmdline, "qemu") != NULL || strstr(cmdline, "rosetta") != NULL)
{
return true;
}
}
return false;
}
assume we are certain that |
It is conceivable that the name could be localized, but given that it's a brand name it may not be the case. In the last comment, I added a fallback for all platforms to proc/1/cmdline checking for qemu or rosetta, which should cover some corner cases of CPUID. 🤞 |
The CPU IDs are never localized. We check for CPU IDs in other places.
We know that the runtime is broken on qemu in a number of different ways, and this won't make it work reliably. If we want to check for qemu, I would rather fail with an error that says qemu is not supported so that people stop filing issues about crashes caused by running on qemu.
src/native/minipal/cpufeatures.c
Outdated
    }
#endif

    // Check for process name of PID 1 indicating emulation
Unless we know about cases where this is actually necessary, I do not think we should be doing this check.
It seems it is also incorrect when running an app in docker say on x64 linux and the app has qemu or rosetta in its name or path. Then it would turn the W^X off even though there was no emulation.
I guess you are right and we should remove all checks for qemu from this PR. We can add a qemu check in a separate PR to some different place in the runtime startup if we want to prevent the execution there.
@janvorli, PID 1 is reserved for init process.
> I guess you are right and we should remove all checks for qemu from this PR.
AOT apps have no problem running on qemu. There are a few things in the VM which break and can be fixed given time. I don't think abandoning qemu straight up is the right thing. New architecture development (RV64, LA64) relies heavily on qemu.
> @janvorli, PID 1 is reserved for init process.
That is not the case under Docker without emulation. The PID 1 is used by the first process you run in the container.
Example:
docker run -it ubuntu:22.04 /bin/bash
ps
shows
PID TTY TIME CMD
1 pts/0 00:00:00 bash
9 pts/0 00:00:00 ps
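The false positive described above can be reproduced with a self-contained model of the substring check. The helper names and the sample command line are made up for this illustration. The buffer layout mirrors /proc/<pid>/cmdline, where the arguments are stored back to back and separated by NUL bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Bounded substring search, so that embedded NUL separators are handled safely.
static bool range_contains(const char *hay, size_t hay_len, const char *needle)
{
    size_t needle_len = strlen(needle);
    if (needle_len == 0 || needle_len > hay_len)
        return false;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
    {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return true;
    }
    return false;
}

// Hypothetical model of the proposed PID 1 check: the raw cmdline bytes
// are searched for an emulator name.
bool cmdline_mentions_emulator(const char *cmdline, size_t length)
{
    return range_contains(cmdline, length, "qemu")
        || range_contains(cmdline, length, "rosetta");
}
```

With the made-up buffer `"/app/qemu-dashboard\0--serve"` the function reports an emulator, although in a container that is just the application running natively as PID 1; `"/bin/bash"` is not flagged. Separately, because the arguments are NUL-separated, a plain `strstr` on the bytes read from that file only ever looks at the first argument.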
True, at the moment it has only one usage, but its name is general enough that it can be used in other places in the future. If QEMU is broken in other cases that show up after manually disabling W^X via the environment variable, it will stay in that state (use at your own risk) until someone investigates and fixes it upstream or adds a few workarounds in the coreclr VM where it usually gives up.
I was thinking about this for a bit and I tend to agree with @am11 that we should not block running on QEMU completely. While I can see quite a number of reported issues related to QEMU, the fact that Docker uses QEMU when running containers for non-native architectures makes me think that it would be worth keeping that in the enabled state and give QEMU a chance to fix the problems. For developers that want to test their code on multiple targets, it is a great means for testing apps on target architectures that they don't have physical devices for.
As for disabling W^X for QEMU, I'd leave it out of the checks until the W^X is the only thing that doesn't work. The problem with W^X kicks in very quickly, so I think that it makes people find out quite soon the culprit if they search our issues. Random crashes sound much worse from the perspective of us investigating the problems.
> it would be worth keeping that in the enabled state and give QEMU a chance to fix the problems.
We have been in this state for the last 10 years. I have not seen any of these reliability problems being fixed in QEMU during that time.
> For developers that want to test their code on multiple targets, it is a great means for testing apps on target architectures that they don't have physical devices for.
I would be fine with having a DOTNET_UNRELIABLE_ENABLE_QEMU=1 setting that can be used by folks who want to give it a try.
Actually, having it disabled by default and enabling it using an env var like this sounds like a great idea. When looking through the QEMU related issues, I can see that in many cases people were not even aware that they were running under QEMU. So this way they would know right away and those that did that intentionally would be able to enable it.
Let's do that in a separate PR though.
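As a rough sketch of what such an opt-in could look like: the variable name comes from the suggestion above, but no such setting is part of this PR, and the helper names and the decision to accept only the value "1" are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical: true only when the user explicitly opted in with the value "1".
bool qemu_opt_in_requested(void)
{
    const char *value = getenv("DOTNET_UNRELIABLE_ENABLE_QEMU");
    return value != NULL && strcmp(value, "1") == 0;
}

// Hypothetical startup gate: running under QEMU is refused unless opted in.
// How QEMU would be detected is left open here, so it is passed in by the caller.
bool startup_allowed(bool running_under_qemu)
{
    return !running_under_qemu || qemu_opt_in_requested();
}
```

With this shape, users who end up on QEMU unknowingly would get an immediate, explicit refusal, while those who run on it deliberately can set the variable and proceed at their own risk.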
LGTM. Thanks!
Thanks for adding the Windows part! A few requests:
src/native/minipal/cpufeatures.c
Outdated

bool minipal_detect_emulation(void)
{
#ifdef HOST_AMD64
Could this be defined(HOST_AMD64) || defined(HOST_X86)?
On macOS, Wine is able to run 32-bit Windows apps under Rosetta.
Of course! I've added a new commit with that.
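Putting the pieces of this discussion together (CPUID brand string, Rosetta only, both x64 and x86), the check could look roughly like the following. This is a sketch rather than the code that was merged: the function names are hypothetical, the compiler macros `__x86_64__` and `__i386__` stand in for the runtime's HOST_AMD64 and HOST_X86 defines, and the GCC/Clang `<cpuid.h>` header is assumed.

```c
#include <stdbool.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Pure helper: does a CPU brand string look like the one reported under Rosetta?
bool brand_indicates_rosetta(const char *brand)
{
    return strstr(brand, "VirtualApple") != NULL;
}

// Hypothetical detection routine covering both 64-bit and 32-bit x86 hosts.
bool detect_rosetta(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int regs[4];
    char brand[49];

    // Leaf 0x80000000 reports the highest supported extended leaf in EAX.
    __cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
    if (regs[0] < 0x80000004)
        return false; // brand string leaves are not available

    // Leaves 0x80000002..0x80000004 each return 16 bytes of the brand string.
    for (unsigned int leaf = 0x80000002; leaf <= 0x80000004; ++leaf)
    {
        __cpuid(leaf, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + (leaf - 0x80000002) * sizeof(regs), regs, sizeof(regs));
    }
    brand[sizeof(brand) - 1] = '\0';

    return brand_indicates_rosetta(brand);
#else
    // Rosetta only translates x86 code, so other targets cannot be under it.
    return false;
#endif
}
```

Keeping the string comparison in its own function makes the brand matching testable without having to run under Rosetta.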
@@ -60,6 +61,12 @@ inline void *GetBotMemoryAddress(void)

bool VMToOSInterface::CreateDoubleMemoryMapper(void **pHandle, size_t *pMaxExecutableCodeSize)
{
    if (minipal_detect_emulation())
    {
Maybe a comment here to explain that Rosetta on Windows would be on Wine/macOS?
Added the comment
This will help WINE running 32 bit code under rosetta emulation on macOS.
@mrpippy many thanks for suggesting the CPUID way of checking for Rosetta! |
* Disable W^X on x64 in rosetta based container

  The docker on macOS Arm64 uses Rosetta to run x64 containers. That has an effect on the double mapping. The Rosetta is unable to detect when an already executed code page is modified. So we cannot use double mapping on those containers. To detect that case, this change adds check that verifies that the double mapping works even when the code is modified. Close dotnet#102226

* Rework based on PR feedback
* Check only for Rosetta
* Enable the rosetta check for x86 too

  This will help WINE running 32 bit code under rosetta emulation on macOS.
/backport to release/8.0-staging
Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/9998951901