Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable W^X in Rosetta emulated x64 containers on macOS #102509

Merged
merged 4 commits into from
May 23, 2024

Conversation

janvorli
Copy link
Member

The docker on macOS Arm64 uses Rosetta to run x64 containers. That has an effect on the double mapping. The Rosetta is unable to detect when an already executed code page is modified. So we cannot use double mapping on those containers. To detect that case, this change adds check that verifies that the double mapping works even when the code is modified.

Close #102226

The docker on macOS Arm64 uses Rosetta to run x64 containers. That
has an effect on the double mapping. The Rosetta is unable to detect
when an already executed code page is modified. So we cannot use double
mapping on those containers. To detect that case, this change adds check
that verifies that the double mapping works even when the code is
modified.

Close dotnet#102226
@janvorli janvorli added this to the 9.0.0 milestone May 21, 2024
@janvorli janvorli requested a review from jkotas May 21, 2024 16:37
@janvorli janvorli self-assigned this May 21, 2024
@janvorli
Copy link
Member Author

cc: @noahfalk, @mangod9

@@ -47,6 +47,90 @@ static const off_t MaxDoubleMappedSize = UINT_MAX;

#endif // TARGET_OSX

#if defined(TARGET_AMD64) && !defined(TARGET_OSX)
Copy link
Member

@am11 am11 May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any technical issue that require us to restrict this run-time introspection for these platforms? If we don't restrict it, then it will cover qemu cases for other archs as well (e.g. #97729 (comment)), and prevent user from manually setting DOTNET_EnableWriteXorExecute=0 to run .NET applications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not convinced whether such detection is required? Perhaps we can just document that the config needs to be disabled for emulated cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember once @jkotas mentioned that we wanted to support rosetta2 on macOS as a first-class platform, and therefore we are handling it explicitly with sysctl.proc_translated in a few places (coreclr, nativelibs and shell scripts and even in official build infra for downloading x64 on arm64 in translated process). I think it might be better to bring this to IsProcessTranslated plan as well without the run-time detection?

bool IsProcessTranslated()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sysctl.proc_translated is not going to help. This is about running Linux docker containers. We do not know about sysctl.proc_translated equivalent for Linux docker containers running under Rosetta.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, it makes sense. Could we enable this feature detection for other platforms as well (which will cover qemu scenarios)? (assuming the overhead due to this one-time check is negligible)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@am11 I've tried to look at the discussion you have linked, but I have not seen any evidence of the culprit being inability to double map. But I might have missed that. Can you please point me to such information? The point is, if it is not a problem of double mapping, this check may not be able to check the other issue.
When I started to investigate this issue, I actually still had a docker desktop that used QEMU for x64 emulation on arm64 and even with W^X disabled, attempt to build .NET runtime was failing at some point due to SIGSEGV. Only after that I've realized I need to upgrade docker to get the version that used Rosetta to be able to make it work even with W^X disabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli, I was running into qemu: uncaught target signal 11 (Segmentation fault) which was fixed with DOTNET_EnableWriteXorExecute=0. Later dotnet publish was failing a runtime assertion on qemu arm target. But to get to that late assertion sigsegv was the roadblock that DOTNET_EnableWriteXorExecute=0 fixed. In the context of this PR, I thought it might be useful to keep the condition a bit relaxed.

pVerificationFunction = (VerificationFunctionPtr)pExecutablePage;
testCallResult = pVerificationFunction();
// Invoke the function via the executable mapping again. It should return 2 now.
// This doesn't work when running x64 code in docker on macOS Arm64 where the code is not re-translated by Rosetta
Copy link
Member

@jkotas jkotas May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it is possible to make it work by issuing flush instruction cache syscall. Is there a syscall like that on Linux x64?

Similar to what we do for Windows:

// ClrFlushInstructionCache is used when we want to call FlushInstructionCache
// for a specific architecture in the common code, but not for other architectures.
// We call ClrFlushInstructionCache whenever we create or modify code in the heap.
// Currently ClrFlushInstructionCache has no effect on AMD64
//
inline BOOL ClrFlushInstructionCache(LPCVOID pCodeAddr, size_t sizeOfCode, bool hasCodeExecutedBefore = false)
{
if (hasCodeExecutedBefore)
{
FlushInstructionCache(GetCurrentProcess(), pCodeAddr, sizeOfCode);
}
else
{
MemoryBarrier();
}
return TRUE;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apple folks told me in the past that Rosetta doesn't support this scenario, but I can add the cache flushing. In fact, it should be in this testing method anyways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be specific, the problem is that Rosetta JITs the x64 code into arm64 on first execution and uses the result ever after. Since it doesn't get any notification on the executable code modification via a secondary mapping, it doesn't re-JIT it and keeps using the old code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the cache flushing, there is the __builtin___clear_cache, but it doesn't generate any code on x64. And there is no syscall to do that on Linux.

if (!VerifyDoubleMapping(fd))
{
close(fd);
return false;
Copy link
Member

@jkotas jkotas May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to result in W^X being disabled?

If I am reading the code correctly, it is going to result into startup failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is going to result in W^X being disabled - I've tested it in a x64 docker container on my M1. See the caller of the VMToOSInterface::CreateDoubleMemoryMapper:

bool ExecutableAllocator::Initialize()
{
LIMITED_METHOD_CONTRACT;
if (IsDoubleMappingEnabled())
{
if (!VMToOSInterface::CreateDoubleMemoryMapper(&m_doubleMemoryMapperHandle, &m_maxExecutableCodeSize))
{
g_isWXorXEnabled = false;
return true;
}
m_CriticalSection = ClrCreateCriticalSection(CrstExecutableAllocatorLock,CrstFlags(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
}
return true;
}

@janvorli janvorli closed this May 22, 2024
@janvorli janvorli reopened this May 22, 2024
@mrpippy
Copy link

mrpippy commented May 22, 2024

If you just want to detect Rosetta when running on Linux, the best way I know of is to get the "processor brand string" through CPUID, and check for VirtualApple.

@jkotas
Copy link
Member

jkotas commented May 22, 2024

@mrpippy Thank you for the tip! @janvorli Checking CPUID sounds better to me than what's in the PR. What do you think?

@am11
Copy link
Member

am11 commented May 22, 2024

With qemu based linux/amd64 container running on osx-arm64: CPU Vendor: AuthenticAMD
With roestta based linux/amd64 container running on osx-arm64: CPU Vendor: GenuineIntel

test:

# on osx-arm64
# docker run --rm --platform linux/amd64 ubuntu
# apt update && apt install -y gcc

$ cc -xc - <<EOF
#include <stdio.h>
#include <string.h>
#include <cpuid.h>

int main()
{
    unsigned int regs[4];
    char vendor[13];
    char brand[49];

    // Get the vendor string
    __cpuid(0, regs[0], regs[1], regs[2], regs[3]);
    memcpy(vendor, &regs[1], 4);
    memcpy(vendor + 4, &regs[3], 4);
    memcpy(vendor + 8, &regs[2], 4);
    vendor[12] = '\0';

    printf("CPU Vendor: %s\n", vendor);

    // Get the brand string if available
    if (regs[0] >= 0x80000004) {
        __cpuid(0x80000002, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand, regs, 16);
        __cpuid(0x80000003, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + 16, regs, 16);
        __cpuid(0x80000004, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + 32, regs, 16);
        brand[48] = '\0';

        printf("CPU Brand: %s\n", brand);
    } else {
        printf("CPU Brand: Not available\n");
    }

    return 0;
}
EOF


$ ./a.out
CPU Vendor: GenuineIntel
CPU Brand: Not available

@am11
Copy link
Member

am11 commented May 22, 2024

A less-appealing ways to detect is:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

int main() {
    // Get the process name of PID 1
    pid_t pid = 1;
    char proc_path[256];
    char proc_name[256];
    snprintf(proc_path, sizeof(proc_path), "/proc/%d/cmdline", pid);

    FILE *file = fopen(proc_path, "r");
    if (file == NULL) {
        fprintf(stderr, "Error opening file: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    if (fgets(proc_name, sizeof(proc_name), file) != NULL) {
        printf("Process name of PID %d: %s\n", pid, proc_name);
    } else {
        fprintf(stderr, "Error reading process name\n");
        exit(EXIT_FAILURE);
    }

    fclose(file);
    return 0;
}

Process name of PID 1: /run/rosetta/rosetta

(we try to sparingly use procfs, but in this case, maybe not bad after all 😅)

@mrpippy
Copy link

mrpippy commented May 22, 2024

With qemu based linux/amd64 container running on osx-arm64: CPU Vendor: AuthenticAMD With roestta based linux/amd64 container running on osx-arm64: CPU Vendor: GenuineIntel

test:

$ ./a.out
CPU Vendor: GenuineIntel
CPU Brand: Not available

That's not correct: regs[0] there is the maximum value for basic CPUID information, even on a real Intel system it only returns 0x20. You have to additionally call __cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]); to get the maximum value for "extended function" CPUID info. I'm testing on macOS, but with that line added I get:

$ ./a.out
CPU Vendor: GenuineIntel
CPU Brand: VirtualApple @ 2.50GHz

Maybe a function to get the CPU brand can go into minipal/cpufeatures.c?

(BTW, I stumbled across this issue because I work on Wine, and found that .NET 7/8 apps were hanging when run on Wine, on Rosetta/macOS. They work again when DOTNET_EnableWriteXorExecute=0 is used. It's an odd request, but I'd like to add a CPUID check like this to the Windows doublemapping.cpp also, and having the function in minipal would be nice.)

@am11
Copy link
Member

am11 commented May 22, 2024

Ah, thanks for the pointer! Yes having this detection in minipal is a nice idea (assuming we may end up using it in multiple places). e.g.

bool minipal_is_rosetta_based_container(void)
{
    unsigned int regs[4];
    char brand[49];

    // Get the maximum value for extended function CPUID info
    __cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
    if (regs[0] < 0x80000004) {
        return false; // Extended CPUID not supported
    }

    // Retrieve the CPU brand string
    for (unsigned int i = 0x80000002; i <= 0x80000004; ++i) {
        __cpuid(i, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + (i - 0x80000002) * sizeof(regs), regs, sizeof(regs));
    }
    brand[sizeof(brand) - 1] = '\0';

    // Check if CPU brand indicates Rosetta emulation
    return (strstr(brand, "VirtualApple") != NULL) ? true : false;
}

Edit, perhaps a more general approach to cover QEMU as well:

bool minipal_detect_emulation(void)
{
    unsigned int regs[4];
    char brand[49];

    // Get the maximum value for extended function CPUID info
    __cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
    if (regs[0] < 0x80000004) {
        return false; // Extended CPUID not supported
    }

    // Retrieve the CPU brand string
    for (unsigned int i = 0x80000002; i <= 0x80000004; ++i) {
        __cpuid(i, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + (i - 0x80000002) * sizeof(regs), regs, sizeof(regs));
    }
    brand[sizeof(brand) - 1] = '\0';

    // Check if CPU brand indicates emulation
    return (strstr(brand, "VirtualApple") != NULL || strstr(brand, "QEMU") != NULL) ? true : false;
}

based on:

CPU Vendor: AuthenticAMD
CPU Brand: QEMU TCG CPU version 2.5+

@janvorli
Copy link
Member Author

Checking CPUID sounds better to me than what's in the PR. What do you think?

I agree, I also prefer that. Let me rework the change that way.

@janvorli
Copy link
Member Author

perhaps a more general approach to cover QEMU as well:

@am11 thank you for the suggestion, makes sense to include that as well.

@am11
Copy link
Member

am11 commented May 22, 2024

A bit more broader approach could be this:

#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#ifdef TARGET_AMD64
#include <cpuid.h>
#endif

bool minipal_detect_emulation(void)
{
#ifdef TARGET_AMD64
    // Check for CPU brand indicating emulation
    unsigned int regs[4];
    char brand[49];

    // Get the maximum value for extended function CPUID info
    __cpuid(0x80000000, regs[0], regs[1], regs[2], regs[3]);
    if (regs[0] < 0x80000004)
    {
        return false; // Extended CPUID not supported
    }

    // Retrieve the CPU brand string
    for (unsigned int i = 0x80000002; i <= 0x80000004; ++i)
    {
        __cpuid(i, regs[0], regs[1], regs[2], regs[3]);
        memcpy(brand + (i - 0x80000002) * sizeof(regs), regs, sizeof(regs));
    }
    brand[sizeof(brand) - 1] = '\0';

    // Check if CPU brand indicates emulation
    if (strstr(brand, "VirtualApple") != NULL || strstr(brand, "QEMU") != NULL)
    {
        return true;
    }
#endif

    // Check for process name of PID 1 indicating emulation
    char cmdline[256];
    FILE *cmdline_file = fopen("/proc/1/cmdline", "r");
    if (cmdline_file != NULL)
    {
        fgets(cmdline, sizeof(cmdline), cmdline_file);
        fclose(cmdline_file);

        if (strstr(cmdline, "qemu") != NULL || strstr(cmdline, "rosetta") != NULL)
        {
            return true;
        }
    }

    return false;
}

@mangod9
Copy link
Member

mangod9 commented May 22, 2024

assume we are certain that strstr(brand, "VirtualApple") != NULL would work on localized environments as well right?

@am11
Copy link
Member

am11 commented May 22, 2024

It is conceivable that the name could be localized, but given that it's a brand name it may not be the case. In the last comment, I added a fallback for all platforms to proc/1/cmdline checking for qemu or rosetta, which should cover some corner cases of CPUID. 🤞

@jkotas
Copy link
Member

jkotas commented May 23, 2024

assume we are certain that strstr(brand, "VirtualApple") != NULL would work on localized environments as well right?

The CPU IDs are never localized. We check for CPU IDs in other places.

strstr(cmdline, "qemu")

We know that the runtime is broken on qemu in number of different ways, this won't make it work reliably. If we want to check for qemu, I would rather fail with error that says qemu is not supported so that people stop filling issues about crashes caused by running on qemu.

}
#endif

// Check for process name of PID 1 indicating emulation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we know about cases where this is actually necessary, I do not think we should be doing this check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it is also incorrect when running an app in docker say on x64 linux and the app has qemu or rosetta in its name or path. Then it would turn the W^X off even though there was no emulation.
I guess you are right and we should remove all checks for qemu from this PR. We can add a qemu check in a separate PR to some different place in the runtime startup if we want to prevent the execution there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli, PID 1 is reserved for init process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are right and we should remove all checks for qemu from this PR.

AOT apps have no problem running on qemu. There are few things in VM which break and can be fixed given time. I don't think abandoning qemu straight up is the right thing. New architectures development (RV64, LA64) heavily rely on qemu.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli, PID 1 is reserved for init process.

That is not the case under Docker without emulation. The PID 1 is used by the first process you run in the container.

Example:

docker run -it ubuntu:22.04 /bin/bash
ps

shows

  PID TTY          TIME CMD
    1 pts/0    00:00:00 bash
    9 pts/0    00:00:00 ps

Copy link
Member

@am11 am11 May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, at the moment it has only one usage, but its name is general enough that it can be used in other places in the future. If QEMU is broken in other cases which come after manually disabling W^X via environment variable, it will continue to be in that state (use at your own risk) until someone investigate/fix it upstream or add a few workarounds in the coreclr VM where it usually give up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this for a bit and I tend to agree with @am11 to not to block running on QEMU completely. While I can see quite a number of issues reported related to QEMU, the fact that Docker uses QEMU when running containers for non-native architectures makes me think that it would be worth keeping that in the enabled state and give QEMU a chance to fix the problems. For developers that want to test their code on multiple targets, it is a great mean for testing apps on target architectures that they don't have physical devices for.
As for disabling W^X for QEMU, I'd leave it out of the checks until the W^X is the only thing that doesn't work. The problem with W^X kicks in very quickly, so I think that it makes people find out quite soon the culprit if they search our issues. Random crashes sound much worse from the perspective of us investigating the problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be worth keeping that in the enabled state and give QEMU a chance to fix the problems.

We have been in this state for last 10 years. I have not seen any of these reliability problems being fixed in QEMU during that time.

For developers that want to test their code on multiple targets, it is a great mean for testing apps on target architectures that they don't have physical devices for.

I would be fine with having DOTNET_UNRELIABLE_ENABLE_QEMU=1 settings that can be used by folks who want to give it a try.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, having it disabled by default and enabling it using an env var like this sounds like a great idea. When looking though the QEMU related issues, I can see that in many cases people were not even aware that they were running under QEMU. So this way they would know right away and those that did that intentionally would be able to enable it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that in a separate PR though.

@build-analysis build-analysis bot mentioned this pull request May 23, 2024
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Copy link

@mrpippy mrpippy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the Windows part! A few requests:


bool minipal_detect_emulation(void)
{
#ifdef HOST_AMD64
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be defined(HOST_AMD64) || defined(HOST_X86)?
On macOS, Wine is able to run 32-bit Windows apps under Rosetta.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! I've added a new commit with that.

@@ -60,6 +61,12 @@ inline void *GetBotMemoryAddress(void)

bool VMToOSInterface::CreateDoubleMemoryMapper(void **pHandle, size_t *pMaxExecutableCodeSize)
{
if (minipal_detect_emulation())
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment here to explain that Rosetta on Windows would be on Wine/macOS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment

This will help WINE running 32 bit code under rosetta emulation on
macOS.
@janvorli
Copy link
Member Author

@mrpippy many thanks for suggesting the CPUID way of checking for Rosetta!

@janvorli janvorli merged commit 097c8ab into dotnet:main May 23, 2024
159 checks passed
@janvorli janvorli deleted the disable-wxorx-in-macos-docker-rosetta branch May 23, 2024 21:24
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
* Disable W^X on x64 in rosetta based container

The docker on macOS Arm64 uses Rosetta to run x64 containers. That
has an effect on the double mapping. The Rosetta is unable to detect
when an already executed code page is modified. So we cannot use double
mapping on those containers. To detect that case, this change adds check
that verifies that the double mapping works even when the code is
modified.

Close dotnet#102226

* Rework based on PR feedback

* Check only for Rosetta

* Enable the rosetta check for x86 too

This will help WINE running 32 bit code under rosetta emulation on
macOS.
@github-actions github-actions bot locked and limited conversation to collaborators Jun 23, 2024
@janvorli
Copy link
Member Author

/backport to release/8.0-staging

@github-actions github-actions bot unlocked this conversation Jul 18, 2024
Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/9998951901

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash on Rosetta for double mapping jitted code
5 participants