Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arm64] System.Threading.Thread.Tests stack overflow #10015

Closed
sdmaclea opened this issue Mar 23, 2018 · 20 comments · Fixed by dotnet/corefx#28613
Closed

[Arm64] System.Threading.Thread.Tests stack overflow #10015

sdmaclea opened this issue Mar 23, 2018 · 20 comments · Fixed by dotnet/corefx#28613
Milestone

Comments

@sdmaclea
Copy link
Contributor

xUnit.net console test runner (64-bit .NET Core)
Copyright (C) 2014 Outercurve Foundation.

Discovering: System.Threading.Thread.Tests
Discovered:  System.Threading.Thread.Tests
Starting:    System.Threading.Thread.Tests
   System.Threading.Threads.Tests.ThreadTests.ApartmentState_NoAttributePresent_DefaultState_Nano [SKIP]
      Condition(s) not met: \"IsWindowsNanoServer\"
Process is terminating due to StackOverflowException.
Command terminated by signal 6
@sdmaclea
Copy link
Contributor Author

The stack overflow occurs during the
System.Threading.Threads.Tests.ThreadTests.ConstructorTest test case.

All other test cases pass.

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Mar 29, 2018

On my linux-arm64 and my linux-x64 Ubuntu 16.04 boxes.

ulimit -a returns stack size (kbytes, -s) 8192

This test tries to allocate 16MB on the stack. Legitimately causing a stack overflow.

This looks like a test bug to me. It is also not clear why this is not failing on linux-x64

@sdmaclea
Copy link
Contributor Author

@kouvel I assert this is a corefx test bug

@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

I suspect pthread_create is not failing when 16 MB of stack size is requested and it only allocates 8 MB. On linux-x64 it seems to actually allocate the requested stack size despite the configured value, and this:

https://linux.die.net/man/3/pthread_create

Under the NPTL threading implementation, if the RLIMIT_STACK soft resource limit at the time the program started has any value other than "unlimited", then it determines the default stack size of new threads. Using pthread_attr_setstacksize(3), the stack size attribute can be explicitly set in the attr argument used to create a thread, in order to obtain a stack size other than the default.

Seems to suggest that the configured value is intended to be the default stack size and not the max, I could be missing something.

@sdmaclea could you please run the following on that machine and let me know the output?

#include <unistd.h>
#include <errno.h>

#include <pthread.h>

#include <iostream>
using namespace std;

void PrintError(int error)
{
    const char *errorName = "?";
    switch(error)
    {
        case ENOMEM:
            errorName = "ENOMEM";
            break;
        case EINVAL:
            errorName = "EINVAL";
            break;
        case EAGAIN:
            errorName = "EAGAIN";
            break;
        case EPERM:
            errorName = "EPERM";
            break;
        default:
            break;
    }
    cout << errorName << " (" << error << ')';
}

bool CheckAndPrintErrorOnNonzero(const char *op, int error)
{
    if(error == 0)
        return true;
    cout << op << " failed: ";
    PrintError(error);
    cout << '\n';
    return false;
}

void *ThreadStart(void *)
{
    return nullptr;
}

int main()
{
    pthread_attr_t attr;
    int error = pthread_attr_init(&attr);
    if(!CheckAndPrintErrorOnNonzero("attr_init", error))
        return 0;

    error = pthread_attr_setstacksize(&attr, (size_t)16 << 20);
    if(!CheckAndPrintErrorOnNonzero("attr_setstacksize", error))
        return 0;

    pthread_t thread;
    error = pthread_create(&thread, &attr, &ThreadStart, nullptr);
    if(!CheckAndPrintErrorOnNonzero("create", error))
        return 0;

    error = pthread_attr_destroy(&attr);
    if(!CheckAndPrintErrorOnNonzero("attr_destroy", error))
        return 0;

    error = pthread_getattr_np(thread, &attr);
    if(!CheckAndPrintErrorOnNonzero("getattr", error))
        return 0;

    size_t stackSize;
    error = pthread_attr_getstacksize(&attr, &stackSize);
    if(!CheckAndPrintErrorOnNonzero("attr_getstacksize", error))
        return 0;

    cout << "Actual stack size: " << stackSize;

    error = pthread_attr_destroy(&attr);
    if(!CheckAndPrintErrorOnNonzero("attr_destroy", error))
        return 0;

    return 0;
}

On linux-x64 I'm seeing that there's no error even for (size_t)6 << 30 (6 GB) and it's reserving the requested size, and I'm seeing a pthread_create error (EAGAIN) when I request 7 GB. Wondering if there's a difference in behavior somewhere here on linux-arm64.

@sdmaclea
Copy link
Contributor Author

I'll try your test.

Also note, while debuigging this I changed the request to 7MB from 16MB and the test passed.

@sdmaclea
Copy link
Contributor Author

Your program as written reports
Actual stack size: 16777216 by default on linux-arm64
Even with a hard limit ulimit -Hs 8192 reports the same.

@sdmaclea
Copy link
Contributor Author

void *ThreadStart(void *) {
  char stackalloc[(16 << 20) - (64 << 10)];

  memset(stackalloc, 0, sizeof(stackalloc));
  return 0;
}

If I do an allocation in the stack, like is done in this test case, I see a SEGV in the debugger. (At least with the hard limit)

@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

Also note, while debuigging this I changed the request to 7MB from 16MB and the test passed.

I guess it only commits the necessary pages, I should have figured. I'll fix.

If I do an allocation in the stack, like is done in this test case, I see a SEGV in the debugger.

Interesting. I'm not sure what we can do about that from the product side. Any ideas? I'll go ahead and fix the test to use at most 2 MB for 32-bit platforms, that should be safe.

@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

I guess it only commits the necessary pages, I should have figured. I'll fix.

Actually that still seems odd, I guess it would skip the guard page and happens to work. It's not typical stack usage anyway, I'll make sure to go through the guard page.

@sdmaclea
Copy link
Contributor Author

This is a 16GB 64 bit platform

@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

I just mean 2 GB should be a safe test for 32-bit and 64-bit

@sdmaclea
Copy link
Contributor Author

It might just be that because this is a 64K page platform arm64 platform, the 16M-64K does not leave enough space for stack address randomization

In gdb, I can see the stack from

print $sp
$2 = (void *) 0xffffb7b9e9d0
(gdb) print *(char*)0xffffb6c8e9d0
$19 = 0 '\000'
(gdb) print *(char*)0xffffb6b8e9d0
Cannot access memory at address 0xffffb6b8e9d0

@janvorli
Copy link
Member

Linux maps only a small portion of the stack space in the beginning and adds mapping more as necessary. But it keeps the virtual range reserved based on the ulimit for the primary thread and based on pthread internal limit or the limit asked for during pthread creation. So going over that limit should result in sigsegv.
There are some more subtle details though. This mechanism expects that the stack grows continuously as SP moves. But if you try to touch address that's smaller the currently mapped stack range lowest address and the SP is larger than the address you touch, at least on amd64, the behavior depends on the kernel version. On older kernels, it still extends the stack. On newer kernels, it rather passes sigsegv to the process, as touching stack below SP is considered illegal (to be precise, there is a 64kB safeguard zone where it still works on amd64, but that seems to be amd64 specific).

@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

I guess something would need to ensure that stack is allocated in continuous fashion. For a local array it is similar to direct addressing so it could potentially skip the guard page and read/write other threads' stacks if stack address randomization is not enabled (or by chance). The compiler may do something for functions with larger stacks (though it doesn't seem to be happening here). alloca may do something as well. In any case I guess it is inherently unsafe and similar would apply to C# stackalloc which has to be in unsafe block anyway.

@janvorli
Copy link
Member

The JIT inserts stack probes for all functions with frame larger than memory page. However, about a week ago or so, the stack probing was reverted back to a way where the SP doesn't move as the probe continues (dotnet/coreclr#17112). It was causing issues on Windows. But it needs to go back for Unix.

@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

I see, makes sense

kouvel referenced this issue in kouvel/corefx Mar 29, 2018
Works around https://github.com/dotnet/coreclr/issues/17170
- Works around an issue on linux-arm64 where pthread_create appears to reserve less stack space than requested without failing, and limits the max reserved stack size to the ulimit-configued value, leading to seg fault when the requested size of stack space is attempted to be used
- Fixed to write every page in the stack region that is being checked, to guarantee hitting the guard page in case of failure. The JIT's stack probe for stackalloc should cover this, https://github.com/dotnet/coreclr/issues/16827 details why that was not happening, in any case this test is not intended to test stack probing behavior, so this change makes the test a bit stronger in what it actually intends to test.
kouvel referenced this issue in kouvel/corefx Mar 29, 2018
Works around and closes https://github.com/dotnet/coreclr/issues/17170
- Works around an issue on linux-arm64 where pthread_create appears to reserve less stack space than requested without failing, and limits the max reserved stack size to the ulimit-configued value, leading to seg fault when the requested size of stack space is attempted to be used
- Fixed to write every page in the stack region that is being checked, to guarantee hitting the guard page in case of failure. The JIT's stack probe for stackalloc should cover this, https://github.com/dotnet/coreclr/issues/16827 details why that was not happening, in any case this test is not intended to test stack probing behavior, so this change makes the test a bit stronger in what it actually intends to test.
@kouvel
Copy link
Member

kouvel commented Mar 29, 2018

The main issue here seems to be that pthread_create allocates less stack space than requested without failing. Filed issue https://sourceware.org/bugzilla/show_bug.cgi?id=23018, not sure if that's the right place but let's see. @sdmaclea it would be great if you could provide any info requested there (probably I guess they will ask for glibc version).

kouvel referenced this issue in dotnet/corefx Apr 1, 2018
Works around and closes https://github.com/dotnet/coreclr/issues/17170
- Works around an issue on linux-arm64 where pthread_create appears to reserve less stack space than requested without failing, and limits the max reserved stack size to the ulimit-configued value, leading to seg fault when the requested size of stack space is attempted to be used
- Fixed to write every page in the stack region that is being checked, to guarantee hitting the guard page in case of failure. The JIT's stack probe for stackalloc should cover this, https://github.com/dotnet/coreclr/issues/16827 details why that was not happening, in any case this test is not intended to test stack probing behavior, so this change makes the test a bit stronger in what it actually intends to test.
@kouvel
Copy link
Member

kouvel commented May 18, 2018

@sdmaclea:

It might just be that because this is a 64K page platform arm64 platform, the 16M-64K does not leave enough space for stack address randomization

I think you are right, the 64 KB buffer was not enough. Based on the response from the Bugzilla bug above, I think 4 pages worth of gap should be more than enough. @sdmaclea, could you please try the test below (slightly modified from above to fix some issues, changed 64 KB buffer to 256 KB) and see if it works? If it works I'll go ahead and change the test back to what it was before and increase the buffer.

#include <unistd.h>
#include <errno.h>
#include <string.h>

#include <pthread.h>

#include <iostream>
using namespace std;

void PrintError(int error)
{
    const char *errorName = "?";
    switch(error)
    {
        case ENOMEM:
            errorName = "ENOMEM";
            break;
        case EINVAL:
            errorName = "EINVAL";
            break;
        case EAGAIN:
            errorName = "EAGAIN";
            break;
        case EPERM:
            errorName = "EPERM";
            break;
        default:
            break;
    }
    cout << errorName << " (" << error << ')';
}

bool CheckAndPrintErrorOnNonzero(const char *op, int error)
{
    if(error == 0)
        return true;
    cout << op << " failed: ";
    PrintError(error);
    cout << '\n';
    return false;
}

size_t requestedStackSize = 16 << 20;

void *ThreadStart(void *)
{
    char a[requestedStackSize - (256 << 10)];
    memset(a, 0, sizeof(a));
    cout << "success\n";
    return nullptr;
}

int main()
{
    pthread_attr_t attr;
    int error = pthread_attr_init(&attr);
    if(!CheckAndPrintErrorOnNonzero("attr_init", error))
        return 0;

    error = pthread_attr_setstacksize(&attr, requestedStackSize);
    if(!CheckAndPrintErrorOnNonzero("attr_setstacksize", error))
        return 0;

    pthread_t thread;
    error = pthread_create(&thread, &attr, &ThreadStart, nullptr);
    if(!CheckAndPrintErrorOnNonzero("create", error))
        return 0;

    error = pthread_attr_destroy(&attr);
    if(!CheckAndPrintErrorOnNonzero("attr_destroy", error))
        return 0;

    error = pthread_getattr_np(thread, &attr);
    if(!CheckAndPrintErrorOnNonzero("getattr", error))
        return 0;

    error = pthread_attr_destroy(&attr);
    if(!CheckAndPrintErrorOnNonzero("attr_destroy", error))
        return 0;

    error = pthread_join(thread, nullptr);
    if(!CheckAndPrintErrorOnNonzero("attr_destroy", error))
        return 0;

    return 0;
}

@sdmaclea
Copy link
Contributor Author

@kouvel Latest test passes with "success"

@kouvel
Copy link
Member

kouvel commented May 19, 2018

Ok great, thanks!

kouvel referenced this issue in kouvel/corefx May 19, 2018
stephentoub referenced this issue in dotnet/corefx May 21, 2018
* Fix test for Thread constructor taking stack size

Leave some pages worth of buffer when allocating stack space. See https://github.com/dotnet/coreclr/issues/17170.

* Fix comment

* Fix comment
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants