System.InsufficientExecutionStackException when using Mono on linux-musl #76523

Closed
ayakael opened this issue Oct 3, 2022 · 9 comments
Labels
arch-s390x Related to s390x architecture (unsupported) area-Build-mono os-linux-musl Linux distributions using musl library.

Comments

@ayakael
Contributor

ayakael commented Oct 3, 2022

Description

Encountering System.InsufficientExecutionStackException errors on linux-musl-s390x, which seem to be caused by Mono not accounting for musl's small default stack size.

Reproduction Steps

Within Alpine Linux linux-musl-s390x environment:

./bootstrap/dotnet build Src/Newtonsoft.Json/Newtonsoft.Json.csproj /v:diag
  • Watch the world burn

Expected behavior

Build should be successful

Actual behavior

Fails with System.InsufficientExecutionStackException errors after csc.dll process

Regression?

The same issue occurs on dotnet 6.0.109 and 6.0.401, for what it's worth.

Known Workarounds

Unknown. I am currently trying to find ways to force a higher stack size, but so far the following approaches have failed:

  • Running find "$_cli_root" -type f -exec "$srcdir"/muslstack -s 0x800000 '{}' \; 2>&1 | grep stackSize, which uses muslstack to give every file in the bootstrap an 8 MB stack size
  • Building Mono with the -Wl,-z,stack-size=8198144 linker flag
  • Changing PTHREAD_STACK_MIN in /usr/include/limits.h from 2048 to 16384 and rebuilding Mono
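
For comparison with the workarounds above, here is a minimal sketch of how a program can request a larger stack for a thread it creates itself, using the POSIX call pthread_attr_setstacksize. It does not change the primary thread's stack or any thread created inside Mono, so it is only an illustration of the mechanism, not a fix for this issue. The names run_with_stack_size and touch_large_buffer are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: runs fn(arg) on a new thread whose stack is
 * stack_bytes large, then waits for it to finish.
 * Returns 0 on success or a pthread error number. */
int run_with_stack_size(void *(*fn)(void *), void *arg,
                        size_t stack_bytes, void **result)
{
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request an explicit stack size instead of the libc default. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, result);
}

/* Hypothetical worker: touches both ends of a 1 MiB buffer on its own
 * stack, which needs more room than a small default thread stack. */
void *touch_large_buffer(void *arg)
{
    volatile unsigned char buf[1024 * 1024];
    buf[0] = 1;
    buf[sizeof buf - 1] = 2;
    *(int *)arg = buf[0] + buf[sizeof buf - 1];
    return arg;
}
```

Calling run_with_stack_size(touch_large_buffer, &out, 8u * 1024 * 1024, &res) runs the worker on an 8 MiB stack. Build with -pthread.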

Configuration

  • 7.0.100-rc1 cross-compiled to linux-musl-s390x from linux-musl-x64
  • Alpine Linux Edge
  • Can be reproduced in Alpine Linux build pipelines

Other information

The fix likely involves porting ENSURE_PRIMARY_STACK_SIZE from CoreCLR to Mono:

#ifdef ENSURE_PRIMARY_STACK_SIZE
/*++
Function:
  EnsureStackSize

Abstract:
  This fixes a problem on MUSL where the initial stack size reported by the
  pthread_attr_getstack is about 128kB, but this limit is not fixed and
  the stack can grow dynamically. The problem is that it makes the
  functions ReflectionInvocation::[Try]EnsureSufficientExecutionStack
  to fail for real life scenarios like e.g. compilation of corefx.
  Since there is no real fixed limit for the stack, the code below
  ensures moving the stack limit to a value that makes reasonable
  real life scenarios work.
--*/
__attribute__((noinline,NOOPT_ATTRIBUTE))
void
EnsureStackSize(SIZE_T stackSize)
{
    volatile uint8_t *s = (uint8_t *)_alloca(stackSize);
    *s = 0;
}
#endif // ENSURE_PRIMARY_STACK_SIZE
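
To check the value the abstract above refers to, a minimal sketch like the following asks the C library what stack size it reports for the calling thread. It assumes pthread_getattr_np is available (a non-POSIX extension that both glibc and musl provide), and reported_stack_size is a hypothetical name. It only reads the reported size and does not reproduce what CoreCLR or Mono do with it.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

/* Stores the stack size the C library reports for the calling thread.
 * Returns 0 on success or a pthread error number. */
int reported_stack_size(size_t *size_out)
{
    pthread_attr_t attr;
    void *stack_addr = NULL;

    /* Fill attr with the attributes of the running thread. */
    int rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;

    /* Read the stack's lowest address and its size from attr. */
    rc = pthread_attr_getstack(&attr, &stack_addr, size_out);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling this from main before and after a call such as EnsureStackSize would show whether the reported limit moves on a given system.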

Relevant issues:
dotnet/roslyn#64423
#72920

Full log file: newtonsoft.log

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 3, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@jkotas jkotas added area-Build-mono arch-s390x Related to s390x architecture (unsupported) os-linux-musl Linux distributions using musl library. labels Oct 3, 2022
@ayakael ayakael changed the title from "System.InsufficientExecutionStackException on linux-musl-s390x" to "System.InsufficientExecutionStackException when using Mono on linux-musl" Oct 3, 2022
@ayakael
Contributor Author

ayakael commented Oct 3, 2022

This is reproducible on linux-musl-x64 by building runtime 6.0.9 with the flag /p:PrimaryRuntimeFlavor=Mono, which confirms that this is a Mono issue on musl in general rather than one specific to s390x.

Edit: though Mono can only be built on x64 with #68424.

@ayakael
Contributor Author

ayakael commented Oct 3, 2022

With .NET 7 RC1, I cannot reproduce this on linux-musl-x64 with the flag /p:PrimaryRuntimeFlavor=Mono, for some reason.

@ayakael
Contributor Author

ayakael commented Oct 6, 2022

More data: the stack exceptions on .NET 6 were caused by #76543, so the issues we're encountering here were introduced with .NET 7 and seem to be specific to s390x.

@uweigand
Contributor

uweigand commented Oct 6, 2022

> More data: the stack exceptions on .NET 6 were caused by #76543 thus the issues we're encountering here were introduced with .NET 7 and seem to be specific to s390x.

Well, the s390x ABI often leads to inherently larger stacks than Intel - every function allocates at least 160 bytes of stack (in addition to whatever is needed for local variables etc), while on Intel the minimum allocation is only 16 bytes. With many nested stack frames, it could be possible that we're actually running into stack limitations on s390x but not on Intel.

If I remember correctly, that was one reason why we increased the minimum per-thread stack size in glibc on s390x as compared to Intel.
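
As a rough illustration of the frame-size argument above, the sketch below computes an upper bound on call depth when every frame uses only the ABI minimum. The 128 KiB stack figure is taken from the CoreCLR comment quoted earlier in this issue, the 160-byte and 16-byte minimums are the ones stated above, and max_nested_frames is a hypothetical name. Real frames also hold locals and spills, so the actual depth is lower on both architectures.

```c
#include <stddef.h>

/* Upper bound on nesting depth if every frame used only the ABI's
 * minimum allocation. Returns 0 for a zero frame size. */
size_t max_nested_frames(size_t stack_bytes, size_t min_frame_bytes)
{
    return min_frame_bytes == 0 ? 0 : stack_bytes / min_frame_bytes;
}
```

With a 128 KiB stack, max_nested_frames(128 * 1024, 160) gives 819 frames for s390x, against 8192 for max_nested_frames(128 * 1024, 16) on Intel: a tenfold difference before any locals are counted.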

@ayakael
Contributor Author

ayakael commented Oct 6, 2022

> > More data: the stack exceptions on .NET 6 were caused by #76543 thus the issues we're encountering here were introduced with .NET 7 and seem to be specific to s390x.
>
> Well, the s390x ABI often leads to inherently larger stacks than Intel - every function allocates at least 160 bytes of stack (in addition to whatever is needed for local variables etc), while on Intel the minimum allocation is only 16 bytes. With many nested stack frames, it could be possible that we're actually running into stack limitations on s390x but not on Intel.
>
> If I remember correctly, that was one reason why we increased the minimum per-thread stack size in glibc on s390x as compared to Intel.

Right. And indeed, this just came in: while I am no longer encountering stack exceptions in QEMU-backed s390x Alpine pipelines, I still encounter them in LinuxONE's s390x VM. Granted, the latter is an Alpine sysroot on an Ubuntu machine that I chroot into. While it should not make a difference, the fact that I don't have access to a native Alpine s390x environment makes it impossible for me to rule that out.

Thus, it is still up in the air whether this is a .NET 7-specific bug.

Edit: well, the dotnet6 bug disappeared once I deleted the /tmp folder... so that's great!

@ayakael
Contributor Author

ayakael commented Oct 9, 2022

Fortunately we seem to be closing in on dotnet6 for s390x. Current build failure is at fsharp, where dotnet fails with error code 134. It seems stack related, error message is as follows:

 Stack overflow in unmanaged: IP: 0x3ffb05c0580, fault addr: 0x3ffd1767000 (TaskId:237)
 * Assertion at /var/build/dotnet6/community/dotnet6-stage0/src/dotnet-v6.0.109/src/runtime/src/mono/mono/metadata/sgen-stw.c:76, condition `info->client_info.stack_start >= info->client_info.info.stack_start_limit && info->client_info.stack_start < info->client_info.info.stack_end' not met (TaskId:237)

Full log: fsharp.log
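
On the exit code: 134 is consistent with the process being killed by SIGABRT, which a failed assertion typically raises, assuming the code follows the common shell convention of 128 plus the signal number (SIGABRT is 6 on Linux). A minimal sketch of that convention, with the hypothetical name shell_status_of_aborting_child:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that calls abort() and returns the exit status a shell
 * would conventionally report for it (128 + signal number).
 * Returns -1 if fork or waitpid fails. */
int shell_status_of_aborting_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();                    /* raises SIGABRT in the child */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On Linux this returns 134, matching the code reported above. It says nothing about why the assertion fired, only how the exit code maps back to a signal.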

@ayakael
Contributor Author

ayakael commented Oct 9, 2022

This seems related to a mono-related bug from 2016: https://bugzilla.xamarin.com/38/38641/bug.html

Should I open another issue for this since it seems different enough?

@ayakael
Contributor Author

ayakael commented Oct 10, 2022

That bug is unrelated, so I opened a new issue for it. The bug reported here seems to be fixed now, so I am closing this one.

@ayakael ayakael closed this as completed Oct 10, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 10, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 9, 2022

3 participants