membarrier(REGISTER_PRIVATE_EXPEDITED) waits through an unnecessary RCU grace period during Linux process startup #106722
Labels: area-PAL-coreclr, in-pr (There is an active PR which will close this issue when it is merged), tenet-performance (Performance related issue)
Comments
dotnet-issue-labeler (bot) added the needs-area-label label (An area label is needed to ensure this gets routed to the appropriate area owners) on Aug 20, 2024.
dotnet-policy-service (bot) added the untriaged label (New issue has not been triaged by the area owner) on Aug 20, 2024.
harisokanovic pushed a commit to harisokanovic/dotnet_runtime that referenced this issue on Aug 20, 2024:
Refactor InitializeFlushProcessWriteBuffers(): Split membarrier() initialization into a new InitializeMembarrier() helper function. Call InitializeMembarrier() earlier, before the first thread is created, to improve process start time on Linux. More details can be found in issue 106722. Fixes dotnet#106722
dotnet-policy-service (bot) added the in-pr label (There is an active PR which will close this issue when it is merged) on Aug 20, 2024.
filipnavara added the area-PAL-coreclr label and removed the needs-area-label label (An area label is needed to ensure this gets routed to the appropriate area owners) on Aug 20, 2024.
harisokanovic pushed a commit to harisokanovic/dotnet_runtime that referenced this issue on Aug 21, 2024:
… improve start time InitializeFlushProcessWriteBuffers() initializes expedited membarrier() syscall on Linux, which is much slower when called in a multi-thread process. Move this init earlier to improve dotnet process start time. A detailed explanation can be found in issue 106722. Fixes dotnet#106722
dotnet-policy-service (bot) removed the untriaged label (New issue has not been triaged by the area owner) on Aug 22, 2024.
github-actions bot pushed a commit that referenced this issue on Aug 22, 2024:
… improve start time InitializeFlushProcessWriteBuffers() initializes expedited membarrier() syscall on Linux, which is much slower when called in a multi-thread process. Move this init earlier to improve dotnet process start time. A detailed explanation can be found in issue 106722. Fixes #106722
jkotas pushed a commit that referenced this issue on Aug 22, 2024:
… improve start time (#106836) InitializeFlushProcessWriteBuffers() initializes expedited membarrier() syscall on Linux, which is much slower when called in a multi-thread process. Move this init earlier to improve dotnet process start time. A detailed explanation can be found in issue 106722. Fixes #106722 Co-authored-by: Haris Okanovic <harisokn@amazon.com>
harisokanovic pushed a commit to harisokanovic/dotnet_runtime that referenced this issue on Aug 28, 2024:
…ialize() A fixup of commit 27ee590 that's broken on platforms which don't support membarrier() syscall: GetVirtualPageSize() is called in the fallback path of InitializeFlushProcessWriteBuffers() and attempts to mmap() zero bytes. Move InitializeFlushProcessWriteBuffers() after VIRTUALInitialize() but before the first thread is created. Fixes dotnet#106892 Fixes dotnet#106722
janvorli pushed a commit that referenced this issue on Aug 28, 2024:
…ialize() (#107100) A fixup of commit 27ee590 that's broken on platforms which don't support membarrier() syscall: GetVirtualPageSize() is called in the fallback path of InitializeFlushProcessWriteBuffers() and attempts to mmap() zero bytes. Move InitializeFlushProcessWriteBuffers() after VIRTUALInitialize() but before the first thread is created. Fixes #106892 Fixes #106722 Co-authored-by: Haris Okanovic <harisokn@amazon.com>
github-actions bot pushed a commit that referenced this issue on Aug 28, 2024:
…ialize() A fixup of commit 27ee590 that's broken on platforms which don't support membarrier() syscall: GetVirtualPageSize() is called in the fallback path of InitializeFlushProcessWriteBuffers() and attempts to mmap() zero bytes. Move InitializeFlushProcessWriteBuffers() after VIRTUALInitialize() but before the first thread is created. Fixes #106892 Fixes #106722
jkotas pushed a commit that referenced this issue on Aug 29, 2024:
…ialize() (#107114) A fixup of commit 27ee590 that's broken on platforms which don't support membarrier() syscall: GetVirtualPageSize() is called in the fallback path of InitializeFlushProcessWriteBuffers() and attempts to mmap() zero bytes. Move InitializeFlushProcessWriteBuffers() after VIRTUALInitialize() but before the first thread is created. Fixes #106892 Fixes #106722 Co-authored-by: Haris Okanovic <harisokn@amazon.com>
jtschuster pushed a commit to jtschuster/runtime that referenced this issue on Sep 17, 2024:
…ialize() (dotnet#107100) A fixup of commit 27ee590 that's broken on platforms which don't support membarrier() syscall: GetVirtualPageSize() is called in the fallback path of InitializeFlushProcessWriteBuffers() and attempts to mmap() zero bytes. Move InitializeFlushProcessWriteBuffers() after VIRTUALInitialize() but before the first thread is created. Fixes dotnet#106892 Fixes dotnet#106722 Co-authored-by: Haris Okanovic <harisokn@amazon.com>
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this issue on Dec 10, 2024:
… improve start time (dotnet#106724) InitializeFlushProcessWriteBuffers() initializes expedited membarrier() syscall on Linux, which is much slower when called in a multi-thread process. Move this init earlier to improve dotnet process start time. A detailed explanation can be found in issue 106722. Fixes dotnet#106722 Co-authored-by: Haris Okanovic <harisokn@amazon.com>
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this issue on Dec 10, 2024:
…ialize() (dotnet#107100) A fixup of commit 27ee590 that's broken on platforms which don't support membarrier() syscall: GetVirtualPageSize() is called in the fallback path of InitializeFlushProcessWriteBuffers() and attempts to mmap() zero bytes. Move InitializeFlushProcessWriteBuffers() after VIRTUALInitialize() but before the first thread is created. Fixes dotnet#106892 Fixes dotnet#106722 Co-authored-by: Haris Okanovic <harisokn@amazon.com>
The .NET runtime uses membarrier() syscalls in the Linux implementation of FlushProcessWriteBuffers(). The initialization call membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) can run substantially longer in a process with more than one thread: when mm->mm_users > 1, the kernel bypasses its fast path and the call waits through an RCU grace period. PAL_InitializeCoreCLR() hits the slow path by initializing membarrier() after launching a sync manager worker thread. Startup time can be improved by reordering membarrier init ahead of thread creation.
Potential fix in runtime PR 106724.
The issue can be demonstrated in this simple C program:
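A minimal sketch of such a test (not the author's original listing) is given below. It assumes Linux kernel headers that define `MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED`. The helper names `measure_register_ns`, `idle_thread`, and `now_ns` are hypothetical. Each measurement runs in a freshly forked child, because a repeated registration in an already registered process would not exercise the same path.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Idle thread whose only purpose is to raise mm->mm_users above 1. */
static void *idle_thread(void *arg)
{
    (void)arg;
    pause();
    return NULL;
}

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: fork a child, optionally start a second thread in it,
 * then time membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED).
 * Returns the elapsed nanoseconds, or -1 on any failure (including kernels
 * or sandboxes where membarrier() is unavailable).
 */
int64_t measure_register_ns(int with_thread)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: a fresh address space with a single user at this point. */
        int64_t elapsed = -1;
        pthread_t t;
        close(fds[0]);
        if (!with_thread || pthread_create(&t, NULL, idle_thread, NULL) == 0) {
            int64_t start = now_ns();
            long rc = syscall(SYS_membarrier,
                              MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0);
            if (rc == 0)
                elapsed = now_ns() - start;
        }
        if (write(fds[1], &elapsed, sizeof elapsed) != (ssize_t)sizeof elapsed)
            _exit(1);
        _exit(0); /* exits the whole child, including the idle thread */
    }

    /* Parent: collect the child's measurement. */
    close(fds[1]);
    int64_t result = -1;
    if (read(fds[0], &result, sizeof result) != (ssize_t)sizeof result)
        result = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Calling `measure_register_ns(0)` and `measure_register_ns(1)` from a `main` function and printing both values should show the single-threaded and multi-threaded registration cost side by side. The absolute numbers depend on the kernel version and the machine.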
~11ms difference on a 16-core arm64 system (AWS r7g.4xlarge):
~8ms difference on 16-core x86_64 (AWS r6i.4xlarge):
A workaround can be implemented with an LD_PRELOAD shared library calling membarrier(REGISTER_PRIVATE_EXPEDITED) before the first dotnet thread:
~15ms difference on a 16-core arm64 system (AWS r7g.4xlarge):
~10ms difference on 16-core x86_64 (AWS r6i.4xlarge):
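The LD_PRELOAD workaround described above can be sketched as a small shared library, assuming a GCC or Clang toolchain (for `__attribute__((constructor))`) and kernel headers that define `MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED`. The function names and the file names in the usage note are hypothetical, and this is not necessarily the library used for the measurements above.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: register the process for private expedited
 * membarrier(). Returns 0 on success, or -1 with errno set on failure
 * (for example on kernels without membarrier() support).
 */
int register_private_expedited(void)
{
    return (int)syscall(SYS_membarrier,
                        MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0);
}

/*
 * Runs when the dynamic loader loads this library, before main() and
 * therefore before the runtime has had a chance to create any threads.
 * Errors are ignored on purpose: the runtime performs its own
 * initialization later and can still fall back if registration failed.
 */
__attribute__((constructor))
static void early_membarrier_init(void)
{
    (void)register_private_expedited();
}
```

Under these assumptions the library could be built with `gcc -shared -fPIC -o libearly_membarrier.so early_membarrier.c` and used as `LD_PRELOAD=./libearly_membarrier.so dotnet app.dll`. The runtime's own registration later in startup is then expected to return quickly, since the process is already registered.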