Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable sending activation to dispatch queue threads #102887

Conversation

janvorli
Copy link
Member

Until recently, it was not possible to send activation signal to Apple dispatch queue threads using pthread_kill. So in case a .NET code was running on such a thread, it could end up blocking runtime suspension in some cases. Recently, Apple has implemented a new API to enable sending specific signals to the dispatch queues, the dispatch_allow_send_signals.

This change adds a call to that API to the PAL initialization code.

Until recently, it was not possible to send activation signal to Apple
dispatch queue threads using pthread_kill. So in case a .NET code was running
on such a thread, it could end up blocking runtime suspension in some cases.
Recently, Apple has implemented a new API to enable sending specific signals
to the dispatch queues, the dispatch_allow_send_signals.

This change adds a call to that API to the PAL initialization code.
@janvorli janvorli added this to the 9.0.0 milestone May 30, 2024
@janvorli janvorli requested a review from jkotas May 30, 2024 18:12
@janvorli janvorli self-assigned this May 30, 2024
@janvorli
Copy link
Member Author

Hmm, our CI machines clearly don't have the latest SDK, so this PR will have to wait till they are updated then.

@jkotas jkotas added the blocked Issue/PR is blocked on something - see comments label May 30, 2024
@janvorli janvorli removed the blocked Issue/PR is blocked on something - see comments label Jun 5, 2024
@janvorli
Copy link
Member Author

janvorli commented Jun 5, 2024

@jkotas it seems that our CI machines cannot be updated to the xcode >= 15.4 yet due to some limitations, so I have added a temporary workaround that uses dlopen / dladdr to get the function.

@janvorli janvorli force-pushed the enable-activation-sending-to-dispatch-queue-threads branch from 5ea3596 to 6ba16e0 Compare June 5, 2024 20:51
@jkotas
Copy link
Member

jkotas commented Jun 5, 2024

Do we need the same change for native AOT?

@jkotas
Copy link
Member

jkotas commented Jun 5, 2024

Is there a test that we can add to ensure that this works as expected?

@janvorli
Copy link
Member Author

janvorli commented Jun 5, 2024

Is there a test that we can add to ensure that this works as expected?

I have a native test that creates a dispatch queue thread and then attempts to send a signal to it and checks that the pthread_kill returns success. It fails without this call and succeeds with it. However, I am not sure how to make a coreclr test that would check that it works. Maybe I can just pinvoke that native test from a .NET process. That would verify that the process is set to allow sending activation signals to dispatch queue threads.

@janvorli
Copy link
Member Author

janvorli commented Jun 5, 2024

Do we need the same change for native AOT?

Good point, we do.

@jkotas
Copy link
Member

jkotas commented Jun 5, 2024

I have a native test that creates a dispatch queue thread and then attempts to send a signal to it and checks that the pthread_kill returns success. However, I am not sure how to make a coreclr test that would check that it works.

Run a tight managed infinite loop on the dispatch queue thread and make sure that GC triggered on a different thread does not hang.

janvorli added 2 commits June 24, 2024 20:43
The added coreclr test has revealed that while the signal could be sent now,
the inject_activation_handler was not being invoked on the dispatch queue
thread. The problem was that all signals are masked by default there.
The test verifies that .NET code running on a dispatch queue thread in
a tight loop can get the activation injected.
@janvorli
Copy link
Member Author

@jkotas I have added the test. It has revealed that the fix was missing one piece. We also need to unmask the activation signal on the threads that join the runtime, since the dispatch queue ones have it masked off by default.
I have added the unmasking for all Unixes.

@jkotas
Copy link
Member

jkotas commented Jun 24, 2024

Build breaks...

@@ -1034,6 +1046,27 @@ REDHAWK_PALEXPORT UInt32_BOOL REDHAWK_PALAPI PalRegisterHijackCallback(_In_ PalH
ASSERT(g_pHijackCallback == NULL);
g_pHijackCallback = callback;

#ifdef __APPLE__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to work for iOS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, the new API supports iOS too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dlopen of a random library fine wrt. app store rules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, but I hope that we will be able to remove the dlopen very soon, it is a matter of upgrading macOS on our CI machines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is available at /usr/lib/system/introspection/libdispatch.dylib:

ivanpovazan@EONE-MAC-ARM64 ~ $ find /usr/lib -name "libdispatch.dylib" -exec realpath {} \;
/usr/lib/system/introspection/libdispatch.dylib
ivanpovazan@EONE-MAC-ARM64 ~ $ sw_vers -productVersion 
14.5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I also have the library in the /usr/lib/system/introspection and not in the /usr/lib/system. But when I run a simple c app that invokes that API under lldb and list the images in the process, it shows the path I have used:

[  8] 502762EE-7AA7-306C-9DBD-88981A86BB78 0x00000001831b2000 /usr/lib/system/libdispatch.dylib

The dlopen works no matter whether I use the /usr/lib/system/libdispatch.dylib or /usr/lib/system/introspection/libdispatch.dylib. However, if I load the one in the /usr/lib/system/introspection, then lldb shows many warnings like

objc[47199]: Class OS_dispatch_workloop is implemented in both /usr/lib/system/libdispatch.dylib (0x1eb1a3ad8) and /usr/lib/system/introspection/libdispatch.dylib (0x1001d8118). One of the two will be used. Which one is undefined.

So it seems that macOS has some redirection there and the path I have in the PR is the right one to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dylibs don't actually exist on disk in macOS anymore, they're in a system-level dyld cache, which is why 'find' or 'ls' won't find them. dlopen knows to look in the cache though, so that's why the path just works.

Ref: https://mjtsai.com/blog/2020/06/26/reverse-engineering-macos-11-0/

One way to find out which dylib a symbol is supposed to be in is by looking in the Xcode's SDKS:

$ grep dispatch_allow_send_signals /Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib -R
/Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libSystem.B.tbd:                       _dispatch_after_f, _dispatch_allocator_layout, _dispatch_allow_send_signals,
/Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/system/introspection/libdispatch.tbd:                       _dispatch_after_f, _dispatch_allocator_layout, _dispatch_allow_send_signals,
/Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/system/libdispatch.tbd:                       _dispatch_after_f, _dispatch_allocator_layout, _dispatch_allow_send_signals,
/Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libSystem.B_asan.tbd:                       _dispatch_after_f, _dispatch_allocator_layout, _dispatch_allow_send_signals,

This shows that the symbol is also in libSystem.B.

One interesting thing shows up when grepping in the iOS SDK:

$ grep dispatch_allow_send_signals /Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib -R
/Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib/libSystem.B.tbd:                       _dispatch_after_f, _dispatch_allocator_layout, _dispatch_allow_send_signals,
/Applications/Xcode_15.3.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib/libSystem.B_asan.tbd:                       _dispatch_after_f, _dispatch_allocator_layout, _dispatch_allow_send_signals,

The symbol is only available in libSystem, not libdispatch (the same happens in tvOS too).

My recommendation would be to dlopen libSystem instead ("/usr/lib/libSystem.dylib"), hopefully that will work on all platforms.

Ideally you'd be able to remove the call to dlopen, that would reliably work on all platforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally you'd be able to remove the call to dlopen, that would reliably work on all platforms.

Right, I am really waiting for the CI machines to get updated.

I'll give the libSystem a try, hopefully it would work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rolfbjarne thank you for the suggestion, the libSystem works too. I'll update this PR.

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines failed to run 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines failed to run 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines failed to run 1 pipeline(s).

@janvorli
Copy link
Member Author

I have just realized I also need to skip the test on macOS that doesn't support the new API, as it would obviously hang.

@janvorli
Copy link
Member Author

/azp list

Copy link

CI/CD Pipelines for this repository:

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli janvorli merged commit 742bae8 into dotnet:main Jun 27, 2024
133 of 147 checks passed
@janvorli janvorli deleted the enable-activation-sending-to-dispatch-queue-threads branch June 27, 2024 15:37
@github-actions github-actions bot locked and limited conversation to collaborators Jul 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants