Cached interface dispatch for coreclr #111771

Merged

Commits (showing changes from all 50 commits)
2b2ca52
Move the cached interface dispatch code into a shared region
davidwrighton Jan 8, 2025
4892674
Split cached interface dispatch up into a component which is focussed…
davidwrighton Jan 14, 2025
d69528f
It builds for X64, VTable stuff isn't probably correct, but its basic…
davidwrighton Jan 16, 2025
5f1f2b5
Add indirection cell helper so that VSD and CachedInterfaceDispatch c…
davidwrighton Jan 21, 2025
976bf83
Ready to try running things. R2R not yet supported. Virtual delegates…
davidwrighton Jan 22, 2025
652930c
Initialize CachedInterfaceDispatch at startup
davidwrighton Jan 22, 2025
39a2574
AMD64 seems to work
davidwrighton Jan 22, 2025
4c0865c
Arm64 Windows assembly written and factored amd64 to be similar
davidwrighton Jan 22, 2025
645c487
Allow there to be flavors of the build which do not build cached inte…
davidwrighton Jan 22, 2025
921631a
Make it possible for some OS/Architecture sets to have cached interfa…
davidwrighton Jan 24, 2025
73b0b26
Fix X86 build
davidwrighton Jan 24, 2025
2cdd955
Get Linux Arm64 and Amd64 into a possibly good state
davidwrighton Jan 24, 2025
edad834
Enable cached interface dispatch to build properly on Linux Amd64. No…
davidwrighton Jan 24, 2025
df393d9
Enable building cached interface dispatch for Linux arm64
davidwrighton Jan 24, 2025
cce3bcb
Add AVLocation for the VTable helper which wasn't present in the Nati…
davidwrighton Jan 24, 2025
4bbcdaa
Merge branch 'main' of github.com:dotnet/runtime into cached_interfac…
davidwrighton Jan 24, 2025
c320e1d
Fix musl build failure
davidwrighton Jan 25, 2025
361588a
Handle missed RhpVTableOffsetDispatchAVLocation case
davidwrighton Jan 25, 2025
24e78b2
Move RiscV stub dispatch logic to the same place as everything else
davidwrighton Jan 25, 2025
5b0e5ac
Fix assertion issue with collectible assemblies
davidwrighton Jan 27, 2025
fa7826a
Reduce InterfaceDispatchCell size from 4 pointers to 2, and actually …
davidwrighton Jan 27, 2025
f1c2c65
Use the isCachedInterfaceDispatchStubAVLocation helper where appropriate
davidwrighton Jan 28, 2025
36c9cc0
Enable using cached interface dispatch in R2R
davidwrighton Jan 28, 2025
9377342
Merge branch 'main' of https://github.com/dotnet/runtime into cached_…
davidwrighton Jan 28, 2025
a3a4ff1
Move PalInterlockedCompareExchange128 to the PAL/minipal
davidwrighton Jan 28, 2025
18b3f13
Add support for cleaning up memory for the cache blocks
davidwrighton Jan 29, 2025
80ae02b
Fix Open Virtual Dispatch on Delegates
davidwrighton Jan 29, 2025
081fac5
Try to fix unix build
davidwrighton Jan 29, 2025
a0b9d2a
Fixes for issues found in CI
davidwrighton Jan 30, 2025
dcbe17c
Fix more issues found on Unix platforms
davidwrighton Jan 30, 2025
6e5a394
Merge branch 'main' into cached_interface_dispatch_for_coreclr
davidwrighton Jan 31, 2025
48b5009
Fix x64 stub dispatch code to use the right register, and switch to a…
davidwrighton Jan 31, 2025
ad64a55
Merge branch 'main' of https://github.com/dotnet/runtime into cached_…
davidwrighton Feb 3, 2025
fa72602
Add environment variable to control use of cached dispatch for testin…
davidwrighton Feb 3, 2025
08ac0a1
Fix interface stepping for cached interface dispatch
davidwrighton Feb 4, 2025
006537d
Respond to most of the feedback
davidwrighton Feb 7, 2025
2ff4d2a
Feedback and fixes
davidwrighton Feb 8, 2025
6cb2b78
Merge branch 'main' of https://github.com/dotnet/runtime into cached_…
davidwrighton Feb 8, 2025
70dacc0
Use runtime as the directory to hold stuff shared between NativeAOT a…
davidwrighton Feb 8, 2025
3f44b9b
Address more feedback
davidwrighton Feb 10, 2025
bea72d0
Update preserved registers in CLR ABI documentation
davidwrighton Feb 11, 2025
98a186e
Merge branch 'main' of https://github.com/dotnet/runtime into cached_…
davidwrighton Feb 11, 2025
2b744f8
Adjust to changes upstream
davidwrighton Feb 11, 2025
becc4ce
First set of PR feedback from Katelyn
davidwrighton Mar 3, 2025
5bd7040
Next set of feedback
davidwrighton Mar 3, 2025
0ac5629
Code review add CONSISTENCY_CHECK back.
davidwrighton Mar 4, 2025
301150e
Merge branch 'main' of https://github.com/dotnet/runtime into cached_…
davidwrighton Mar 4, 2025
18fd107
Refactoring feedback. Found a bug where the linked list was incomplete
davidwrighton Mar 4, 2025
1c7e310
Merge branch 'main' of https://github.com/dotnet/runtime into cached_…
davidwrighton Mar 4, 2025
0380766
Merge branch 'main' into cached_interface_dispatch_for_coreclr
davidwrighton Mar 5, 2025
4 changes: 2 additions & 2 deletions docs/design/coreclr/botr/clr-abi.md
@@ -114,7 +114,7 @@ ARM64-only: When a method returns a structure that is larger than 16 bytes the c

## Hidden parameters

*Stub dispatch* - when a virtual call uses a VSD stub, rather than back-patching the calling code (or disassembling it), the JIT must place the address of the stub used to load the call target, the "stub indirection cell", in (x86) `EAX` / (AMD64) `R11` / (AMD64 NativeAOT ABI) `R10` / (ARM) `R4` / (ARM NativeAOT ABI) `R12` / (ARM64) `R11`. In the JIT, this is encapsulated in the `VirtualStubParamInfo` class.
*Stub dispatch* - when a virtual call uses a VSD stub, rather than back-patching the calling code (or disassembling it), the JIT must place the address of the stub used to load the call target, the "stub indirection cell", in (x86) `EAX` / (AMD64) `R11` / (ARM) `R4` / (ARM NativeAOT ABI) `R12` / (ARM64) `R11`. In the JIT, this is encapsulated in the `VirtualStubParamInfo` class.

*Calli Pinvoke* - The VM wants the address of the PInvoke in (AMD64) `R10` / (ARM) `R12` / (ARM64) `R14` (In the JIT: `REG_PINVOKE_TARGET_PARAM`), and the signature (the pinvoke cookie) in (AMD64) `R11` / (ARM) `R4` / (ARM64) `R15` (in the JIT: `REG_PINVOKE_COOKIE_PARAM`).

@@ -812,7 +812,7 @@ Therefore it will expand all indirect calls via the validation helper and a manu
## CFG details for x64

On x64, `CORINFO_HELP_VALIDATE_INDIRECT_CALL` takes the call address in `rcx`.
In addition to the usual registers it also preserves all float registers and `rcx` and `r10`; furthermore, shadow stack space is not required to be allocated.
In addition to the usual registers it also preserves all float registers, `rcx`, and `r10`; furthermore, shadow stack space is not required to be allocated.

`CORINFO_HELP_DISPATCH_INDIRECT_CALL` takes the call address in `rax` and it reserves the right to use and trash `r10` and `r11`.
The JIT uses the dispatch helper on x64 whenever possible, as it is expected that the code size benefit outweighs the less accurate branch prediction.
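The ABI doc change above folds the separate NativeAOT AMD64 register for the stub indirection cell into the common R11 convention. For readers who have not seen the mechanism this PR ports from NativeAOT, the following C++ sketch illustrates the general shape of cached interface dispatch; every type, field, and function name here (DispatchCellSketch, CacheEntrySketch, ResolveSketch) is an illustrative assumption, not the runtime's actual layout, which this PR also reduces to a two-pointer cell.

    // Illustrative sketch only: the real InterfaceDispatchCell and cache layout
    // in the runtime differ from this simplified model.
    struct MethodTableSketch;            // stand-in for the object's type handle

    struct CacheEntrySketch
    {
        MethodTableSketch* instanceType; // type seen at this call site before
        void*              target;       // resolved interface method implementation
    };

    struct DispatchCellSketch
    {
        void*            stub;           // dispatch stub the caller jumps through
        CacheEntrySketch entries[4];     // small per-call-site cache (size is an assumption)
    };

    // What a dispatch stub conceptually does: probe the cache for the receiver's
    // type; on a hit, call the cached target; on a miss, fall back to the slower
    // resolver, which also updates the cache.
    inline void* ResolveSketch(DispatchCellSketch* cell,
                               MethodTableSketch* receiverType,
                               void* (*slowResolve)(DispatchCellSketch*, MethodTableSketch*))
    {
        for (CacheEntrySketch& e : cell->entries)
        {
            if (e.instanceType == receiverType)
                return e.target;         // cache hit: no hashing, no global lookup
        }
        return slowResolve(cell, receiverType); // miss: resolve and (typically) cache
    }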
22 changes: 21 additions & 1 deletion src/coreclr/clrfeatures.cmake
@@ -49,4 +49,24 @@ endif()

if (CLR_CMAKE_TARGET_WIN32)
set(FEATURE_TYPEEQUIVALENCE 1)
endif(CLR_CMAKE_TARGET_WIN32)
endif(CLR_CMAKE_TARGET_WIN32)


if (CLR_CMAKE_TARGET_MACCATALYST OR CLR_CMAKE_TARGET_IOS OR CLR_CMAKE_TARGET_TVOS)
set(FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH 1)
set(FEATURE_CORECLR_VIRTUAL_STUB_DISPATCH 0)
else()
# Enable cached interface dispatch so that we can test/debug it more easily on non-embedded scenarios (set DOTNET_UseCachedInterfaceDispatch=1)
# Only enable in chk/debug builds as this support isn't intended for retail use elsewhere
if (CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_ARM64)
set(FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH $<IF:$<CONFIG:Debug,Checked>,1,0>)
else()
set(FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH 0)
endif()
set(FEATURE_CORECLR_VIRTUAL_STUB_DISPATCH 1)
endif()

if (CLR_CMAKE_HOST_UNIX AND CLR_CMAKE_HOST_ARCH_AMD64)
# Allow 16 byte compare-exchange (cmpxchg16b)
add_compile_options($<${FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH}:-mcx16>)
endif()
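The block above makes the two dispatch flavors independent build switches; a later hunk in this diff (src/coreclr/debug/CMakeLists.txt) turns them into the FEATURE_CACHED_INTERFACE_DISPATCH and FEATURE_VIRTUAL_STUB_DISPATCH compile definitions. A minimal sketch of how C++ code can key off the resulting defines; the function and its messages are invented for illustration and are not part of this PR.

    // Hypothetical illustration of guarding code on the defines produced by the
    // CMake switches above; actual call sites in the runtime look different.
    #include <cstdio>

    void DescribeDispatchSupport()
    {
    #if defined(FEATURE_CACHED_INTERFACE_DISPATCH) && defined(FEATURE_VIRTUAL_STUB_DISPATCH)
        // Both mechanisms compiled in (e.g. chk/debug builds on amd64/arm64);
        // the runtime chooses one at startup via DOTNET_UseCachedInterfaceDispatch.
        std::puts("cached interface dispatch and VSD both available");
    #elif defined(FEATURE_CACHED_INTERFACE_DISPATCH)
        // Embedded-style targets (Mac Catalyst, iOS, tvOS) build only this flavor.
        std::puts("cached interface dispatch only");
    #elif defined(FEATURE_VIRTUAL_STUB_DISPATCH)
        std::puts("virtual stub dispatch only");
    #else
    #error At least one interface dispatch mechanism must be enabled
    #endif
    }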
3 changes: 3 additions & 0 deletions src/coreclr/crossgen-corelib.proj
@@ -118,7 +118,10 @@
<CrossGenDllCmd Condition="'$(TargetsAndroid)' == 'true'">$(CrossGenDllCmd) --targetos:linux</CrossGenDllCmd>
<CrossGenDllCmd Condition="'$(UsingToolIbcOptimization)' != 'true' and '$(EnableNgenOptimization)' == 'true'">$(CrossGenDllCmd) -m:$(MergedMibcPath) --embed-pgo-data</CrossGenDllCmd>
<CrossGenDllCmd>$(CrossGenDllCmd) -O</CrossGenDllCmd>
<!-- Enable type and field layout verification to make it easier to catch when the crossgen2 type layout engine and the runtime disagree on layout conventions -->
<CrossGenDllCmd Condition="'$(Configuration)' == 'Debug' or '$(Configuration)' == 'Checked'">$(CrossGenDllCmd) --verify-type-and-field-layout</CrossGenDllCmd>
<!-- Enable Cached Interface Dispatch layout rules for the StubDispatch import section. This allows testing of this path on platforms that do not require Cached Interface Dispatch -->
<CrossGenDllCmd Condition="'$(Configuration)' == 'Debug' or '$(Configuration)' == 'Checked'">$(CrossGenDllCmd) --enable-cached-interface-dispatch-support</CrossGenDllCmd>
<CrossGenDllCmd>$(CrossGenDllCmd) @(CoreLib)</CrossGenDllCmd>
</PropertyGroup>

4 changes: 4 additions & 0 deletions src/coreclr/debug/CMakeLists.txt
@@ -1,3 +1,7 @@

add_compile_definitions($<${FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH}:FEATURE_CACHED_INTERFACE_DISPATCH>)
add_compile_definitions($<${FEATURE_CORECLR_VIRTUAL_STUB_DISPATCH}:FEATURE_VIRTUAL_STUB_DISPATCH>)

add_subdirectory(daccess)
add_subdirectory(ee)
add_subdirectory(di)
2 changes: 2 additions & 0 deletions src/coreclr/debug/daccess/dacdbiimpl.cpp
@@ -3544,7 +3544,9 @@ void DacDbiInterfaceImpl::EnumerateMemRangesForLoaderAllocator(PTR_LoaderAllocat
if (pVcsMgr)
{
if (pVcsMgr->indcell_heap != NULL) heapsToEnumerate.Push(pVcsMgr->indcell_heap);
#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
if (pVcsMgr->cache_entry_heap != NULL) heapsToEnumerate.Push(pVcsMgr->cache_entry_heap);
#endif // FEATURE_VIRTUAL_STUB_DISPATCH
}

TADDR rangeAccumAsTaddr = TO_TADDR(rangeAcummulator);
11 changes: 10 additions & 1 deletion src/coreclr/debug/daccess/request.cpp
@@ -3620,14 +3620,19 @@ ClrDataAccess::TraverseVirtCallStubHeap(CLRDATA_ADDRESS pAppDomain, VCSHeapType
break;

case CacheEntryHeap:
#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
// The existence of the CacheEntryHeap is part of the SOS api surface, but currently
// when FEATURE_VIRTUAL_STUB_DISPATCH is not defined, the CacheEntryHeap is not created
// so it is compiled out in that situation, but this is not considered to be an E_INVALIDARG.
pLoaderHeap = pVcsMgr->cache_entry_heap;
Review comment (Member): Can we get a small comment here on why the CacheEntryHeap is not used in the caching logic? I'm ignorant of this area and it feels odd that the VirtualCallStubManager is used, but ignored in some cases. Perhaps an assert or some other breadcrumb to indicate why this isn't appropriate.

Reply (Member, author): We could use it, and possibly should, but as I noted in the PR description, I intentionally disabled this feature since it was more work to enable, and this change is already too big for easy development. If we decide that this sort of caching is advantageous, it would be appropriate in a separate PR to enable it for the cached interface dispatch scenario.
#endif // FEATURE_VIRTUAL_STUB_DISPATCH
break;

default:
hr = E_INVALIDARG;
}

if (SUCCEEDED(hr))
if (SUCCEEDED(hr) && (pLoaderHeap != NULL))
{
hr = TraverseLoaderHeapBlock(pLoaderHeap->m_pFirstBlock, pFunc);
}
@@ -3670,7 +3675,9 @@ static const char *LoaderAllocatorLoaderHeapNames[] =
"FixupPrecodeHeap",
"NewStubPrecodeHeap",
"IndcellHeap",
#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
Review comment (Member): This adds to the confusion from above, considering case CacheEntryHeap: remains, but we define out the string.
"CacheEntryHeap",
#endif // FEATURE_VIRTUAL_STUB_DISPATCH
};


@@ -3714,7 +3721,9 @@ HRESULT ClrDataAccess::GetLoaderAllocatorHeaps(CLRDATA_ADDRESS loaderAllocatorAd
else
{
pLoaderHeaps[i++] = HOST_CDADDR(pVcsMgr->indcell_heap);
#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
pLoaderHeaps[i++] = HOST_CDADDR(pVcsMgr->cache_entry_heap);
Review comment (Member): Is it safe to skip initializing this slot? The other branch explicitly zeroes out unused slots, but we're not doing that here.
#endif // FEATURE_VIRTUAL_STUB_DISPATCH
}

// All of the above are "LoaderHeap" and not the ExplicitControl version.
3 changes: 3 additions & 0 deletions src/coreclr/inc/CrstTypes.def
@@ -529,3 +529,6 @@ End
Crst PerfMap
AcquiredAfter CodeVersioning AssemblyList
End

Crst InterfaceDispatchGlobalLists
End
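The new Crst type gives the cached-interface-dispatch global lists their own lock level; it shows up as CrstInterfaceDispatchGlobalLists in the regenerated tables below. A minimal usage sketch, assuming the VM's existing CrstStatic/CrstHolder pattern; the variable and function names here are hypothetical, not code from this PR.

    // Sketch only: how a VM component would typically use the new Crst type.
    // CrstStatic and CrstHolder are the existing CoreCLR lock wrappers.
    CrstStatic g_sketchInterfaceDispatchListLock;

    void SketchInit()
    {
        // The lock level comes from CrstTypes.def; the flags are an assumption.
        g_sketchInterfaceDispatchListLock.Init(CrstInterfaceDispatchGlobalLists, CRST_DEFAULT);
    }

    void SketchMutateGlobalDispatchLists()
    {
        // Acquires on construction, releases when the holder goes out of scope.
        CrstHolder lockHolder(&g_sketchInterfaceDispatchListLock);
        // ... walk or update the global cached-interface-dispatch lists ...
    }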
1 change: 1 addition & 0 deletions src/coreclr/inc/clrconfigvalues.h
@@ -581,6 +581,7 @@ RETAIL_CONFIG_DWORD_INFO(EXTERNAL_VirtualCallStubLogging, W("VirtualCallStubLogg
CONFIG_DWORD_INFO(INTERNAL_VirtualCallStubMissCount, W("VirtualCallStubMissCount"), 100, "Used only when STUB_LOGGING is defined, which by default is not.")
CONFIG_DWORD_INFO(INTERNAL_VirtualCallStubResetCacheCounter, W("VirtualCallStubResetCacheCounter"), 0, "Used only when STUB_LOGGING is defined, which by default is not.")
CONFIG_DWORD_INFO(INTERNAL_VirtualCallStubResetCacheIncr, W("VirtualCallStubResetCacheIncr"), 0, "Used only when STUB_LOGGING is defined, which by default is not.")
CONFIG_DWORD_INFO(INTERNAL_UseCachedInterfaceDispatch, W("UseCachedInterfaceDispatch"), 0, "If cached interface dispatch is compiled in, use that instead of virtual stub dispatch")

///
/// Watson
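Because the switch is declared with CONFIG_DWORD_INFO, it surfaces to users as DOTNET_UseCachedInterfaceDispatch, matching the CMake comment earlier in the diff. A hedged sketch of how startup code could consult it through CLRConfig; the wrapper function below is invented for illustration and assumes the VM's clrconfig.h declarations are in scope.

    // Illustration only: the actual startup code that consults this switch is
    // not shown in this diff. CLRConfig::GetConfigValue is the standard accessor
    // for values declared in clrconfigvalues.h.
    bool SketchShouldUseCachedInterfaceDispatch()
    {
    #if defined(FEATURE_CACHED_INTERFACE_DISPATCH) && defined(FEATURE_VIRTUAL_STUB_DISPATCH)
        // Both mechanisms are compiled in: honor DOTNET_UseCachedInterfaceDispatch (default 0).
        return CLRConfig::GetConfigValue(CLRConfig::INTERNAL_UseCachedInterfaceDispatch) != 0;
    #elif defined(FEATURE_CACHED_INTERFACE_DISPATCH)
        return true;   // only cached interface dispatch was built
    #else
        return false;  // only virtual stub dispatch was built
    #endif
    }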
123 changes: 63 additions & 60 deletions src/coreclr/inc/crsttypes_generated.h
@@ -59,66 +59,67 @@ enum CrstType
CrstILStubGen = 41,
CrstInlineTrackingMap = 42,
CrstInstMethodHashTable = 43,
CrstInterop = 44,
CrstInteropData = 45,
CrstIsJMCMethod = 46,
CrstISymUnmanagedReader = 47,
CrstJit = 48,
CrstJitInlineTrackingMap = 49,
CrstJitPatchpoint = 50,
CrstJumpStubCache = 51,
CrstLeafLock = 52,
CrstListLock = 53,
CrstLoaderAllocator = 54,
CrstLoaderAllocatorReferences = 55,
CrstLoaderHeap = 56,
CrstManagedObjectWrapperMap = 57,
CrstMethodDescBackpatchInfoTracker = 58,
CrstMethodTableExposedObject = 59,
CrstModule = 60,
CrstModuleLookupTable = 61,
CrstMulticoreJitHash = 62,
CrstMulticoreJitManager = 63,
CrstNativeImageEagerFixups = 64,
CrstNativeImageLoad = 65,
CrstNotifyGdb = 66,
CrstPEImage = 67,
CrstPendingTypeLoadEntry = 68,
CrstPerfMap = 69,
CrstPgoData = 70,
CrstPinnedByrefValidation = 71,
CrstPinnedHeapHandleTable = 72,
CrstProfilerGCRefDataFreeList = 73,
CrstProfilingAPIStatus = 74,
CrstRCWCache = 75,
CrstRCWCleanupList = 76,
CrstReadyToRunEntryPointToMethodDescMap = 77,
CrstReflection = 78,
CrstReJITGlobalRequest = 79,
CrstRetThunkCache = 80,
CrstSigConvert = 81,
CrstSingleUseLock = 82,
CrstStressLog = 83,
CrstStubCache = 84,
CrstStubDispatchCache = 85,
CrstSyncBlockCache = 86,
CrstSyncHashLock = 87,
CrstSystemDomain = 88,
CrstSystemDomainDelayedUnloadList = 89,
CrstThreadIdDispenser = 90,
CrstThreadLocalStorageLock = 91,
CrstThreadStore = 92,
CrstTieredCompilation = 93,
CrstTypeEquivalenceMap = 94,
CrstTypeIDMap = 95,
CrstUMEntryThunkCache = 96,
CrstUMEntryThunkFreeListLock = 97,
CrstUniqueStack = 98,
CrstUnresolvedClassLock = 99,
CrstUnwindInfoTableLock = 100,
CrstVSDIndirectionCellLock = 101,
CrstWrapperTemplate = 102,
kNumberOfCrstTypes = 103
CrstInterfaceDispatchGlobalLists = 44,
CrstInterop = 45,
CrstInteropData = 46,
CrstIsJMCMethod = 47,
CrstISymUnmanagedReader = 48,
CrstJit = 49,
CrstJitInlineTrackingMap = 50,
CrstJitPatchpoint = 51,
CrstJumpStubCache = 52,
CrstLeafLock = 53,
CrstListLock = 54,
CrstLoaderAllocator = 55,
CrstLoaderAllocatorReferences = 56,
CrstLoaderHeap = 57,
CrstManagedObjectWrapperMap = 58,
CrstMethodDescBackpatchInfoTracker = 59,
CrstMethodTableExposedObject = 60,
CrstModule = 61,
CrstModuleLookupTable = 62,
CrstMulticoreJitHash = 63,
CrstMulticoreJitManager = 64,
CrstNativeImageEagerFixups = 65,
CrstNativeImageLoad = 66,
CrstNotifyGdb = 67,
CrstPEImage = 68,
CrstPendingTypeLoadEntry = 69,
CrstPerfMap = 70,
CrstPgoData = 71,
CrstPinnedByrefValidation = 72,
CrstPinnedHeapHandleTable = 73,
CrstProfilerGCRefDataFreeList = 74,
CrstProfilingAPIStatus = 75,
CrstRCWCache = 76,
CrstRCWCleanupList = 77,
CrstReadyToRunEntryPointToMethodDescMap = 78,
CrstReflection = 79,
CrstReJITGlobalRequest = 80,
CrstRetThunkCache = 81,
CrstSigConvert = 82,
CrstSingleUseLock = 83,
CrstStressLog = 84,
CrstStubCache = 85,
CrstStubDispatchCache = 86,
CrstSyncBlockCache = 87,
CrstSyncHashLock = 88,
CrstSystemDomain = 89,
CrstSystemDomainDelayedUnloadList = 90,
CrstThreadIdDispenser = 91,
CrstThreadLocalStorageLock = 92,
CrstThreadStore = 93,
CrstTieredCompilation = 94,
CrstTypeEquivalenceMap = 95,
CrstTypeIDMap = 96,
CrstUMEntryThunkCache = 97,
CrstUMEntryThunkFreeListLock = 98,
CrstUniqueStack = 99,
CrstUnresolvedClassLock = 100,
CrstUnwindInfoTableLock = 101,
CrstVSDIndirectionCellLock = 102,
CrstWrapperTemplate = 103,
kNumberOfCrstTypes = 104
};

#endif // __CRST_TYPES_INCLUDED
@@ -173,6 +174,7 @@ int g_rgCrstLevelMap[] =
6, // CrstILStubGen
2, // CrstInlineTrackingMap
18, // CrstInstMethodHashTable
0, // CrstInterfaceDispatchGlobalLists
21, // CrstInterop
9, // CrstInteropData
0, // CrstIsJMCMethod
@@ -281,6 +283,7 @@ LPCSTR g_rgCrstNameMap[] =
"CrstILStubGen",
"CrstInlineTrackingMap",
"CrstInstMethodHashTable",
"CrstInterfaceDispatchGlobalLists",
"CrstInterop",
"CrstInteropData",
"CrstIsJMCMethod",
12 changes: 2 additions & 10 deletions src/coreclr/jit/compiler.h
@@ -8408,16 +8408,8 @@ class Compiler
reg = REG_EAX;
regMask = RBM_EAX;
#elif defined(TARGET_AMD64)
if (isNativeAOT)
{
reg = REG_R10;
regMask = RBM_R10;
}
else
{
reg = REG_R11;
regMask = RBM_R11;
}
reg = REG_R11;
Reply (Member, author): AAGGHH.. reading this has sent me down the rabbit hole of sadness, which has told me that I need to move both CoreCLR and NativeAOT to use r10, and not r11, as that is the only way to keep CFG from exploding, sadly. (Or I can keep the difference between CoreCLR and NativeAOT.)

Reply (Member): I like the idea of getting these unified, as you may guess.

Reply (Member, author): Or rather... this won't break stuff, but the codegen for CFG is required to be a bit non-ideal in NativeAOT. OTOH, it appears with some quick testing that this doesn't actually affect the codegen in any case, since our CFG generation logic isn't what I'd call great right now, so I'd like to keep moving everything to r11. If we want to use r10 we can swap back all of the amd64 stuff at once.

regMask = RBM_R11;
#elif defined(TARGET_ARM)
if (isNativeAOT)
{
13 changes: 13 additions & 0 deletions src/coreclr/minipal/minipal.h
@@ -76,3 +76,16 @@ class VMToOSInterface
// true if it succeeded, false if it failed
static bool ReleaseRWMapping(void* pStart, size_t size);
};

#if defined(HOST_64BIT) && defined(FEATURE_CACHED_INTERFACE_DISPATCH)
Review comment (Member): It would be nice if we added the InterlockedCompareExchange128 to the VMToOSInterface above and implemented it for Unix / Windows in the respective subfolders. When I started this minipal, the intent was that all platform-specific code would go there. Once we get rid of the coreclr PAL, this was meant to be the only place for platform-specific code.

Reply (Member, author): Unfortunately, this is a case where we want to have a shared set of PAL APIs between CoreCLR and NativeAOT, and NativeAOT hasn't moved towards the VMToOSInterface idea (and this change is already WAY too big to add that to it). I'd welcome some rationalization after this is all merged.

Review comment (Member): I see we're adding a HOST_64BIT guard here, but when I look at the CMakeLists changes it doesn't seem like we're necessarily doing the work to enable 128-bit cmpxchg for all 64-bit targets. Is this going to be fine on all the 64-bit targets we support, or do we need to guard it with more specific conditionals? I know we have some support for more esoteric architectures than arm64 and amd64, and I'm not sure whether those are 64-bit hosts. 64-bit WASM doesn't have 16-byte cmpxchg, but it's probably safe to ignore that one since most WASM applications use a 32-bit address space (it's much faster), where an 8-byte cmpxchg is good enough.

Reply (Member): It was discussed above in #111771 (comment); currently it's only enabled for amd64 and arm64 for the non-NativeAOT parts. Also, amd64 has the -mcx16 flag added as part of this PR, which already exists in NativeAOT, where this code is copied from.

EXTERN_C uint8_t _InterlockedCompareExchange128(int64_t volatile *, int64_t, int64_t, int64_t *);

#if defined(HOST_WINDOWS)
#pragma intrinsic(_InterlockedCompareExchange128)
#endif

FORCEINLINE uint8_t PalInterlockedCompareExchange128(_Inout_ int64_t volatile *pDst, int64_t iValueHigh, int64_t iValueLow, int64_t *pComparandAndResult)
{
return _InterlockedCompareExchange128(pDst, iValueHigh, iValueLow, pComparandAndResult);
}
#endif // defined(HOST_64BIT) && defined(FEATURE_CACHED_INTERFACE_DISPATCH)
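PalInterlockedCompareExchange128 wraps the compiler's 128-bit compare-exchange intrinsic so that a two-pointer cell can be swapped as a single atomic unit. A usage sketch under that assumption; the SketchCell type and helper below are illustrative, not code from this PR.

    // Illustration only: atomically publishing a (type, target) pair with the
    // 16-byte compare-exchange wrapper declared above. The cell layout is made up.
    #include <cstdint>

    struct alignas(16) SketchCell
    {
        int64_t low;   // e.g. cached MethodTable pointer
        int64_t high;  // e.g. cached dispatch target
    };

    // Try to replace the cell's current contents with (newLow, newHigh).
    // Returns true on success; on failure, 'expected' is updated with the value observed.
    inline bool SketchTryUpdateCell(SketchCell* cell, SketchCell& expected,
                                    int64_t newLow, int64_t newHigh)
    {
        // The comparand pointer must reference 16 bytes: {expected.low, expected.high}.
        return PalInterlockedCompareExchange128(
                   reinterpret_cast<int64_t volatile*>(cell),
                   newHigh,   // value for the high 8 bytes
                   newLow,    // value for the low 8 bytes
                   reinterpret_cast<int64_t*>(&expected)) != 0;
    }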
2 changes: 1 addition & 1 deletion src/coreclr/nativeaot/CMakeLists.txt
@@ -23,7 +23,7 @@ if(CLR_CMAKE_HOST_UNIX)
endif(CLR_CMAKE_TARGET_APPLE)

if(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386)
# Allow 16 byte compare-exchange
# Allow 16 byte compare-exchange (cmpxchg16b)
add_compile_options(-mcx16)
endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386)
endif (CLR_CMAKE_HOST_UNIX)
15 changes: 12 additions & 3 deletions src/coreclr/nativeaot/Runtime/CMakeLists.txt
@@ -1,9 +1,11 @@
set(GC_DIR ../../gc)
set(RUNTIME_DIR ../../runtime)

set(COMMON_RUNTIME_SOURCES
allocheap.cpp
rhassert.cpp
CachedInterfaceDispatch.cpp
${RUNTIME_DIR}/CachedInterfaceDispatch.cpp
CachedInterfaceDispatchAot.cpp
Crst.cpp
DebugHeader.cpp
MethodTable.cpp
@@ -76,6 +78,7 @@ include_directories(.)
include_directories(${GC_DIR})
include_directories(${GC_DIR}/env)
include_directories(${CMAKE_CURRENT_BINARY_DIR}/eventpipe/inc)
include_directories(${RUNTIME_DIR})

if (WIN32)
set(GC_HEADERS
@@ -208,11 +211,17 @@ list(APPEND RUNTIME_SOURCES_ARCH_ASM
${ARCH_SOURCES_DIR}/MiscStubs.${ASM_SUFFIX}
${ARCH_SOURCES_DIR}/PInvoke.${ASM_SUFFIX}
${ARCH_SOURCES_DIR}/InteropThunksHelpers.${ASM_SUFFIX}
${ARCH_SOURCES_DIR}/StubDispatch.${ASM_SUFFIX}
${RUNTIME_DIR}/${ARCH_SOURCES_DIR}/StubDispatch.${ASM_SUFFIX}
${ARCH_SOURCES_DIR}/UniversalTransition.${ASM_SUFFIX}
${ARCH_SOURCES_DIR}/WriteBarriers.${ASM_SUFFIX}
)

if (CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_ARM64)
list(APPEND RUNTIME_SOURCES_ARCH_ASM
${ARCH_SOURCES_DIR}/CachedInterfaceDispatchAot.${ASM_SUFFIX}
)
endif ()

# Add architecture specific folder for looking up headers.
convert_to_absolute_path(ARCH_SOURCES_DIR ${ARCH_SOURCES_DIR})
include_directories(${ARCH_SOURCES_DIR})
@@ -289,7 +298,7 @@ if (CLR_CMAKE_TARGET_UNIX)

endif(CLR_CMAKE_TARGET_UNIX)

set(RUNTIME_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(NATIVEAOT_RUNTIME_DIR ${CMAKE_CURRENT_SOURCE_DIR})

list(APPEND COMMON_RUNTIME_SOURCES ${GC_HEADERS})
