[NativeAOT] linux-arm bring up #97729

Open
14 of 16 tasks
filipnavara opened this issue Jan 30, 2024 · 68 comments

@filipnavara
Member

filipnavara commented Jan 30, 2024

This is a tracking issue for the known problems that need to be resolved to get working NativeAOT support on the linux-arm platform.

Known issues:

Failing runtime tests:

Other things requiring clean up:

@ghost

ghost commented Jan 30, 2024

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a tracking issue for the known problems that need to be resolved to get working NativeAOT support on the linux-arm platform.

Known issues:

Other things requiring clean up:

  • NativeAOT build integration support for linux-musl and linux-bionic on arm32
  • Run the smoke tests to prevent regressions
Author: filipnavara
Assignees: -
Labels:

arch-arm32, area-NativeAOT-coreclr

Milestone: -

@ghost added the untriaged label (New issue has not been triaged by the area owner) on Jan 30, 2024
@filipnavara
Member Author

filipnavara commented Jan 31, 2024

State as of ba8993f + PRs #97746, #97756 and #97757:

  • NativeAOT runtime smoke tests in Debug configuration run and pass on Raspberry Pi 4, Raspberry Pi 5 and QEMU
  • All library tests build (full checked build of ./build.sh clr.aot+libs+libs.tests -rc Checked -lc Release /p:TestNativeAot=true took 03:48:16.60 on Raspberry Pi 5)
  • Library tests in Debug builds seem to run to some extent (no fail/pass numbers to report, but multiple tests run to the end; many tests failed, but they were supposed to be excluded on the given platform and the runner ran them anyway)
  • Library tests in Release builds crash with various errors almost instantly

@filipnavara
Member Author

filipnavara commented Jan 31, 2024

I ran the smoke tests in the Release configuration. Some of them reliably fail, which makes debugging easier. Apparently we now get an incorrect answer from InWriteBarrierHelper in the SIGSEGV exception handler. I'll debug it later this week.

--

printf("%x %x %x\r\n", (uintptr_t)&RhpAssignRefAVLocation, (uintptr_t)&RhpAssignRefAVLocation & ~1, faultingIP);
// prints "4b9539 4b9539 4b9538"

Don't you just love compilers? (Technically, clang is not wrong here, since RhpAssignRefAVLocation is defined as an external variable, not a function.)

Workaround: filipnavara@39ae75f Fix: 7d25e4c

@NCLnclNCL

This comment was marked as off-topic.

@michaldobrodenka

This comment was marked as off-topic.

@filipnavara
Member Author

filipnavara commented Feb 1, 2024

When support window x86(32 bit) ???

Please keep this issue on topic. I am doing this in my free time; I do not plan to work on the win-x86 port. There's already an open issue for that.

@filipnavara
Member Author

With the in-flight PRs I can get most of the smoke tests running in Release mode. There's one remaining issue with unwinding during GC in the DynamicGenerics test:

--------------------------------------------------
Debug Assertion Violation

Expression: 'm_pInstance->IsManaged(m_ControlPC) || (m_pPreviousTransitionFrame != NULL && (m_dwFlags & SkipNativeFrames) == 0)'

File: /home/navara/runtime/src/coreclr/nativeaot/Runtime/StackFrameIterator.cpp, Line: 1500
--------------------------------------------------
Process 7527 stopped
* thread #4, name = 'DynamicGenerics', stop reason = signal SIGABRT
    frame #0: 0xf7e499f4 libc.so.6`__pthread_kill_implementation(threadid=4045403008, signo=6, no_tid=<unavailable>) at pthread_kill.c:44:76
(lldb) bt
* thread #4, name = 'DynamicGenerics', stop reason = signal SIGABRT
  * frame #0: 0xf7e499f4 libc.so.6`__pthread_kill_implementation(threadid=4045403008, signo=6, no_tid=<unavailable>) at pthread_kill.c:44:76
    frame #1: 0xf7e01cfc libc.so.6`__GI_raise(sig=6) at raise.c:26:13
    frame #2: 0xf7deb0a0 libc.so.6`__GI_abort at abort.c:79:7
    frame #3: 0x0063436a DynamicGenerics`::RaiseFailFastException(arg1=0x00000000, arg2=0x00000000, arg3=1) at PalRedhawkUnix.cpp:90:5
    frame #4: 0x005d53a8 DynamicGenerics`PalRaiseFailFastException(arg1=0x00000000, arg2=0x00000000, arg3=1) at PalRedhawkFunctions.h:120:5
    frame #5: 0x005d5366 DynamicGenerics`Assert(expr=0x00560948, file=0x00563f83, line_num=1500, message=0x00000000) at rhassert.cpp:32:9
    frame #6: 0x005de012 DynamicGenerics`StackFrameIterator::NextInternal(this=0xf11fe490) at StackFrameIterator.cpp:1500:9
    frame #7: 0x005ddb18 DynamicGenerics`StackFrameIterator::Next(this=0xf11fe490) at StackFrameIterator.cpp:1300:5
    frame #8: 0x005e2da2 DynamicGenerics`Thread::GcScanRootsWorker(this=0xef3fe890, pfnEnumCallback=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), pvCallbackData=0xf11fe7ac, frameIterator=0xf11fe490)(Object**, ScanContext*, unsigned int), ScanContext*, StackFrameIterator&) at thread.cpp:554:27
    frame #9: 0x005e2b28 DynamicGenerics`Thread::GcScanRoots(this=0xef3fe890, pfnEnumCallback=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), pvCallbackData=0xf11fe7ac)(Object**, ScanContext*, unsigned int), ScanContext*) at thread.cpp:413:5
    frame #10: 0x005d7ee6 DynamicGenerics`GCToEEInterface::GcScanRoots(fn=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), condemned=2, max_gen=2, sc=0xf11fe7ac)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) at gcenv.ee.cpp:122:22
    frame #11: 0x0062d326 DynamicGenerics`GCScan::GcScanRoots(fn=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), condemned=2, max_gen=2, sc=0xf11fe7ac)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) at gcscan.cpp:152:5
    frame #12: 0x00602218 DynamicGenerics`WKS::gc_heap::mark_phase(condemned_gen_number=2) at gc.cpp:29214:9
    frame #13: 0x005ff22a DynamicGenerics`WKS::gc_heap::gc1() at gc.cpp:22180:13
    frame #14: 0x0060a398 DynamicGenerics`WKS::gc_heap::garbage_collect(n=2) at gc.cpp:24187:9
    frame #15: 0x005f8cd2 DynamicGenerics`WKS::GCHeap::GarbageCollectGeneration(this=0x008dd5d8, gen=2, reason=reason_induced) at gc.cpp:50291:9
    frame #16: 0x006269a6 DynamicGenerics`WKS::GCHeap::GarbageCollectTry(this=0x008dd5d8, generation=2, low_memory_p=NO, mode=2) at gc.cpp:49514:12
    frame #17: 0x0062688a DynamicGenerics`WKS::GCHeap::GarbageCollect(this=0x008dd5d8, generation=2, low_memory_p=false, mode=2) at gc.cpp:49444:30
    frame #18: 0x005d6c44 DynamicGenerics`::RhpCollect(uGeneration=4294967295, uMode=2, lowMemoryP=0) at GCHelpers.cpp:108:35
    frame #19: 0x006fbafc DynamicGenerics`System.Runtime.InternalCalls__RhCollect(generation=<unavailable>, mode=<unavailable>, lowMemoryP=<unavailable>) at InternalCalls.cs:65
    frame #20: 0x0066ac7a DynamicGenerics`DynamicGenerics_ThreadLocalStatics_TLSTesting___c__DisplayClass3_0___MultiThreaded_Test_b__0(this=0xf4e325bc) at threadstatics.cs:464
    frame #21: 0x006e21fc DynamicGenerics`System.Threading.ExecutionContext__RunFromThreadPoolDispatchLoop(threadPoolThread=0xf4e331a4, executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:264
    frame #22: 0x006e782e DynamicGenerics`System.Threading.Tasks.Task__ExecuteWithThreadLocal(this=0xf4e32660, currentTaskSlot=0xf4e336c4, threadPoolThread=<unavailable>) at Task.cs:2345
    frame #23: 0x006e4a30 DynamicGenerics`System.Threading.ThreadPoolWorkQueue__Dispatch at ThreadPoolWorkQueue.cs:913
    frame #24: 0x007299cc DynamicGenerics`System.Threading.PortableThreadPool_WorkerThread__WorkerThreadStart at PortableThreadPool.WorkerThread.NonBrowser.cs:102
    frame #25: 0x006e0aaa DynamicGenerics`System.Threading.Thread__StartThread(parameter=<unavailable>) at Thread.NativeAot.cs:448
    frame #26: 0x006e0e90 DynamicGenerics`System.Threading.Thread__ThreadEntryPoint(parameter=<unavailable>) at Thread.NativeAot.Unix.cs:114
    frame #27: 0xf7e478e0 libc.so.6`start_thread(arg=0xf11ff380) at pthread_create.c:442:8
    frame #28: 0xf7ec6a1c libc.so.6 at clone.S:74

The tests pass with DOTNET_gcConservative=1.

@NCLnclNCL

This comment was marked as off-topic.

@agocke agocke added this to the Future milestone Feb 1, 2024
@ghost removed the untriaged label (New issue has not been triaged by the area owner) on Feb 1, 2024
@filipnavara
Member Author

filipnavara commented Feb 2, 2024

#97863 fixes the unwinding issue in Release builds above. The test still crashes in the pure Release configuration, though. It passes when the Release DynamicGenerics.o is linked against a Debug libRuntime.WorkstationGC.a. I suspect there's still some lurking bug with clearing the Thumb bit in optimized clang code.

Stack trace:

* thread #1, name = 'DynamicGenerics', stop reason = signal SIGSEGV
    frame #0: 0x0086d6fa DynamicGenerics`WKS::GCHeap::Promote(ppObject=0x00000000, sc=<unavailable>, flags=0) at gc.cpp:48753:28 [opt]
  * frame #1: 0x008887de DynamicGenerics`GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, void**, unsigned int), void*) [inlined] GcInfoDecoder::ReportSlotToGC(this=0xef1fd770, slotDecoder=0xef1fd418, slotIndex=10, pRD=0xef1fd878, reportScratchSlots=true, inputFlags=1, pCallBack=<unavailable>, hCallBack=<unavailable>)(void*, void**, unsigned int), void*) at gcinfodecoder.cpp:0 [opt]
    frame #2: 0x008887be DynamicGenerics`GcInfoDecoder::EnumerateLiveSlots(this=0xef1fd770, pRD=0xef1fd878, reportScratchSlots=true, inputFlags=1, pCallBack=(DynamicGenerics`EnumGcRefsCallback(void*, void**, unsigned int) + 1 at GcEnum.cpp:119), hCallBack=0xef1fd7f0)(void*, void**, unsigned int), void*) at gcinfodecoder.cpp:1020:21 [opt]
    frame #3: 0x0088a2be DynamicGenerics`UnixNativeCodeManager::EnumGcRefs(this=<unavailable>, pMethodInfo=<unavailable>, safePointAddress=<unavailable>, pRegisterSet=<unavailable>, hCallback=0xef1fd7f0, isActiveStackFrame=true) at UnixNativeCodeManager.cpp:242:18 [opt]
    frame #4: 0x00853890 DynamicGenerics`EnumGcRefs(pCodeManager=<unavailable>, pMethodInfo=<unavailable>, safePointAddress=<unavailable>, pRegisterSet=<unavailable>, pfnEnumCallback=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), pvCallbackData=0xef1fda30, isActiveStackFrame=<unavailable>)(Object**, ScanContext*, unsigned int), ScanContext*, bool) at GcEnum.cpp:139:19 [opt]
    frame #5: 0x00857280 DynamicGenerics`Thread::GcScanRootsWorker(this=0xf05ff890, pfnEnumCallback=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), pvCallbackData=0xef1fda30, frameIterator=0xef1fd868)(Object**, ScanContext*, unsigned int), ScanContext*, StackFrameIterator&) at thread.cpp:523:17 [opt]
    frame #6: 0x0085706e DynamicGenerics`Thread::GcScanRoots(this=0xf05ff890, pfnEnumCallback=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), pvCallbackData=0xef1fda30)(Object**, ScanContext*, unsigned int), ScanContext*) at thread.cpp:413:5 [opt]
    frame #7: 0x008530ca DynamicGenerics`GCToEEInterface::GcScanRoots(fn=(DynamicGenerics`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 1 at gc.cpp:48747), condemned=<unavailable>, max_gen=<unavailable>, sc=0xef1fda30)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) at gcenv.ee.cpp:122:22 [opt]
    frame #8: 0x008660cc DynamicGenerics`WKS::gc_heap::mark_phase(condemned_gen_number=2) at gc.cpp:29214:9 [opt]
    frame #9: 0x00863d96 DynamicGenerics`WKS::gc_heap::gc1() at gc.cpp:22180:13 [opt]
    frame #10: 0x0086b18c DynamicGenerics`WKS::gc_heap::garbage_collect(n=<unavailable>) at gc.cpp:0 [opt]
    frame #11: 0x0086088c DynamicGenerics`WKS::GCHeap::GarbageCollectGeneration(this=<unavailable>, gen=2, reason=reason_induced) at gc.cpp:50291:9 [opt]
    frame #12: 0x0087a754 DynamicGenerics`WKS::GCHeap::GarbageCollect(int, bool, int) [inlined] WKS::GCHeap::GarbageCollectTry(this=<unavailable>, generation=<unavailable>, low_memory_p=<unavailable>, mode=<unavailable>) at gc.cpp:49514:12 [opt]
    frame #13: 0x0087a74c DynamicGenerics`WKS::GCHeap::GarbageCollect(this=<unavailable>, generation=2, low_memory_p=<unavailable>, mode=<unavailable>) at gc.cpp:49444:30 [opt]
    frame #14: 0x00852780 DynamicGenerics`::RhpCollect(uGeneration=<unavailable>, uMode=<unavailable>, lowMemoryP=<unavailable>) at GCHelpers.cpp:108:35 [opt]
    frame #15: 0x00942a3c DynamicGenerics`System.Runtime.InternalCalls__RhCollect(generation=<unavailable>, mode=<unavailable>, lowMemoryP=<unavailable>) at InternalCalls.cs:65
    frame #16: 0x008b1b80 DynamicGenerics`DynamicGenerics_ThreadLocalStatics_TLSTesting___c__DisplayClass3_0___MultiThreaded_Test_b__0(this=0xf4c41abc) at threadstatics.cs:466
    frame #17: 0x0092916c DynamicGenerics`System.Threading.ExecutionContext__RunFromThreadPoolDispatchLoop(threadPoolThread=0xf4c43830, executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:264
    frame #18: 0x0092e79e DynamicGenerics`System.Threading.Tasks.Task__ExecuteWithThreadLocal(this=0xf4c42934, currentTaskSlot=0xf4c43b20, threadPoolThread=<unavailable>) at Task.cs:2345
    frame #19: 0x0092b9a0 DynamicGenerics`System.Threading.ThreadPoolWorkQueue__Dispatch at ThreadPoolWorkQueue.cs:913
    frame #20: 0x0097075c DynamicGenerics`System.Threading.PortableThreadPool_WorkerThread__WorkerThreadStart at PortableThreadPool.WorkerThread.NonBrowser.cs:102
    frame #21: 0x00927a1a DynamicGenerics`System.Threading.Thread__StartThread(parameter=<unavailable>) at Thread.NativeAot.cs:448
    frame #22: 0x00927e00 DynamicGenerics`System.Threading.Thread__ThreadEntryPoint(parameter=<unavailable>) at Thread.NativeAot.Unix.cs:114
    frame #23: 0xf7c578e0 libc.so.6`start_thread(arg=0xef1fe380) at pthread_create.c:442:8
    frame #24: 0xf7cd6a1c libc.so.6 at clone.S:74

  thread #7, stop reason = signal 0
    frame #0: 0xf7c53cc8 libc.so.6`__futex_abstimed_wait_common at futex-internal.c:40:12
    frame #1: 0xf7c53cac libc.so.6`__futex_abstimed_wait_common(futex_word=0x00d68bb0, expected=0, clockid=<unavailable>, abstime=<unavailable>, private=0, cancel=true) at futex-internal.c:99:11
    frame #2: 0xf7c53e20 libc.so.6`__GI___futex_abstimed_wait_cancelable64(futex_word=<unavailable>, expected=<unavailable>, clockid=<unavailable>, abstime=<unavailable>, private=0) at futex-internal.c:139:10
    frame #3: 0xf7c56eb8 libc.so.6`___pthread_cond_wait at pthread_cond_wait.c:503:10
    frame #4: 0xf7c56d84 libc.so.6`___pthread_cond_wait(cond=0x00d68b88, mutex=0x00d68bb0) at pthread_cond_wait.c:618:10
    frame #5: 0x008855ba DynamicGenerics`GCEvent::Impl::Wait(this=0x00d68b88, milliseconds=<unavailable>, alertable=<unavailable>) at events.cpp:149:22 [opt]
    frame #6: 0x00857498 DynamicGenerics`Thread::InlineSuspend(UNIX_CONTEXT*) [inlined] Thread::WaitForGC(this=0xf05ff890, pTransitionFrame=<unavailable>) at thread.cpp:84:39 [opt]
    frame #7: 0x0085746a DynamicGenerics`Thread::InlineSuspend(this=0xf05ff890, interruptedContext=<unavailable>) at thread.cpp:878:5 [opt]
    frame #8: 0x00883c5a DynamicGenerics`ActivationHandler(code=34, siginfo=0xf05fe690, context=0xf05fe710) at PalRedhawkUnix.cpp:1008:9 [opt]
    frame #9: 0xf7c13280 libc.so.6 at sigrestorer.S:77
    frame #10: 0x009b2c56 DynamicGenerics`System.Collections.Concurrent.ConcurrentUnifierW`2_Container<System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo_UnificationKey__System___Canon>__TryGetValue(this=<unavailable>, key=<unavailable>, hashCode=<unavailable>, value=0xf05fea34) at ConcurrentUnifierW.cs:185
    frame #11: 0x009b2aa4 DynamicGenerics`System.Collections.Concurrent.ConcurrentUnifierW`2<System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo_UnificationKey__System___Canon>__GetOrAdd(this=0xf4be7090, key=System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo_UnificationKey @ 0xf05fea5c) at ConcurrentUnifierW.cs:119
    frame #12: 0x0095c5d2 DynamicGenerics`System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo__GetRuntimeNamedTypeInfo(metadataReader=<unavailable>, typeDefHandle=<unavailable>, precomputedTypeHandle=<unavailable>) at TypeUnifier.NativeFormat.cs:79
    frame #13: 0x00954a9c DynamicGenerics`System.Reflection.Runtime.General.TypeResolver__TryResolve_0(typeDefRefOrSpec=<unavailable>, reader=<unavailable>, typeContext=<unavailable>, exception=<unavailable>) at TypeResolver.NativeFormat.cs:34
    frame #14: 0x00954a48 DynamicGenerics`System.Reflection.Runtime.General.TypeResolver__Resolve_1(typeDefRefOrSpec=<unavailable>, reader=<unavailable>, typeContext=<unavailable>) at TypeResolver.NativeFormat.cs:24
    frame #15: 0x0095592e DynamicGenerics`System.Reflection.Runtime.FieldInfos.NativeFormat.NativeFormatRuntimeFieldInfo__get_FieldRuntimeType(this=<unavailable>) at NativeFormatRuntimeFieldInfo.cs:152
    frame #16: 0x0095540a DynamicGenerics`System.Reflection.Runtime.FieldInfos.RuntimeFieldInfo__get_FieldType(this=0xf4c4c70c) at RuntimeFieldInfo.cs:88
    frame #17: 0x009558d8 DynamicGenerics`System.Reflection.Runtime.FieldInfos.NativeFormat.NativeFormatRuntimeFieldInfo__TryGetFieldAccessor(this=0xf4c4c70c) at NativeFormatRuntimeFieldInfo.cs:144
    frame #18: 0x009555ae DynamicGenerics`System.Reflection.Runtime.FieldInfos.RuntimeFieldInfo__get_FieldAccessor(this=0xf4c4c70c) at RuntimeFieldInfo.cs:214
    frame #19: 0x0095543e DynamicGenerics`System.Reflection.Runtime.FieldInfos.RuntimeFieldInfo__GetValue(this=<unavailable>, obj=0x00000000) at RuntimeFieldInfo.cs:102
    frame #20: 0x008a3b10 DynamicGenerics`DynamicGenerics_ThreadLocalStatics_TLSTesting__MakeType1(typeArg=<unavailable>, checkInitialization=<unavailable>) at threadstatics.cs:310
    frame #21: 0x008b1b5e DynamicGenerics`DynamicGenerics_ThreadLocalStatics_TLSTesting___c__DisplayClass3_0___MultiThreaded_Test_b__0(this=0xf4c41abc) at threadstatics.cs:463
    frame #22: 0x0092916c DynamicGenerics`System.Threading.ExecutionContext__RunFromThreadPoolDispatchLoop(threadPoolThread=0xf4c42f90, executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:264
    frame #23: 0x0092e79e DynamicGenerics`System.Threading.Tasks.Task__ExecuteWithThreadLocal(this=0xf4c428ec, currentTaskSlot=0xf4c43660, threadPoolThread=<unavailable>) at Task.cs:2345
    frame #24: 0x0092b9a0 DynamicGenerics`System.Threading.ThreadPoolWorkQueue__Dispatch at ThreadPoolWorkQueue.cs:913
    frame #25: 0x0097075c DynamicGenerics`System.Threading.PortableThreadPool_WorkerThread__WorkerThreadStart at PortableThreadPool.WorkerThread.NonBrowser.cs:102
    frame #26: 0x00927a1a DynamicGenerics`System.Threading.Thread__StartThread(parameter=<unavailable>) at Thread.NativeAot.cs:448
    frame #27: 0x00927e00 DynamicGenerics`System.Threading.Thread__ThreadEntryPoint(parameter=<unavailable>) at Thread.NativeAot.Unix.cs:114
    frame #28: 0xf7c578e0 libc.so.6`start_thread(arg=0xf05ff380) at pthread_create.c:442:8
    frame #29: 0xf7cd6a1c libc.so.6 at clone.S:74

@filipnavara
Member Author

filipnavara commented Feb 2, 2024

So, for the last crash in GC suspension, I may need some help with verifying some assumptions. I can easily reproduce it, and it happens at the same point in the same function:

  • One thread gets interrupted at System.Collections.Concurrent.ConcurrentUnifierW'2_Container<System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo_UnificationKey__System___Canon>__TryGetValue + 86.
  • GC info decoding says that the location is a safe point (which may or may not be true).
  • At that point in the code the scratch register R12 is used (0xd92742 <+80>: ldr.w r12, [lr, #0xc]). Are scratch registers supposed to be used at safe points? If yes, is there any limit on which ones? We currently don't pass them to the stack frame iterator from the context. Should we?

Decoding the GCInfo on the GC thread indeed shows that there's a live variable in register R12, and since it's a scratch register, regDisplay->pR12 == NULL, which in turn causes the crash.

(cc @VSadov)

@jkotas
Member

jkotas commented Feb 2, 2024

Safe points should only be created for call returns. Scratch registers cannot be live at call returns. That is why they are not handled for safe points. (@VSadov is changing some of these invariants in #95565.)

What does the code around the safe point look like? It may be useful to generate JIT dump for the method in question to see why the JIT decided to emit the safe point at this spot.

@filipnavara
Member Author

filipnavara commented Feb 2, 2024

What does the code around the safe point look like? It may be useful to generate JIT dump for the method in question to see why the JIT decided to emit the safe point at this spot.

It's this code (the offsets are off by one, i.e. +85 is really +86):

    0xd92730 <+63>: bhs    0x3227d2                  ; <+225> at ConcurrentUnifierW.cs:199
    0xd92732 <+65>: mov.w  lr, #0x18
    0xd92736 <+69>: mul    lr, r3, lr
    0xd9273a <+73>: add.w  lr, lr, #0x8
    0xd9273e <+77>: add    r1, lr
    0xd92740 <+79>: mov    lr, r1
    0xd92742 <+81>: ldr.w  r12, [lr, #0xc]
-- SAFEPOINT HERE? --
    0xd92746 <+85>: ldr.w  lr, [lr, #0x10]
    0xd9274a <+89>: ldr    r4, [r0, #0x4]
    0xd9274c <+91>: cmp    r4, lr
    0xd9274e <+93>: bne    0x32275a                  ; <+105> at ConcurrentUnifierW.cs:195
    0xd92750 <+95>:  ldr    r0, [r0]
    0xd92752 <+97>:  ldrsb.w lr, [r0]
    0xd92756 <+101>: cmp    r0, r12
    0xd92758 <+103>: beq    0x322770                  ; <+127> at ConcurrentUnifierW.cs:199
    0xd9275a <+105>: ldr    r3, [r1, #0x8]
    0xd9275c <+107>: cmp.w  r3, #0xffffffff
    0xd92760 <+111>: bne    0x322726                  ; <+53> at ConcurrentUnifierW.cs:185
    0xd92762 <+113>: movs   r0, #0x0
    0xd92764 <+115>: ldr    r4, [sp, #0x24]
    0xd92766 <+117>: str    r0, [r4]
    0xd92768 <+119>: pop.w  {r4, r5, r6, r11, lr}
    0xd9276c <+123>: add    sp, #0xc

My suspicion is that the IsSafepoint answer is already wrong. I can dump the GCInfo to verify what exactly is in there but I wanted to be sure that's the right direction to look into.

@VSadov
Member

VSadov commented Feb 2, 2024

One thread gets interrupted at

Right now threads can only be interrupted in interruptible code. Volatile registers can contain GC refs there.

Also, threads can self-interrupt when hitting a hijacked return. Volatile regs are dead, but return registers may contain live GC refs.

After that it is unwinding through return sites when returns did not happen yet. Volatile registers are dead.

Since there are no calls around the interruption location in your sample, it must be in a fully interruptible method.

@filipnavara
Member Author

The crash happens only when the C runtime part is built with optimizations, so most likely there's an issue with decoding the GC info (the compiler is very eager to optimize away alignment handling when the wrong pointer type is used). I'll dump the GC info.

Since there are no calls around the interruption location in you sample, it is must be in fully interruptible method.

I really hope it's just misdecoded GC info... because a fully interruptible method should not use the R12 register (or we would need to save it from the frame, which is trivial).

@VSadov
Member

VSadov commented Feb 2, 2024

The fact that C optimizations matter is suspicious indeed.

For the R12 register I do not recall if its use for scratch is forbidden (can't easily check that right now).
Typically a register set for a leaf frame includes scratch registers, since leaf frames report them to GC.
If we are not on a call return, we are probably in a leaf.

@jkotas
Member

jkotas commented Feb 2, 2024

a fully interruptible method should not use the R12 register (or we would need to save it from the frame, which is trivial).

It is fine for fully interruptible methods to use the R12 register to store a GC reference. I think it is a bug that it is not initialized in the REGDISPLAY in StackFrameIterator::InternalInit(Thread * pThreadToWalk, NATIVE_CONTEXT* pCtx, uint32_t dwFlags).

@filipnavara
Member Author

Thanks. I came to the same conclusion. The GCInfo shows that it's a fully interruptible method.

I'll send a PR.

@sonatique

sonatique commented Apr 16, 2024

@am11 : thank you very much for what you provided. I think this will be quite helpful. I am now trying WSL, Docker, and the images provided by https://mcr.microsoft.com/en-us/product/dotnet/nightly/sdk/tags.

I am currently testing by building the default console program natively (no AOT yet), expecting linux-x64, using mcr.microsoft.com/dotnet/nightly/sdk:9.0-preview-jammy-aot, but I keep getting

Unable to find package Microsoft.NET.ILLink.Tasks with version (>= 9.0.0-preview.4.24211.4)
- Found 21 version(s) in nuget.org [ Nearest version: 9.0.0-preview.3.24172.9 ]

as soon as I enable PublishTrimmed.

It seems there is something wrong with the NuGet configuration in the Docker image that MS provides, or (more probably) I am missing something.

I will try a little bit more and then I think I'll turn to your solution, thanks a lot for providing it!

EDIT:
Creating the .nuget folder and file and filling it with the content I can see in your Dockerfile, as indicated here: https://github.com/dotnet/runtime/blob/main/docs/project/dogfooding.md#install-prerequisites, makes everything work.

Why there is not even a .nuget folder in the MS image is beyond me.
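For reference, the NuGet.config shape the dogfooding doc describes looks roughly like the following (the dotnet9 feed URL is the one used for .NET 9 previews at the time of this thread; check the linked dogfooding doc for the current feed):

```xml
<configuration>
  <packageSources>
    <clear />
    <add key="dotnet9" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json" />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
  </packageSources>
</configuration>
```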

@sonatique

sonatique commented Apr 16, 2024

Hello @am11 ,
I tried my best with your Dockerfile, but it always fails during docker build when trying to execute build-rootfs, with:

409.4 Building dependency tree...
410.0 E: Unable to locate package liblldb-3.9-dev
410.0 E: Couldn't find any package by glob 'liblldb-3.9-dev'
410.0 E: Couldn't find any package by regex 'liblldb-3.9-dev'

Using "llvm" instead of "llvm14" in bash /dev/stdin arm bionic llvm14 let the docker build finish.

However, when doing:
dotnet9 publish -r linux-arm -v diag on a previously created default console project in which I just changed the PublishAot property to true, I get
/usr/bin/ld.bfd: unrecognised emulation mode: armelf_linux_eabi

In fact, this is exactly the same error that I got when trying with mcr.microsoft.com/dotnet/nightly/sdk:9.0-preview-jammy-aot + adding the .nuget config file.

So probably llvm14 is required, but I cannot manage to get past this failure with liblldb-3.9-dev...

@am11
Member

am11 commented Apr 16, 2024

Hey @sonatique, sorry for the confusion. I was testing these steps in a container and stitched them together as a Dockerfile; I should have tested the final version. 😅

I've now updated the Dockerfile and executed all steps in WSL, let's give it another try!

Changes:

  • install lld in base layer: lld doesn't require a separate cross package; it's multi-targeting. Specify -p:LinkerFlavor=lld in the publish command.
    • it was either that or apt install binutils-arm. Both are OK options (binutils/bfd is slightly better at size optimization).
  • switch to ubuntu jammy (v22.04), which has llvm15, which bionic (18.04) doesn't. It was either this or use llvm10 lldb10 with bionic.
    • the benefit of using an "old" distro is that it maximizes binary compatibility; the rule is: an older distro means an older glibc, and the older the libc, the greater the compatibility (if you want to copy the binary to some device or distribute it to the public, for example).
      • Note: this only applies to the libc version; the compiler toolchain rule is the opposite, the newer the better.
    • the default selection in build-rootfs.sh is 3.9, which is very old and not recommended for AOT. You can check the latest version of the available toolchain using docker run --rm --platform linux/arm/v7 ubuntu:jammy sh -c 'apt update && apt search llvm' with the codename of your choice (bionic etc., or versions as label, e.g. ubuntu:22.04).
  • install llvm in base layer: we need tools like llvm-objcopy. Again, it was either that or install binutils.

@sonatique

@am11 : trying now. Thanks a lot!

@sonatique

sonatique commented Apr 16, 2024

@am11 . I am still getting the same issue. I must be doing something wrong.
Here is what I get after docker build with your newer Dockerfile (I just commented out the last RUN line). Everything went smoothly up to the dotnet publish command:

/root/.dotnet9/dotnet publish -r linux-arm -v diag
/root/.dotnet9/sdk/9.0.100-preview.4.24215.10/MSBuild.dll -nologo --property:_IsPublishing=true -property:RuntimeIdentifier=linux-arm -property:_CommandLineDefinedRuntimeIdentifier=true -property:Configuration=Release -distributedlogger:Microsoft.DotNet.Tools.MSBuild.MSBuildLogger,/root/.dotnet9/sdk/9.0.100-preview.4.24215.10/dotnet.dll*Microsoft.DotNet.Tools.MSBuild.MSBuildForwardingLogger,/root/.dotnet9/sdk/9.0.100-preview.4.24215.10/dotnet.dll -maxcpucount -restore -target:Publish -tlp:default=auto -verbosity:m -verbosity:diag ./test-console-1.csproj
Restore complete (0.9s)
    Determining projects to restore...
    All projects are up-to-date for restore.
You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy
  test-console-1 failed with 2 error(s) (1.5s) → bin/Release/net9.0/linux-arm/test-console-1.dll
    /usr/bin/ld.bfd: unrecognised emulation mode: armelf_linux_eabi
    Supported emulations: elf_x86_64 elf32_x86_64 elf_i386 elf_iamcu elf_l1om elf_k1om i386pep i386pe
    clang : error : linker command failed with exit code 1 (use -v to see invocation)

Any idea, by chance? Note that I am on an x64 machine. Since Windows ARM64 is not very common I didn't mention it earlier, but I see no arm* "supported emulations" in the list; I wonder why.

In case you wonder, here is my csproj file:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <RootNamespace>test_console_1</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
	<PublishTrimmed>true</PublishTrimmed>
	<PublishSingleFile>true</PublishSingleFile>
	<InvariantGlobalization>true</InvariantGlobalization>
	<PublishAot>true</PublishAot>
  </PropertyGroup>

</Project>

and I just have the default program.cs generated by dotnet new console

Thanks in advance!

@am11
Member

am11 commented Apr 16, 2024

Here is what I get after I docker built with your newer dockerfile

First try the exact Dockerfile; if that works (which it does on the two machines I've tested on), then customize it to your project.

@sonatique

sonatique commented Apr 16, 2024

@am11 : well, obviously I should have started there, because, as you expected, everything completes without error with your full Dockerfile. I was a bit bold thinking I could directly do what I wanted.
Thanks a lot, you have been invaluably helpful. I will now try to customize it to suit my needs.

EDIT: I have been able to achieve what I wanted, super great, thanks!

@sonatique

Thanks again @am11 !

Now that I better understand what I am doing and have a working setup, I am curious about one of the things you wrote earlier. You said you "installed lld, it was either that or apt install binutils-arm", and later you wrote the same about llvm.

So you appear to have made efforts to avoid "binutils-arm", even though you wrote "binutils/bfd is slightly better at size optimization".

So I am wondering: what would be wrong with "simply" using binutils? I have to admit I tried for some time to install binutils and remove or replace -p:LinkerFlavor=lld and -p:ObjCopy=llvm-objcopy, but to no avail, so it's probably not trivial (at least to me).

Do you have any pointers on how to use binutils? I wish I could compare the binaries produced by lld/llvm and binutils.

Thanks in advance if you have time for this low priority question.

@am11
Member

am11 commented Apr 17, 2024

The gcc toolchain (gcc, binutils, etc.) is architecture-specific, while the llvm toolchain is multiarch. That means that to cross-compile, lld from the host architecture's llvm toolchain will do the job, while gcc requires a target-arch-specific package, e.g. apt install -y binutils-arm-linux-gnueabi.

See the previous attempt at using the gcc toolchain for cross-compilation: #78559. The conclusion was that it's best to stick with the llvm toolchain. The difference in size is a few KBs and not meaningful in the grand scheme of things.

@sonatique

Hi @am11 , OK I see, thanks a lot!

@tunger

tunger commented Apr 18, 2024

Thanks a lot for the Dockerfile, @am11, and for asking the questions, @sonatique!

With the provided Dockerfile, we can use vscode dev containers to develop net9.0 apps in the IDE and compile them AOT for linux-arm. It works great!

Awesome work on the native AOT for linux-arm. It speeds up my application a lot.

Set it up like this:

.devcontainer/devcontainer.json

{
  "build": { "dockerfile": "Dockerfile" },

  "customizations": {
    "vscode": {
      "extensions": ["ms-dotnettools.csdevkit"]
    }
  }
}

.devcontainer/Dockerfile

FROM --platform=$BUILDPLATFORM ubuntu:latest AS builder

RUN apt update && apt install -y clang debootstrap curl lld llvm

RUN mkdir /dev/arm; \
  curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/arm/sources.list.jammy -o /dev/arm/sources.list.jammy; \
  curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/build-rootfs.sh |\
    bash /dev/stdin arm jammy llvm15 lldb15
	
RUN mkdir -p "$HOME/.dotnet" "$HOME/.nuget/NuGet";
RUN curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality daily --channel 9.0;
RUN cat > "$HOME/.nuget/NuGet/NuGet.Config" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="dotnet9" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json" />
</packageSources>
</configuration>
EOF

ENV DOTNET_NOLOGO=1
ENV PATH "$PATH:/root/.dotnet"

Then you can use vscode tasks to publish the app, like this:

.vscode/tasks.json

{
  "version": "2.0.0",
  "tasks": [
    {
      "command": "dotnet",
      "args": [
        "publish",
        "application.csproj",
        "-r",
        "linux-arm",
        "-c",
        "Release",
        "--self-contained",
        "true",
        "-o",
        "${workspaceFolder}/out",
        "-p:PublishSingleFile=false",
        "-p:EnableCompressionInSingleFile=true",
        "-p:PublishAot=true",
        "-p:LinkerFlavor=lld",
        "-p:ObjCopy=llvm-objcopy",
        "-p:SysRoot=\"/.tools/rootfs/arm\""
      ],
      "options": {
        "cwd": "${workspaceFolder}"
      },
      "group": "build",
      "label": "dotnet9 publish release AOT"
    }
  ]
}

@am11
Member

am11 commented Apr 18, 2024

@tunger, very nice! Thanks for sharing. After #101213 is merged, we can use the same mechanism for linux-musl-arm targeting Alpine Linux. Basically, in the Dockerfile:

- bash /dev/stdin arm jammy llvm15 lldb15
+ bash /dev/stdin arm alpine llvm15 lldb15

and in tasks.json:

        "-r",
-        "linux-arm",
+        "linux-musl-arm",

Once .NET 9 is shipped, we will be able to use the prebuilt official docker images, which will remove the need to set up the cross environment ourselves.

One reason for recommending the official images is somewhat(⚓) important: the more people start using this kind of experimental solution, the more confusion it is going to cause. For instance, this Dockerfile (for linux-arm, not for linux-musl-arm) requires 'nested virtualization' support for chroot (fakechroot has some issues, so fakeroot fakechroot debootstrap ... -variant=fakechroot also ends up requiring the real chroot; according to Google results, that is a known issue actively being investigated in the Debian world). Nested virtualization needs at least EL0 support (aka non-hardware-assisted nested virtualization; the kind available on M1 Macs, which do not support hardware-assisted virtualization). Alpine's crossbuild setup, OTOH, doesn't require chroot etc., and it is lightning fast compared to debootstrap (I have yet to find a package manager faster than Alpine's apk(1) 😁).

I tested this docker on a few public CI systems, here is the support situation:

  • appveyor: no EL0 support, so debootstrap throws a fatal exit code at the chroot step
  • cirrus-ci: same as appveyor
  • GitHub Actions: only Linux supports it. So it's best to publish your base layer to something like Docker Hub or GitHub Package Registry to avoid surprises.

⚓ It only matters for "building" the image; dotnet-publish does not use chroot.

@dabbinavo

dabbinavo commented May 15, 2024

@am11, I just followed your instructions from your comment #97729 (comment) step by step, but unfortunately, docker build fails with the output below.

I tried your steps inside a Ubuntu 22.04 WSL (with docker installed) on a Windows 11 x64 host machine.

Is it possible that this file changed in the meantime?: https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/build-rootfs.sh

I get the same error at the exact same step if I try to run each command directly in my WSL without using docker at all.

/tmp/arm-builder$ docker build . -t armv7-nativeaot-webapi
[+] Building 26.6s (7/8)                                                                                                                     docker:default
 => [internal] load build definition from Dockerfile                                                                                                   0.2s
 => => transferring dockerfile: 1.30kB                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest                                                                                       1.0s
 => [internal] load .dockerignore                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                        0.0s
 => [1/5] FROM docker.io/library/ubuntu:latest@sha256:3f85b7caad41a95462cf5b787d8a04604c8262cdcdf9a472b8c52ef83375fe15                                 0.0s
 => CACHED [2/5] RUN apt update && apt install -y clang debootstrap curl lld llvm                                                                      0.0s
 => CACHED [3/5] RUN mkdir -p "$HOME/.dotnet9" "$HOME/.nuget/NuGet";   curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality dai  0.0s
 => ERROR [4/5] RUN mkdir /dev/arm;   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/arm/sources.list.jammy -o /dev  25.4s
------
 > [4/5] RUN mkdir /dev/arm;   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/arm/sources.list.jammy -o /dev/arm/sources.list.jammy;   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/build-rootfs.sh |    bash /dev/stdin arm jammy llvm15 lldb15:
1.016 I: Retrieving InRelease
1.406 I: Checking Release signature
1.413 I: Valid Release signature (key id F6ECB3762474EDA9D21B7022871920D1991BC93C)
1.834 I: Retrieving Packages
2.035 I: Validating Packages
2.159 I: Resolving dependencies of required packages...
2.301 I: Resolving dependencies of base packages...
3.377 I: Checking component main on http://ports.ubuntu.com...
3.634 I: Retrieving adduser 3.118ubuntu5
4.366 I: Validating adduser 3.118ubuntu5
4.381 I: Retrieving apt 2.4.5
4.589 I: Validating apt 2.4.5
........
24.28 I: Extracting usrmerge...
24.29 I: Extracting util-linux...
24.33 I: Extracting zlib1g...
24.66 W: Failure trying to run: chroot "/crossrootfs/arm" /bin/true
24.66 W: See /crossrootfs/arm/debootstrap/debootstrap.log for details
------
Dockerfile:20
--------------------
  19 |
  20 | >>> RUN mkdir /dev/arm; \
  21 | >>>   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/arm/sources.list.jammy -o /dev/arm/sources.list.jammy; \
  22 | >>>   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/build-rootfs.sh |\
  23 | >>>     bash /dev/stdin arm jammy llvm15 lldb15
  24 |
--------------------
ERROR: failed to solve: process "/bin/sh -c mkdir /dev/arm;   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/arm/sources.list.jammy -o /dev/arm/sources.list.jammy;   curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/build-rootfs.sh |    bash /dev/stdin arm jammy llvm15 lldb15" did not complete successfully: exit code: 1

@filipnavara
Member Author

filipnavara commented May 15, 2024

On WSL I had to manually update the binfmts configuration to register the QEMU emulators. You may want to check update-binfmts --display and update-binfmts --enable.

@dabbinavo

dabbinavo commented May 15, 2024

Still the same failure (docker build, as well as directly inside WSL) after executing your suggested commands inside WSL:

sudo update-binfmts --display
sudo update-binfmts --enable

@filipnavara
Member Author

`enbale` is a typo, right? `update-binfmts --display` is supposed to list all the various QEMU user configurations; does it?

@dabbinavo

Yes, it was a typo and the display switch outputs:

~$ update-binfmts --display
llvm-14-runtime.binfmt (enabled):
     package = llvm-14-runtime
        type = magic
      offset = 0
       magic = BC
        mask =
 interpreter = /usr/bin/lli-14
    detector =
python3.10 (enabled):
     package = python3.10
        type = magic
      offset = 0
       magic = \x6f\x0d\x0d\x0a
        mask =
 interpreter = /usr/bin/python3.10
    detector =

@filipnavara
Member Author

That doesn't list any of the QEMU user packages. There would be an entry similar to:

qemu-arm (disabled):
     package = qemu-arm
        type = magic
      offset = 0
       magic = \x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00
        mask = \xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff
 interpreter = /usr/bin/qemu-arm-static
    detector =

That means the QEMU user emulators are not installed properly and thus the chroot binaries in the debootstrap process cannot execute. I don't remember anymore how I fixed this but hopefully this points in the right direction to Google and fix the problem.
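As an aside, the magic/mask pair above is simply the first 20 bytes of a 32-bit little-endian ARM ELF header (e_machine = 0x28 = EM_ARM). As a minimal illustration (the `/tmp/fake-arm-elf` file name is made up for the demo), those bytes can be synthesized and inspected with nothing but `printf` and `od`:

```shell
# Build the 20-byte header that the qemu-arm binfmt rule matches:
# \x7fELF, ELFCLASS32, little-endian data, EV_CURRENT, nine padding bytes,
# then e_type=ET_EXEC (0x0002) and e_machine=EM_ARM (0x0028), little-endian.
printf '\177ELF\001\001\001\000\000\000\000\000\000\000\000\000\002\000\050\000' > /tmp/fake-arm-elf

# Dump it as hex; a real linux-arm binary starts with these same bytes,
# which is how the kernel knows to hand it to /usr/bin/qemu-arm-static.
head -c 20 /tmp/fake-arm-elf | od -An -tx1 | tr -d ' \n'; echo
# -> 7f454c4601010100000000000000000002002800
```

If that entry is missing from `update-binfmts --display`, the kernel has no interpreter registered for files starting with those bytes, and the chroot step fails exactly as shown above.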

@am11
Member

am11 commented May 15, 2024

I was using this Dockerfile https://github.com/am11/CrossRepoCITesting/blob/master/linux-arm-aot/Dockerfile and this workflow https://github.com/am11/CrossRepoCITesting/blob/master/.github/workflows/docker-naot-arm32.yml. I found that running docker run --privileged --rm tonistiigi/binfmt --install all beforehand (as I did in the workflow yml) made things a whole lot easier. Using that command also fixed the Appveyor CI and Cirrus-CI builds.

@dabbinavo

Okay, thanks for all the information. My specific issue from comment #97729 (comment) was resolved by doing sudo apt install qemu-user-static qemu qemu-system-arm qemu-efi inside my WSL.
For now, I am unsure which of the packages was necessary.

Will do more tests on Tuesday and post a final solution in case I find one.

@dabbinavo

dabbinavo commented May 21, 2024

I finally was able to compile an app for an embedded Debian 11 (bullseye) system by following these setup steps on the build machine (Azure Devops Services, Microsoft Hosted Agent, vmImage: ubuntu-20.04):

sudo apt install -y clang debootstrap curl lld llvm qemu-user-static binfmt-support
mkdir -p "$HOME/.dotnet9" "$HOME/.nuget/NuGet"
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality daily --channel 9.0 --install-dir "$HOME/.dotnet9"
cat > "$HOME/.nuget/NuGet/NuGet.Config" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="dotnet9" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json" />
</packageSources>
</configuration>
EOF

export DOTNET_NOLOGO=1
export ROOTFS_DIR=/crossrootfs/arm

sudo mkdir /dev/arm
curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/arm/sources.list.focal -o /dev/arm/sources.list.focal
curl -sSL https://raw.githubusercontent.com/dotnet/arcade/main/eng/common/cross/build-rootfs.sh | sudo -E bash /dev/stdin arm focal llvm10 lldb10

Then running the cross compile build with:

/home/vsts/.dotnet9/dotnet publish /path/to/csharp-project --property:RuntimeIdentifier="linux-arm" --property:TargetFramework="net9.0" --property:PublishAot="true" --property:Configuration="Release" --property:LinkerFlavor="lld" --property:ObjCopy="llvm-objcopy" --property:SysRoot="/crossrootfs/arm" -o /desired/output/directory

As mentioned in this comment, I had to use Ubuntu 20.04 and focal rootfs with llvm10 and lldb10 in order to run it on the bullseye system with "old" glibc.
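As a minimal sketch of how to verify that glibc compatibility up front (the `max_glibc` helper name and the `./myapp` path are illustrative, not from this thread): list the GLIBC version tags the published binary requires and keep the highest one; it must not exceed the glibc on the target (Debian 11 bullseye ships glibc 2.31).

```shell
# Hypothetical helper: from `objdump -T` output, pick the highest GLIBC
# symbol version the binary requires.
max_glibc() { grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1; }

# On a machine with binutils installed you would run (assumption):
#   objdump -T ./myapp | max_glibc
# Demo with captured objdump-style lines instead of a real binary:
printf '0000 DF *UND* GLIBC_2.34 pthread_key_create\n0000 DF *UND* GLIBC_2.17 memcpy\n' | max_glibc
# -> GLIBC_2.34
```

A result like GLIBC_2.34 would explain a startup failure on bullseye's glibc 2.31, which is why building against the older focal rootfs works.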

@dabbinavo

dabbinavo commented May 21, 2024

Another question from my side: does anyone know whether there will be "official" support for natively cross-compiling to linux-arm in .NET 9, and maybe also in newer versions? The question comes up as we are currently evaluating whether it is safe to use this functionality in a consumer product.

I mean: will .NET 9's NativeAOT support for linux-arm be internally tested, and will it be maintained and bug-fixed after .NET 9 is released?

Unfortunately, I could not find any announcement or anything else about this.

@am11
Member

am11 commented May 21, 2024

As it happens, there are already arm32 tags for the .NET 9 SDK: https://hub.docker.com/_/microsoft-dotnet-nightly-sdk/. 👌😎

# run a throw-away-after-use (--rm) container interactively for linux/arm/v7,
# while mounting the current-working-directory to /myapp 
$ docker run --rm --platform linux/arm/v7 -v$(pwd):/myapp -w /myapp -it \
      mcr.microsoft.com/dotnet/nightly/sdk:9.0-preview

# inside the container
$ uname -a
Linux ea9e301095ba 6.6.26-linuxkit #1 SMP Sat Apr 27 04:13:19 UTC 2024 armv7l GNU/Linux

$ dotnet --info
.NET SDK:
 Version:           9.0.100-preview.4.24266.28
 Commit:            75a08fda5c
 Workload version:  9.0.100-manifests.2c9affbd
 MSBuild version:   17.11.0-preview-24225-01+bd0b1e466

Runtime Environment:
 OS Name:     debian
 OS Version:  12
 OS Platform: Linux
 RID:         linux-arm
 Base Path:   /usr/share/dotnet/sdk/9.0.100-preview.4.24266.28/

.NET workloads installed:
There are no installed workloads to display.

Host:
  Version:      9.0.0-preview.4.24260.3
  Architecture: arm
  Commit:       2270e3185f

.NET SDKs installed:
  9.0.100-preview.4.24266.28 [/usr/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 9.0.0-preview.4.24260.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 9.0.0-preview.4.24260.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

On macOS arm64, I run into qemu assertion #97729 (comment).

If you are on the x64 host with docker installed, you can use one of these tags with prerequisites for cross compilation: https://github.com/dotnet/versions/blob/main/build-info/docker/image-info.dotnet-dotnet-buildtools-prereqs-docker-main.json

e.g.

$ docker run -e ROOTFS_DIR=/crossrootfs/arm --rm -it mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-cross-arm
# install dotnet in it

You can also create a Dockerfile to make it ready.
e.g.

FROM mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-cross-arm

# install dotnet (dotnet-script install)

# now warm up NativeAOT so ilc packages are ready to use
RUN dotnet new webapiaot -n warmupapp && dotnet publish warmupapp && rm -rf warmupapp

then build this image and tag it: docker build . -t my-dotnet9-linux-arm-builder. Usage: docker run --rm -v$(pwd):/myapp -w /myapp my-dotnet9-linux-arm-builder dotnet publish -c Release -o dist.

However, if you are on a non-x64 machine, like arm64, or you want the bleeding-edge daily build, then you can build the builder image from scratch as discussed above.

@AustinWise
Contributor

I tried using the cbl-mariner-2.0-cross-arm image as described above and got this error (and similar) when linking:

undefined symbol: __clock_gettime64

I assume this is caused by the change for year 2038 support. Using a newer docker image fixed this issue.

My full Dockerfile:

FROM mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-cross-arm-net9.0

RUN curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality preview --channel 9.0 --install-dir "$HOME/.dotnet9"

RUN ln -s  /root/.dotnet9/dotnet /usr/bin/dotnet

ENV ROOTFS_DIR=/crossrootfs/arm

# now warm up NativeAOT so ilc packages are ready to use
RUN dotnet new webapiaot -n warmupapp && dotnet publish warmupapp -r linux-arm -p:LinkerFlavor=lld -p:ObjCopy=llvm-objcopy -p:SysRoot=$ROOTFS_DIR && rm -rf warmupapp

@am11
Member

am11 commented Jul 22, 2024

Using a newer docker image fixed this issue.

Yea, use its successor azurelinux-3.0-cross-arm-{alpine-}net9.0, as CBL Mariner is frozen.
