Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix riscv register mapping with llvm-libunwind #111735

Merged
merged 11 commits into from
Feb 7, 2025

Conversation

am11
Copy link
Member

@am11 am11 commented Jan 23, 2025

#106223

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 23, 2025
@am11 am11 added the arch-riscv Related to the RISC-V architecture label Jan 23, 2025
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@am11
Copy link
Member Author

am11 commented Jan 23, 2025

cc @dotnet/samsung @filipnavara
This gets us a little bit further in hello world run. Next issue it hits is infinite recursion because of the failing assert in S_P_CoreLib_System_Runtime_CompilerServices_CastCache__CreateCastCache. Seems like something is overwriting _lastFlushSize first time it fails (second recursion step), haven't figured out yet why it fails in the first place, but typically most issues so far has ended up being some mistake in asm code. (there is no reference C impl for this stuff so it's fun to manually transpile asm by debugging the $other arch's asm stub 😅).

Comment on lines +99 to +100
addi sp, sp, \ssize
.cfi_adjust_cfa_offset -\ssize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: In PROLOG_SAVE_REG_PAIR_INDEXED in other asm files for RISC-V and LoongArch the ssize is assumed positive, why negative here?

Copy link
Member Author

@am11 am11 Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corresponding arm64 macro has stack which is growing in the downwards direction, it is trying to match that.

PROLOG_SAVE_REG_PAIR s3, s4, 48
PROLOG_SAVE_REG_PAIR s5, s6, 64
PROLOG_SAVE_REG_PAIR s7, s8, 80
PROLOG_SAVE_REG_PAIR s9, s10, 96
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s11 not saved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here was -0x format was getting the wrong value, so I converted to decimal like other archs. I can preserve s11 if needed, not sure though (as the breakage was fixed and I moved on to the next issue).

@am11
Copy link
Member Author

am11 commented Jan 25, 2025

Recursion has gone, next issue is #106223 (comment). This batch is ready.

Copy link
Contributor

@tomeksowi tomeksowi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last comments concern UniversalTransition, #106223 (comment). I'm ok with addressing in next PR if you think as-is is a good save point.

@am11
Copy link
Member Author

am11 commented Jan 26, 2025

@tomeksowi, thanks. It actually moves us a little further:

Thread 1 "app1" received signal SIGSEGV, Segmentation fault.
InterfaceDispatchCell::GetDispatchCellInfo (this=0x780600607805005) at /runtime/src/coreclr/nativeaot/Runtime/inc/rhbinder.h:145
145	        UIntTarget cachePointerValue = m_pCache;
(gdb) bt
#0  InterfaceDispatchCell::GetDispatchCellInfo (this=0x780600607805005) at /runtime/src/coreclr/nativeaot/Runtime/inc/rhbinder.h:145
#1  0x0000002aaab70c80 in RhpGetDispatchCellInfo (pCell=0x780600607805005, pDispatchCellInfo=0x3fffffe918) at /runtime/src/coreclr/nativeaot/Runtime/CachedInterfaceDispatch.cpp:541
#2  0x0000002aaac737d8 in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve_Worker (pObject=..., pCell=540537534296379397) at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:27
#3  0x0000002aaac73790 in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve (callerTransitionBlockParam=540537534296379397, pCell=540537534296379397)
    at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:21
#4  0x0000002aaac1276a in RhpUniversalTransition_DebugStepTailCall () at /runtime/src/coreclr/nativeaot/Runtime/riscv64/UniversalTransition.S:178
Backtrace stopped: frame did not save the PC

The call originates from 0x0000002aaacc7838 <+264>:

(gdb) disas
Dump of assembler code for function S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__TryInsert:
   0x0000002aaacc7730 <+0>:	addi	sp,sp,-112
   0x0000002aaacc7734 <+4>:	sd	s0,8(sp)
   0x0000002aaacc7738 <+8>:	sd	ra,16(sp)
   0x0000002aaacc773c <+12>:	sd	s1,24(sp)
   0x0000002aaacc7740 <+16>:	sd	s2,32(sp)
   0x0000002aaacc7744 <+20>:	sd	s3,40(sp)
   0x0000002aaacc7748 <+24>:	sd	s4,48(sp)
   0x0000002aaacc774c <+28>:	sd	s5,56(sp)
   0x0000002aaacc7750 <+32>:	sd	s6,64(sp)
   0x0000002aaacc7754 <+36>:	sd	s7,72(sp)
   0x0000002aaacc7758 <+40>:	sd	s8,80(sp)
   0x0000002aaacc775c <+44>:	sd	s9,88(sp)
   0x0000002aaacc7760 <+48>:	sd	s10,96(sp)
   0x0000002aaacc7764 <+52>:	sd	s11,104(sp)
   0x0000002aaacc7768 <+56>:	addi	s0,sp,8
   0x0000002aaacc776c <+60>:	sd	a0,-8(s0)
   0x0000002aaacc7770 <+64>:	mv	s1,a0
   0x0000002aaacc7774 <+68>:	mv	s2,a1
   0x0000002aaacc7778 <+72>:	mv	s3,a2
   0x0000002aaacc777c <+76>:	sext.w	s4,a3
   0x0000002aaacc7780 <+80>:	beqz	s2,0x2aaacc7c8c <S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__TryInsert+1372>
   0x0000002aaacc7784 <+84>:	ld	a0,8(s1)
   0x0000002aaacc7788 <+88>:	bnez	a0,0x2aaacc77a0 <S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__TryInsert+112>
   0x0000002aaacc778c <+92>:	mv	a0,s1
   0x0000002aaacc7790 <+96>:	li	a1,0
   0x0000002aaacc7794 <+100>:	auipc	a2,0x0
   0x0000002aaacc7798 <+104>:	addi	a2,a2,-340 # 0x2aaacc7640 <S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__Initialize>
   0x0000002aaacc779c <+108>:	jalr	a2
   0x0000002aaacc77a0 <+112>:	ld	a0,8(s1)
   0x0000002aaacc77a4 <+116>:	snez	a0,a0
   0x0000002aaacc77a8 <+120>:	auipc	a1,0x54
   0x0000002aaacc77ac <+124>:	addi	a1,a1,152 # 0x2aaad1b840
   0x0000002aaacc77b0 <+128>:	auipc	a2,0x3a
   0x0000002aaacc77b4 <+132>:	addi	a2,a2,392 # 0x2aaad01938
   0x0000002aaacc77b8 <+136>:	auipc	a3,0xfffbc
   0x0000002aaacc77bc <+140>:	addi	a3,a3,-1352 # 0x2aaac83270 <S_P_CoreLib_System_Diagnostics_Debug__Assert_2>
   0x0000002aaacc77c0 <+144>:	jalr	a3
   0x0000002aaacc77c4 <+148>:	ld	s5,16(s1)
   0x0000002aaacc77c8 <+152>:	snez	a0,s5
   0x0000002aaacc77cc <+156>:	auipc	a1,0x58
   0x0000002aaacc77d0 <+160>:	addi	a1,a1,820 # 0x2aaad1fb00
   0x0000002aaacc77d4 <+164>:	auipc	a2,0x3a
   0x0000002aaacc77d8 <+168>:	addi	a2,a2,356 # 0x2aaad01938
   0x0000002aaacc77dc <+172>:	auipc	a3,0xfffbc
   0x0000002aaacc77e0 <+176>:	addi	a3,a3,-1388 # 0x2aaac83270 <S_P_CoreLib_System_Diagnostics_Debug__Assert_2>
   0x0000002aaacc77e4 <+180>:	jalr	a3
   0x0000002aaacc77e8 <+184>:	ld	s6,24(s1)
   0x0000002aaacc77ec <+188>:	beqz	s6,0x2aaacc77f8 <S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__TryInsert+200>
   0x0000002aaacc77f0 <+192>:	li	a0,1
   0x0000002aaacc77f4 <+196>:	j	0x2aaacc77fc <S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__TryInsert+204>
   0x0000002aaacc77f8 <+200>:	sext.w	a0,zero
   0x0000002aaacc77fc <+204>:	slli	a0,a0,0x38
   0x0000002aaacc7800 <+208>:	srli	a0,a0,0x38
   0x0000002aaacc7804 <+212>:	auipc	a1,0x57
   0x0000002aaacc7808 <+216>:	addi	a1,a1,-1900 # 0x2aaad1e098
   0x0000002aaacc780c <+220>:	auipc	a2,0x3a
   0x0000002aaacc7810 <+224>:	addi	a2,a2,300 # 0x2aaad01938
   0x0000002aaacc7814 <+228>:	auipc	a3,0xfffbc
   0x0000002aaacc7818 <+232>:	addi	a3,a3,-1444 # 0x2aaac83270 <S_P_CoreLib_System_Diagnostics_Debug__Assert_2>
   0x0000002aaacc781c <+236>:	jalr	a3
   0x0000002aaacc7820 <+240>:	ld	a0,0(s1)
   0x0000002aaacc7824 <+244>:	ld	a0,48(a0)
   0x0000002aaacc7828 <+248>:	ld	t5,48(a0)
   0x0000002aaacc782c <+252>:	mv	a0,s6
--Type <RET> for more, q to quit, c to continue without paging--c
   0x0000002aaacc7830 <+256>:	mv	a1,s2
   0x0000002aaacc7834 <+260>:	ld	a2,0(t5)
   0x0000002aaacc7838 <+264>:	jalr	a2
=> 0x0000002aaacc783c <+268>:	sext.w	s7,a0

which jumps to:

    //
    // Initial dispatch on an interface when we don't have a cache yet.
    //
    LEAF_ENTRY RhpInitialInterfaceDispatch, _TEXT
    ALTERNATE_ENTRY RhpInitialDynamicInterfaceDispatch
        // Trigger an AV if we're dispatching on a null this.
        // The exception handling infrastructure is aware of the fact that this is the first
        // instruction of RhpInitialInterfaceDispatch and uses it to translate an AV here
        // to a NullReferenceException at the callsite.
        lw zero, 0(a0)

        // Just tail call to the cache miss helper.
        tail       C_FUNC(RhpInterfaceDispatchSlow)
    LEAF_END RhpInitialInterfaceDispatch, _TEXT

    //
    // Cache miss case, call the runtime to resolve the target and update the cache.
    // Use universal transition helper to allow an exception to flow out of resolution.
    //
    LEAF_ENTRY RhpInterfaceDispatchSlow, _TEXT
        // a1 contains the interface dispatch cell address.
        // Calling convention of the universal thunk is:
        //  a2: target address for the thunk to call
        //  a1: parameter of the thunk's target
        PREPARE_EXTERNAL_VAR RhpCidResolve, a2
        mv a3, a1
        tail       C_FUNC(RhpUniversalTransition_DebugStepTailCall)
    LEAF_END RhpInterfaceDispatchSlow, _TEXT

to reach RhpUniversalTransition_DebugStepTailCall.

I tried to match how arm64 caller->callee looks like, they use xip0 (which is x16 at run-time) and xip1 (x17). It's not a straightforward standard calling convention.

@am11 am11 force-pushed the feature/nativeaot/riscv64-port branch from 5d1f3fe to f6c3185 Compare January 26, 2025 12:05
@am11
Copy link
Member Author

am11 commented Jan 28, 2025

@jkotas this is ready to merge.

@am11
Copy link
Member Author

am11 commented Feb 6, 2025

UniversalTransition is fixed, thanks to @jkotas' pointers. Now runtime is initialized and we are reaching user code.

Next, we are failing at lock:

Thread 1 "app1" received signal SIGSEGV, Segmentation fault.
0x0000002aaac99f78 in S_P_CoreLib_System_Collections_Hashtable_SyncHashtable__set_Item (this=..., key=..., value=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Hashtable.cs:1277
1277	                    lock (_table.SyncRoot)
(gdb) list
1272	            public override object? this[object key]
1273	            {
1274	                get => _table[key];
1275	                set
1276	                {
1277	                    lock (_table.SyncRoot)
1278	                    {
1279	                        _table[key] = value;
1280	                    }
1281	                }
(gdb) bt
#0  0x0000002aaac99f78 in S_P_CoreLib_System_Collections_Hashtable_SyncHashtable__set_Item (this=..., key=..., value=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Hashtable.cs:1277
#1  0x0000002aaac5dbdc in S_P_CoreLib_System_Text_EncodingTable__GetCodePageFromName (name=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Text/EncodingTable.cs:44
#2  0x0000002aaac5adb8 in S_P_CoreLib_System_Text_Encoding__GetEncoding_1 (name=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs:282
#3  0x0000002aaac1bbac in System_Console_System_Text_EncodingHelper__GetEncodingFromCharset () at /runtime/src/libraries/Common/src/System/Text/EncodingHelper.Unix.cs:15
#4  0x0000002aaac179cc in System_Console_System_ConsolePal__GetConsoleEncoding () at /runtime/src/libraries/System.Console/src/System/ConsolePal.Unix.cs:697
#5  0x0000002aaac16d44 in System_Console_System_Console__get_OutputEncoding () at /runtime/src/libraries/System.Console/src/System/Console.cs:118
#6  0x0000002aaac16ee8 in System_Console_System_Console__CreateOutputWriter (outputStream=...) at /runtime/src/libraries/System.Console/src/System/Console.cs:238
#7  0x0000002aaac170d0 in System_Console_System_Console___get_Out_g__EnsureInitialized_26_0 () at /runtime/src/libraries/System.Console/src/System/Console.cs:208
#8  0x0000002aaac16e48 in System_Console_System_Console__get_Out () at /runtime/src/libraries/System.Console/src/System/Console.cs:200
#9  0x0000002aaac17024 in System_Console_System_Console__WriteLine_12 (value=...) at /runtime/src/libraries/System.Console/src/System/Console.cs:813
#10 0x0000002aaaca7794 in app1_Program___Main__ (args=...) at /app1/Program.cs:2
#11 0x0000002aaaccd078 in app1__Module___StartupCodeMain () at /runtime/src/coreclr/nativeaot/Common/src/System/Collections/Generic/LowLevelDictionary.cs:287
#12 0x0000002aaab6fdd2 in main (argc=1, argv=0x3ffffff1b8) at /runtime/src/coreclr/nativeaot/Bootstrap/main.cpp:225

Not sure if we need to adjust JIT for it, there is no asm stub involved but I haven't dig deeper.

@am11
Copy link
Member Author

am11 commented Feb 7, 2025

IMO the JIT issue can be looked at separately. This PR is fixing number of things and incrementally making progress, certainly not the final one in this series.

@jkotas
Copy link
Member

jkotas commented Feb 7, 2025

Could you please revert the changes in the write barriers since they are not correct fix? The rest looks good.

@am11
Copy link
Member Author

am11 commented Feb 7, 2025

Applied @filipnavara's suggestion and changed to ret. Progress is still the same as #111735 (comment).

@am11 am11 force-pushed the feature/nativeaot/riscv64-port branch from 24cdfa1 to a9443e4 Compare February 7, 2025 11:32
// t3, t4 : trashed
// t5 : incremented by 8
// t2, t6 : trashed
// t3 : incremented by 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// t3 : incremented by 8

These increments are only part of JIT_ByRefWriteBarrier contracts.

These increments are not part of RhpCheckedAssignRef and RhpAssignRef contracts. They can be deleted here and in a few places that have this copy&paste bug.

Regular CoreCLR shares the same asm code for all write barriers, and that's how we ended up with them. (Regular CoreCLR write barrier code has TODO to stop sharing the asm code to get rid of this and other minor inefficiencies.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we delete addi t3, t3, 8 lines from RhpCheckedAssignRef and RhpAssignRef as well? (it's under NotInHeap below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing that line from both RhpCheckedAssignRef and RhpAssignRef crashes before reaching the user code:

Thread 1 "app1" received signal SIGSEGV, Segmentation fault.
0x0000002aaac7824c in S_P_CoreLib_System_Runtime_TypeCast__StelemRef_Helper (element=..., elementType=0x2aaad29790, obj=...) at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/TypeCast.cs:843
843	            CastResult result = s_castCache.TryGet((nuint)obj.GetMethodTable() + (int)AssignmentVariation.BoxedSource, (nuint)elementType);

Keeping it reaches Console.WriteLine in Program.cs line 2 and we fail later in lock primitives #111735 (comment) (I'm working on it in separate branch).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I have missed that the write barrier implementation is reused in similar way for some arches on native AOT too. We can look into this separately.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jkotas jkotas merged commit 42a58a2 into dotnet:main Feb 7, 2025
112 checks passed
@am11 am11 deleted the feature/nativeaot/riscv64-port branch February 7, 2025 15:54
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Related to the RISC-V architecture area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants