Crash in ObjWriter on ARM during publish #1388
When I try to run ILC on Windows, I receive an error, probably at the ObjWriter stage. That seems to come from one of these places. I understand that this may be an invalid case and that @MichalStrehovsky's advice to run on Windows does not apply. |
Which reloc is that? (It should be visible which data blob the reloc is coming from on the managed side, in ObjectWriter.cs). Some work was done to support the various ARM32 ELF relocs, but maybe we regressed somewhere (e.g. dotnet/corert#4899). |
I'm actually not sure where this happens, since this is not a crash, just an "unsupported" scenario that exits ILC. I will try to find which reloc that is. Thanks for the hint; I will take a look and try to understand what can go wrong. |
Okay, this is some sort of crash. At least, the application exits at the call to |
Error happens when Stack trace.
Next, I will try to identify the reloc from that stack trace, but I'm not sure what to look for or where to continue the search. I will keep going, and would gladly take any help or advice. |
From what I understand, libObjwriter seems to be built with optimizations, so debugging the iterators is a bit problematic. Is it possible to build it in Debug mode? I was using |
ILCompiler.csproj is hardcoded to pick up the Release version of ObjWriter. Look for ObjWriterArtifactPath - this needs to be set to Debug to pick up your debug build of it. I think it's also hardcoded to Release in objwriter.proj - you need to set that to Debug as well to build the Debug version in the first place. All the relocs you see should go through ObjectWriter::EmitSymbolRef (src\coreclr\tools\aot\ObjWriter\objwriter.cpp), so maybe there's something wrong in how the reloc is translated to LLVM terms there. I don't see anything obvious that would be producing 8-byte non-PC-relative relocs on ARM there, unless we're somehow ending up with an IMAGE_REL_BASED_DIR64 reloc (might want to set a breakpoint on that just to be sure). |
Debug version was fruitful.
From my understanding, this happens because the C# side passes RelocType = IMAGE_REL_BASED_THUMB_MOV32_PCREL, which is not present on the C++ side and thus is not handled. |
So far I have looked at the NativeAOT branch to check where
Loc 1
runtimelab/src/coreclr/debug/daccess/nidump.cpp Lines 5343 to 5350 in 4ffe2eb
If this is Windows only, that makes sense; or at least maybe this is Windows ARM only.
Loc 2
runtimelab/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/ObjectWriter.cs Lines 1069 to 1076 in 4ffe2eb
I'm not sure this location would benefit from adding one more case.
Loc 3
I think I would like to add one case here, but maybe I should copy something like this:
runtimelab/src/coreclr/tools/aot/ObjWriter/objwriter.cpp Lines 435 to 439 in 4ffe2eb
Loc 4
runtimelab/src/coreclr/tools/Common/Compiler/DependencyAnalysis/ObjectDataBuilder.cs Line 302 in 4ffe2eb
My first impression is to add one more case here, but with my limited knowledge it could also be the case above:
runtimelab/src/coreclr/tools/Common/Compiler/DependencyAnalysis/ObjectDataBuilder.cs Lines 294 to 295 in 4ffe2eb
Loc 5
This is most likely handled properly, but I include it for completeness.
Loc 6
runtimelab/src/coreclr/vm/peimagelayout.cpp Line 270 in 4ffe2eb
If I understand properly, this is for Windows ARM32 and I should ignore it.
This seems to be the location that triggers creation of the reloc which crashes ObjWriter.
runtimelab/src/coreclr/jit/emit.cpp Lines 8772 to 8779 in 4ffe2eb
|
I think the relocation is generated by RyuJIT in "Loc 6" you identified above because we set the JIT_FLAG_RELATIVE_CODE_RELOCS here: runtimelab/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs Lines 3755 to 3756 in 4ffe2eb
This was added when Samsung was fixing crossgen2 for Linux ARM32 here: dotnet/runtime#33153 I'm not sure we need it for NativeAOT. Samsung was running CoreRT on ARM32 in the past. So it might be worth trying to put those two lines under The other option (if ifdeffing causes issues at runtime) is to pipe the new relocation through to the object writer. It will go somewhere along the lines of dotnet/corert#3433. |
|
Now I have another problem: the code fails with an assertion here.
This comes from here |
I wouldn't consider it an LLVM bug just yet. What's the stack for the assert? Is it generating unwind info or debug info (or something else?)? |
Here is the stack trace. The reason I think this is LLVM is
But when looking at the stack trace
|
The RegNum feels pretty high, but I don't actually know. The blob objectwriter is interpreting here is generated by RyuJIT - somewhere in unwindarm.cpp. |
Unwind and ARM: now I feel scared. Will take a look. |
If you add: I would step through |
Once you're ready to debug RyuJIT, don't forget to pass |
So insisting on closing the issue. Now I know that somebody has a lot of permissions. |
Here are my observations
But in ObjWriter there is a +64 shift instead of +256: runtimelab/src/coreclr/tools/aot/ObjWriter/debugInfo/dwarf/dwarfGen.cpp Lines 173 to 196 in 2794db1
The ARM docs say that 256 and up correspond to D0-D31, and that the S0-S31 numbering is obsolete now. I remember seeing one more mapping of registers to DWARF numbers somewhere, with 81 in it corresponding to a register, but I cannot find it right now. Speculations |
Okay. I found that magic place. It was runtimelab/src/coreclr/jit/unwindarm.cpp Lines 124 to 126 in 2794db1
And I still see mixed signals. I notice that there are F0-F31 registers for ARM32, while the docs I mentioned previously only list F0-F7, so where do F8-F31 come from? Based on the DWARF register mapping, these are the S0-S31 registers. But LLVM wants the newer D0-D31 registers from VFP-v3, so to make it work I have to check what is emitted and switch from emitting Sx registers to Dx registers. Or I could look into tweaking LLVM to accept these registers in DWARF, because codegen seems to work and only DWARF generation breaks. |
Regarding DWARF numbers: I see that LLVM produces only registers 0-15 and 256+ for ARM. For ARM64 they also produce registers in the range 64-XXX. I suspect a small patch to LLVM can fix that, if I annotate |
RyuJIT should be only generating push/pop of the D0-D31 registers. Look for "Our calling convention requires that we only use vpush for TYP_DOUBLE registers" in |
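To make the numbering discussed above concrete, here is a minimal sketch of the two DWARF numbering schemes from the Arm DWARF ABI document (IHI 0040): 64-95 for the legacy S0-S31 registers and 256-287 for D0-D31. The helper names and the `fpIndex` parameter (REG_F0 is index 0, REG_F2 is index 2, and so on) are hypothetical and are not taken from RyuJIT or ObjWriter; the sketch only assumes that each double register overlays an even/odd pair of single registers.

```cpp
#include <cassert>

// DWARF register number bases from the Arm DWARF ABI (IHI 0040).
constexpr int kDwarfLegacyS0 = 64;  // S0-S31 -> 64-95 (legacy numbering)
constexpr int kDwarfD0       = 256; // D0-D31 -> 256-287 (VFP-v3 numbering)

// Hypothetical helper: legacy DWARF number of single register S<fpIndex>.
inline int LegacyDwarfRegForSingle(int fpIndex)
{
    assert(fpIndex >= 0 && fpIndex < 32);
    return kDwarfLegacyS0 + fpIndex;
}

// Hypothetical helper: DWARF number of the double register that starts at
// single register S<fpIndex>. Only even indices start a double register.
inline int DwarfRegForDoublePair(int fpIndex)
{
    assert(fpIndex >= 0 && fpIndex < 32 && fpIndex % 2 == 0);
    return kDwarfD0 + fpIndex / 2;
}
```

Under these assumptions, index 0 maps to 256 (D0) and index 30 maps to 271 (D15), while the legacy scheme maps index 17 to 81.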
Not sure if I'm going in the right direction. What I've done:
diff --git a/src/coreclr/jit/unwind.cpp b/src/coreclr/jit/unwind.cpp
index 14dc49f50fa..2064819d29c 100644
--- a/src/coreclr/jit/unwind.cpp
+++ b/src/coreclr/jit/unwind.cpp
@@ -187,7 +187,12 @@ void Compiler::unwindPushPopMaskCFI(regMaskTP regMask, bool isFloat)
regMaskTP regBit = isFloat ? genRegMask(REG_FP_FIRST) : 1;
for (regNumber regNum = isFloat ? REG_FP_FIRST : REG_FIRST; regNum < REG_COUNT;
- regNum = REG_NEXT(regNum), regBit <<= 1)
+#if TARGET_ARM
+ regNum = isFloat ? ((regNumber)((unsigned)(regNum) + 2)) : REG_NEXT(regNum),
+#else
+ regNum = REG_NEXT(regNum),
+#endif
+ regBit <<= 1)
{
if (regBit > regMask)
{
diff --git a/src/coreclr/jit/unwindarm.cpp b/src/coreclr/jit/unwindarm.cpp
index e26d6e008f0..da4219b1d53 100644
--- a/src/coreclr/jit/unwindarm.cpp
+++ b/src/coreclr/jit/unwindarm.cpp
@@ -71,100 +71,52 @@ short Compiler::mapRegNumToDwarfReg(regNumber reg)
dwarfReg = 15;
break;
case REG_F0:
- dwarfReg = 64;
- break;
- case REG_F1:
- dwarfReg = 65;
+ dwarfReg = 256;
break;
case REG_F2:
- dwarfReg = 66;
- break;
- case REG_F3:
- dwarfReg = 67;
+ dwarfReg = 257;
break;
case REG_F4:
- dwarfReg = 68;
- break;
- case REG_F5:
- dwarfReg = 69;
+ dwarfReg = 258;
break;
case REG_F6:
- dwarfReg = 70;
- break;
- case REG_F7:
- dwarfReg = 71;
+ dwarfReg = 259;
break;
case REG_F8:
- dwarfReg = 72;
- break;
- case REG_F9:
- dwarfReg = 73;
+ dwarfReg = 260;
break;
case REG_F10:
- dwarfReg = 74;
- break;
- case REG_F11:
- dwarfReg = 75;
+ dwarfReg = 261;
break;
case REG_F12:
- dwarfReg = 76;
- break;
- case REG_F13:
- dwarfReg = 77;
+ dwarfReg = 262;
break;
case REG_F14:
- dwarfReg = 78;
- break;
- case REG_F15:
- dwarfReg = 79;
+ dwarfReg = 263;
break;
case REG_F16:
- dwarfReg = 80;
- break;
- case REG_F17:
- dwarfReg = 81;
+ dwarfReg = 264;
break;
case REG_F18:
- dwarfReg = 82;
- break;
- case REG_F19:
- dwarfReg = 83;
+ dwarfReg = 265;
break;
case REG_F20:
- dwarfReg = 84;
- break;
- case REG_F21:
- dwarfReg = 85;
+ dwarfReg = 266;
break;
case REG_F22:
- dwarfReg = 86;
- break;
- case REG_F23:
- dwarfReg = 87;
+ dwarfReg = 267;
break;
case REG_F24:
- dwarfReg = 88;
- break;
- case REG_F25:
- dwarfReg = 89;
+ dwarfReg = 268;
break;
case REG_F26:
- dwarfReg = 90;
- break;
- case REG_F27:
- dwarfReg = 91;
+ dwarfReg = 269;
break;
case REG_F28:
- dwarfReg = 92;
- break;
- case REG_F29:
- dwarfReg = 93;
+ dwarfReg = 270;
break;
case REG_F30:
- dwarfReg = 94;
- break;
- case REG_F31:
- dwarfReg = 95;
+ dwarfReg = 271;
break;
default:
noway_assert(!"unexpected REG_NUM");
This produces an error in runtimelab/src/coreclr/tools/aot/ObjWriter/debugInfo/dwarf/dwarfGen.cpp Lines 433 to 435 in 2794db1
I looked around and noticed that there is a mapping between registers and DWARF numbers in the same file, so I did a quick hack to check whether that helps.
diff --git a/src/coreclr/tools/aot/ObjWriter/debugInfo/dwarf/dwarfGen.cpp b/src/coreclr/tools/aot/ObjWriter/debugInfo/dwarf/dwarfGen.cpp
index bc2bac9728e..396f03c70d9 100644
--- a/src/coreclr/tools/aot/ObjWriter/debugInfo/dwarf/dwarfGen.cpp
+++ b/src/coreclr/tools/aot/ObjWriter/debugInfo/dwarf/dwarfGen.cpp
@@ -193,7 +193,7 @@ static int GetDwarfRegNum(Triple::ArchType ArchType, int RegNum) {
case RegNumArm::REGNUM_PC: return 15;
// fp registers
default:
- return RegNum - static_cast<int>(RegNumArm::REGNUM_COUNT) + 64;
+ return (RegNum - static_cast<int>(RegNumArm::REGNUM_COUNT)) / 2 + 256;
}
case Triple::aarch64: // fall through
case Triple::aarch64_be:
That indeed helps; in any case, something in that spirit should be changed. Now it fails when handling
with Expression: (isUIntN(8 * Size, Value) || isIntN(8 * Size, Value)) && "Invalid size". Based on how it is called, I think the issue is that I changed the register numbers to be 256 or more, which does not fit into a single byte. The more I look at line |
I think for registers with a number greater than 31 I should use |
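As an illustration of why register numbers such as 256 cannot go through the single-byte path, here is a minimal sketch of how DWARF location expressions encode a register. Numbers 0-31 fit into the one-byte DW_OP_reg0..DW_OP_reg31 and DW_OP_breg0..DW_OP_breg31 opcodes, while larger numbers need DW_OP_regx or DW_OP_bregx followed by a ULEB128 operand. The opcode values come from the DWARF specification; the function names are hypothetical and this is not the ObjWriter implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Opcode values from the DWARF specification.
constexpr uint8_t DW_OP_reg0  = 0x50; // DW_OP_reg0..reg31   = 0x50..0x6f
constexpr uint8_t DW_OP_breg0 = 0x70; // DW_OP_breg0..breg31 = 0x70..0x8f
constexpr uint8_t DW_OP_regx  = 0x90; // operand: ULEB128 register
constexpr uint8_t DW_OP_bregx = 0x92; // operands: ULEB128 register, SLEB128 offset

// Append an unsigned LEB128 value, 7 bits per byte, low bits first.
void AppendULEB128(std::vector<uint8_t>& out, uint64_t value)
{
    do {
        uint8_t byte = static_cast<uint8_t>(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80; // more bytes follow
        out.push_back(byte);
    } while (value != 0);
}

// Append a signed LEB128 value (assumes arithmetic right shift on negatives).
void AppendSLEB128(std::vector<uint8_t>& out, int64_t value)
{
    bool more = true;
    while (more) {
        uint8_t byte = static_cast<uint8_t>(value & 0x7f);
        value >>= 7;
        bool signBit = (byte & 0x40) != 0;
        if ((value == 0 && !signBit) || (value == -1 && signBit))
            more = false;
        else
            byte |= 0x80;
        out.push_back(byte);
    }
}

// "Value lives in register dwarfReg".
std::vector<uint8_t> EncodeRegLocation(unsigned dwarfReg)
{
    std::vector<uint8_t> out;
    if (dwarfReg < 32) {
        out.push_back(static_cast<uint8_t>(DW_OP_reg0 + dwarfReg));
    } else {
        out.push_back(DW_OP_regx);
        AppendULEB128(out, dwarfReg);
    }
    return out;
}

// "Value lives at address (register dwarfReg) + offset".
std::vector<uint8_t> EncodeRegOffsetLocation(unsigned dwarfReg, int64_t offset)
{
    std::vector<uint8_t> out;
    if (dwarfReg < 32) {
        out.push_back(static_cast<uint8_t>(DW_OP_breg0 + dwarfReg));
    } else {
        out.push_back(DW_OP_bregx);
        AppendULEB128(out, dwarfReg);
    }
    AppendSLEB128(out, offset);
    return out;
}
```

With this sketch, register 13 still encodes as one byte, while register 256 becomes DW_OP_regx followed by the two-byte ULEB128 encoding of 256.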
This change has 2 parts. The first is the JIT part, which likely requires submission to runtime. I keep that change here to give the full scope of changes, and will extract it if everything turns out fine. The JIT for ARM now numbers floating-point registers starting from 256.
The second part is DWARF generation. In order to produce DWARF I made the following changes:
- Account for register numbers greater than 31 by using DW_OP_bregx and DW_OP_regx for large register numbers
- Combine VLT_FPSTK and VLT_STK handling.
See dotnet#1388
After the introduction of VFP-v3, ARM S0-S31 can no longer be generated using LLVM, because register numbering starts from 256 and only D0-D31 are used. So this change encodes S0 as D0, S2 as D1, etc., and uses the register numbers for the Dxx registers. This change fixes generation of CFI codes, which triggered an issue with generation of DWARF using LLVM in NativeAOT. See https://developer.arm.com/documentation/ihi0040/c/?lang=en#dwarf-register-names See dotnet/runtimelab#1388
Fixed by #1413 |
After #1387, ILC crashed during publish with the following stack trace.
I tried to look at the locals, but without luck,
even though I built LLVM and ObjWriter on this Raspberry using
./build.sh nativeaot.objwriter -rc Debug -lc Release
It will take a bit of time to figure out what's going on.