s390x fixes that enable 100% test suite pass - Part 1 #650
Further s390x Fixes
This set of patches enables s390x to pass the test suite. Passing the full test suite also depends on part 2, which changes common code in the dotnet/runtime repo.
1. Mono array layout
The array "raw data" layout is defined in two places, one for the Mono C++ code, and one for C# code (which should match!):
vs.
However, this only actually matches perfectly on 32-bit platforms if `MONO_BIG_ARRAYS` is false and on 64-bit platforms if `MONO_BIG_ARRAYS` is true. In the dotnet build, `MONO_BIG_ARRAYS` is false, so we have a problem on 64-bit platforms. On little-endian 64-bit platforms the mismatch is mostly harmless, but on big-endian 64-bit platforms this causes test case failures in `System.Tests.ArrayTests.Clear_Invalid`.
The patch fixes this for s390x, but it should be possible to implement this in a cleaner way ...
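To illustrate why endianness matters here, the following sketch models one plausible form of the mismatch: one side stores a 32-bit length at the start of an 8-byte slot, and the other side reads the whole slot as a single 64-bit value. The function names, the field widths, and the zeroed padding are assumptions made for illustration; they are not taken from the Mono sources. Byte order is modelled explicitly so the result does not depend on the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: write a 32-bit length into the first 4 bytes of an
 * 8-byte slot (remaining bytes are zero padding), then read the whole
 * slot back as one 64-bit value using the same byte order. */
static uint64_t read_length_slot(uint32_t length, bool big_endian)
{
    unsigned char slot[8] = {0};

    /* Writer: stores only a 32-bit field. */
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        slot[i] = (unsigned char)(length >> shift);
    }

    /* Reader: interprets all 8 bytes as one 64-bit field. */
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 56 - 8 * i : 8 * i;
        value |= (uint64_t)slot[i] << shift;
    }
    return value;
}

/* Usage example: the little-endian reader still sees the length, while
 * the big-endian reader sees it shifted into the upper half. */
void check_length_slot(void)
{
    assert(read_length_slot(5, false) == 5);
    assert(read_length_slot(5, true) == (uint64_t)5 << 32);
}
```

Under these assumptions the little-endian case only works because the padding happens to be zero, which fits the "mostly harmless" observation above.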
2. Tail call implementation
There were actually two bugs here. First of all, the `tailcall_reg` instruction was not marked in `cpu-s390x.md` as having an sreg1 input, which meant that the target address was never actually loaded.
More problematically, the way the tailcall implementation handles call-saved argument registers was fundamentally broken. On s390x this affects the r6 register, which is call-saved even though it is also used as an argument register. For a tail call, you have to restore the old value before performing the call, which conflicts with loading the required argument value. The same problem also applies to the RGCTX/IMT register, which is likewise both call-saved and used to hold an implicit argument.
The current Mono code simply does not restore the old value and only loads the argument value. But that is an ABI break and may cause failures in a caller higher up on the stack once the tail-called routine finally returns. Consider three functions A, B, C where
A:
B:
C:
Once C finally returns to A, the value in R6 will be the value it had on entry to C, not that on entry to B, which is what the code in A relies on.
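As a rough model of this scenario, the sketch below treats r6 as a global variable and assumes that A calls B normally while B tail-calls C with an argument in r6. The function names and the exact roles of A, B, and C are illustrative assumptions, not the code generated by Mono.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t r6;  /* models the call-saved register r6 */

/* C: a well-behaved callee. It saves r6 on entry and restores it before
 * returning, so r6 ends up with the value it had on entry to C. */
static void func_c(void)
{
    uint64_t saved = r6;
    r6 = 0;          /* C is free to use r6 internally */
    r6 = saved;
}

/* B: passes an argument to C in r6 and "tail-calls" it. Modelling the
 * broken behaviour, B never restores the value r6 had on entry to B. */
static void func_b(uint64_t arg_for_c)
{
    r6 = arg_for_c;
    func_c();        /* stands in for the jump; C returns straight to A */
}

/* A: keeps a live value in r6 across the call to B and returns what it
 * finds in r6 afterwards. */
static uint64_t func_a(uint64_t live_value, uint64_t arg_for_c)
{
    r6 = live_value;
    func_b(arg_for_c);
    return r6;
}

/* Usage example: A expected 111 to survive the call but sees 222. */
void check_tailcall_model(void)
{
    assert(func_a(111, 222) == 222);
}
```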
The following patch fixes this by disabling tail calls in those cases where R6 is used as argument register, as well as in all cases where the RGCTX/IMT register is used.
Note that it might be possible to re-enable the latter cases by using a call-clobbered register as RGCTX/IMT register. One option might be `%r1`, which is also used by GCC as the static chain pointer for nested functions (which is a somewhat similar purpose). This would also match what x86_64 is doing: they likewise use the static chain register for RGCTX/IMT.
I haven't implemented this since there might be a problem with other intervening trampolines clobbering `%r1`; this would need careful review and possibly some reshuffling. I guess this can be done later as an optimization.
3. Crashes due to corrupted sigaltstack
When sigaltstack is enabled, the kernel provides the address of the alternate stack to signal handlers in the `uc_stack.ss_sp` field. More importantly, the kernel also reads out that field once the signal handler returns and updates its notion of what alternate stack is to be used for future signals! This is a problem with current Mono code, which writes the user-code stack pointer into `ss_sp`, causing the kernel to attempt to use that part of the user stack as alternate stack for the next signal. If that then overlaps the regular stack (which is likely), the kernel will consider this value corrupted and deliver a SIGSEGV instead.
Looking into this, I'm not really sure why Mono (the s390x code only) even writes to `ss_sp` in the first place. This is apparently read out in just one place, where we actually want to know the user stack, so I guess we can just use r15 directly instead.
4. Codegen problem with integer to floating-point conversions
The Mono back-end uses `cegbr`/`cdgbr` (64-bit int to float conversion instructions) even in the case where the source is actually a 32-bit integer. It really should be using `cefbr`/`cdfbr` in those cases, which is what the following patch implements.
Note that I'm a bit unclear about the intention of the original code here: there appears to be some effort made to hold 32-bit integers in 64-bit sign-extended form in registers, in which case `cegbr`/`cdgbr` would probably also be correct (but still not really preferable). However, this doesn't seem to be done consistently.
5. Codegen problems with integer overflow detection
There were two separate bugs with properly detecting integer overflows. First of all, the `OP_IADD_OVF_UN` implementation used the 64-bit `algr` instead of the 32-bit `alr` instruction. While the (32-bit) numerical result is of course the same, the detected overflow is incorrect.
More problematic is the overflow detection for signed multiplication. The code seems to only verify whether the sign of the result matches the product of the signs of the inputs, but this doesn't reliably detect overflow! While of course there must have been an overflow if the sign doesn't match, there can still be an overflow if the sign does match, for example in the case from the test suite: 1,000,000,000 * 10, where the true product exceeds the 32-bit range but the wrapped 32-bit result is still positive.
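The gap in a sign-only check can be sketched as follows. The first function is a simplified model of "compare the result sign against the product of the input signs", not the code Mono actually emits; the second uses a widening multiply as a reference. Both function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a sign-only check: report overflow only when the
 * sign of the wrapped 32-bit product differs from the expected sign. */
static bool sign_check_reports_overflow(int32_t a, int32_t b)
{
    if (a == 0 || b == 0)
        return false;                       /* a zero product never overflows */

    uint32_t wrapped = (uint32_t)a * (uint32_t)b;
    bool result_negative = (wrapped >> 31) != 0;
    bool expect_negative = (a < 0) != (b < 0);
    return result_negative != expect_negative;
}

/* Reference check: compute the exact product in 64 bits and test
 * whether it fits into the 32-bit signed range. */
static bool widening_check_reports_overflow(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return wide < INT32_MIN || wide > INT32_MAX;
}

/* Usage example with the test-suite case: 10,000,000,000 wraps to
 * 1,410,065,408, which is positive, so the sign-only check misses it. */
void check_mul_overflow(void)
{
    assert(!sign_check_reports_overflow(1000000000, 10));
    assert(widening_check_reports_overflow(1000000000, 10));

    /* 65536 * 32768 = 2^31 wraps to a negative value; both checks agree. */
    assert(sign_check_reports_overflow(65536, 32768));
    assert(widening_check_reports_overflow(65536, 32768));
}
```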
Now, on recent machines we have hardware support to detect overflow: `msgrkc` and `msrkc`. While Mono was already using `msgrkc` (so the problem doesn't occur for 64-bit multiplication), it didn't use `msrkc`. The following patch just adds that case. Note that this still only fixes the problem on z14 or higher; the code for older machines really also ought to be fixed at some point.
6. Codegen problems with interlocked add
Finally, there is a subtle problem with the code currently generated for the interlocked add primitives, if the machine supports the laa(g) instruction. Those handle the whole interlocked-add operation in hardware, so the only thing Mono codegen needs to handle in addition is the proper return value. The laa(g) instructions return the value the memory location had before the addition, while the Mono interlocked-add primitive is specified to return the value the memory location has after the addition.
To fix this discrepancy, the code generated by Mono will perform another load from the memory location immediately after the laa(g). This usually returns the correct value, but it creates a race condition: in between the laa(g) and the load, another thread might have changed the value! This violates the atomicity guarantee of the interlocked-add primitive.
To fix this, the following patch instead uses the (atomically loaded) result of the laa(g) instruction and then simply performs the original addition a second time in registers.
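The same idea can be sketched in portable C11, where `atomic_fetch_add` plays the role of laa(g) in that it also returns the value from before the addition. This is only an analogy under that assumption; the function names are hypothetical and this is not the s390x code the patch generates.

```c
#include <assert.h>
#include <stdatomic.h>

/* Racy variant: add atomically, then reload. Another thread may modify
 * the location between the two steps, so the result can be wrong. */
static int add_then_reload(atomic_int *location, int addend)
{
    atomic_fetch_add(location, addend);
    return atomic_load(location);
}

/* Fixed variant: take the old value returned by the atomic add and redo
 * the addition in a local. Unsigned arithmetic is used so the result
 * wraps the same way the atomic operation does. */
static int add_and_recompute(atomic_int *location, int addend)
{
    int old = atomic_fetch_add(location, addend);
    return (int)((unsigned)old + (unsigned)addend);
}

/* Single-threaded usage example: both variants agree here, because no
 * other thread can intervene between the add and the reload. */
void check_interlocked_add(void)
{
    atomic_int counter = 40;
    assert(add_and_recompute(&counter, 2) == 42);
    assert(atomic_load(&counter) == 42);
    assert(add_then_reload(&counter, 1) == 43);
}
```

The difference between the two variants only shows up under contention, which is why the original problem was subtle.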