s390x fixes that enable 100% test suite pass - Part 1 #650

nealef · 2021-02-07T01:44:25Z

Further s390x Fixes

This set up patches enables s390x to pass the test suite. The test suite success is dependent on part 2 which is to common code in the dotnet/runtime repo.

1. Mono array layout

The array "raw data" layout is defined in two places, one for the Mono C++ code, and one for C# code (which should match!):

struct _MonoArray {
    MonoObject obj;
    /* bounds is NULL for szarrays */
    MonoArrayBounds *bounds;
    /* total number of elements of the array */
    mono_array_size_t max_length;
    /* we use mono_64bitaligned_t to ensure proper alignment on platforms that need it */
    mono_64bitaligned_t vector [MONO_ZERO_LEN_ARRAY];
};

vs.

private class RawData
{
    public IntPtr Bounds;
    public IntPtr Count;
    public byte Data;
}

However, this only actually matches perfectly on 32-bit platforms if MONO_BIG_ARRAYS is false and 64-bit platforms if MONO_BIG_ARRAYS is true. In the dotnet build, MONO_BIG_ARRAYS is false, so we have a problem on 64-bit platforms. On little-endian 64-bit platforms the mismatch is mostly harmless, but on big-endian 64-bit platforms this causes test case failures in System.Tests.ArrayTests.Clear_Invalid.

The patch fixes this for s390x, but it should be possible to implement this in a cleaner way ...

2. Tail call implementation

There were actually two bugs here. First of all, the tailcall_reg instruction was not marked in cpu-s390x.md to have an sreg1 input, which meant that the target address was never actually loaded up.

More problematically, the way the tailcall implementation handles call-saved argument registers was fundamentally broken. This is a problem on s390x with the r6 register, which is call-saved even though it is also used as argument register. This is a problem for tail calls, because you have to restore the old value before performing the tail call, which conflicts with loading the required argument value. The same problem also applies for the RGCTX/IMT register, which is likewise both call-saved and used to hold an implicit argument.

The current Mono code simply does not restore the old value and only loads the argument value. But that is an ABI break and may cause failures in a caller higher up on the stack once the tail-called routine finally returns. Consider three functions A, B, C where

A:

[...]
define R6
call B (does not use R6 as argument)
use R6
[...]

B:

save R6
[...]
load R6 with argument value
tail call C (uses R6 as argument)

C:

save R6
[...]
use R6 argument value
[...]
restore R6
return

Once C finally returns to A, the value in R6 will be the value it had on entry to C, not that on entry to B, which is what the code in A relies on.

The following patch fixes this by disabling tail calls in those cases where R6 is used as argument register, as well as in all cases where the RGCTX/IMT register is used.

Note that it might be possible to re-enable the latter cases by using a call-clobbered register as RGCTX/IMT register. One option might be %r1, which is also used by GCC as the static chain pointer for nested functions (which is a somewhat similar purpose). This would also match what x86_64 is doing: they likewise use the static chain register for RGCTX/IMT.

I haven't implemented this since there might be a problem with other intervening trampolines clobbering %r1 -- this would need careful review and possibly some reshuffling. I guess this can be done later as an optimization.

3. Crashes due to corrupted sigaltstack

When sigaltstack is enabled, the kernel provides the address of the alternate stack to signal handlers in the uc_stack.ss_sp field. More importantly, the kernel also reads out that field once the signal handler returns and updates its notion of what alternate stack is to be used for future signals! This is a problem with current Mono code, which writes the user-code stack pointer into ss_sp -- causing the kernel to attempt to use that part of the user stack as alternate stack for the next signal. If that then overlaps then regular stack (which is likely), the kernel will consider this value corrupted and deliver a SIGSEGV instead.

Looking into this, I'm not really sure why Mono (the s390x code only) even writes to ss_sp in the first place. This is apparently read out in just one place, where we actually want to know the user stack, so I guess we can just use r15 directly instead.

4. Codegen problem with floating-point to integer conversions

The Mono back-end uses cegbr/cdgbr (64-bit int to float conversion instructions) even in the case where the source is actually a 32-bit integer. It really should be using cefbr/cdfbr in those cases, which is what the following patch implements.

Note that I'm a bit unclear about the intention of the original code here: there appears to be some effort made to hold 32-bit integers in 64-bit sign-extended form in registers, in which case the cegbr/cdgbr would probably be also correct (but still not really preferable). However, this doesn't seem to be done consistently.

5. Codegen problems with integer overflow detection

There were two separate bugs with properly detecting integer overflows. First of all, the OP_IADD_OVF_UN implementation used the 64-bit algr instead of the 32-bit alr instruction. While the (32-bit) numerical result is of course the same, the detected overflow is incorrect.

More problematic is the overflow detection for signed multiplication. The code seems to only verify whether the sign of the result matches the product of the signs of the inputs -- but this doesn't reliably detect overflow! While of course there must have been an overflow if the sign doesn't match, there can still be an overflow if the sign does match, for example in the case from the test suite: 1.000.000.000 * 10

Now, on recent machines we have hardware support to detect overflow: msgrkc and msrkc. While Mono was already using msgrkc (so the problem doesn't occur for 64-bit multiplication) it didn't use msrkc. The following patch just adds that case. Note that this still only fixes the problem on z14 on higher; the code for older machines really also ought to be fixed at some point.

6. Codegen problems with interlocked add

Finally, there is a subtle problem with the code currently generated for the interlocked add primitives, if the machine supports the laa(g) instruction. Those handle the whole interlocked-add operation in hardware, so the only thing Mono codegen needs to handle in addition is the proper return value. The laa(g) instructions return the value the memory location had before the addition, while the Mono interlocked-add primitive is specified to return the value the memory location has after the addition.

To fix this discrepancy, the code generated by Mono will perform another load from the memory location immediately after the laa(g). This usually returns the correct value, but it creates a race condition: in between the laa(g) and the load, another thread might have changed the value! This violates the atomicity guarantee of the interlocked-add primitive.

To fix this, the following patch instead uses the (atomically loaded) result of the laa(g) instruction and then simply performs the original addition a second time in registers.

…, one for the Mono C++ code, and one for C# code but don't match for big endian systems - Fix two bugs relating to the tailcall implementation for s390x - Fix crash due to s390x sigaltstack implementation - Fix codegen problem with floating-point to integer conversions - Fix codegen problems with integer overflow detection - Fix codegen problems with interlocked add

akoeplinger added the area-s390x label Feb 7, 2021

nealef mentioned this pull request Feb 7, 2021

Big endian fixes for dotnet runtime dotnet/runtime#47981

Merged

akoeplinger merged commit 6246c5e into dotnet:feature/s390x Feb 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

s390x fixes that enable 100% test suite pass - Part 1 #650

s390x fixes that enable 100% test suite pass - Part 1 #650

nealef commented Feb 7, 2021

s390x fixes that enable 100% test suite pass - Part 1 #650

s390x fixes that enable 100% test suite pass - Part 1 #650

Conversation

nealef commented Feb 7, 2021

Further s390x Fixes

1. Mono array layout

2. Tail call implementation

A:

B:

C:

3. Crashes due to corrupted sigaltstack

4. Codegen problem with floating-point to integer conversions

5. Codegen problems with integer overflow detection

6. Codegen problems with interlocked add