Skip to content

Commit

Permalink
- [Big Endian] - The array "raw data" layout is defined in two places…
Browse files Browse the repository at this point in the history
…, one for the Mono C++ code, and one for C# code but don't match for big endian systems (#650)

## Further s390x Fixes

This set up patches enables s390x to pass the test suite. The test suite success is dependent on part 2 which is to common code in the dotnet/runtime repo.

### 1. Mono array layout

The array "raw data" layout is defined in two places, one for the Mono C++ code, and one for C# code (which should match!):
```
struct _MonoArray {
    MonoObject obj;
    /* bounds is NULL for szarrays */
    MonoArrayBounds *bounds;
    /* total number of elements of the array */
    mono_array_size_t max_length;
    /* we use mono_64bitaligned_t to ensure proper alignment on platforms that need it */
    mono_64bitaligned_t vector [MONO_ZERO_LEN_ARRAY];
};
```
vs.
```
private class RawData
{
    public IntPtr Bounds;
    public IntPtr Count;
    public byte Data;
}
```
However, this only actually matches perfectly on 32-bit platforms if `MONO_BIG_ARRAYS` is false and 64-bit platforms if `MONO_BIG_ARRAYS` is true. In the dotnet build, `MONO_BIG_ARRAYS` is false, so we have a problem on 64-bit platforms. On little-endian 64-bit platforms the mismatch is mostly harmless, but on big-endian 64-bit platforms this causes test case failures in `System.Tests.ArrayTests.Clear_Invalid`.

The patch fixes this for s390x, but it should be possible to implement this in a cleaner way ...

### 2. Tail call implementation

There were actually two bugs here. First of all, the `tailcall_reg` instruction was not marked in `cpu-s390x.md` to have an sreg1 input, which meant that the target address was never actually loaded up.

More problematically, the way the tailcall implementation handles call-saved argument registers was fundamentally broken. This is a problem on s390x with the r6 register, which is call-saved even though it is also used as argument register. This is a problem for tail calls, because you have to restore the old value before performing the tail call, which conflicts with loading the required argument value. The same problem also applies for the RGCTX/IMT register, which is likewise both call-saved and used to hold an _implicit_ argument.

The current Mono code simply does not restore the old value and only loads the argument value. But that is an ABI break and may cause failures in a caller higher up on the stack once the tail-called routine finally returns. Consider three functions A, B, C where
#### A:
```
[...]
define R6
call B (does not use R6 as argument)
use R6
[...]
```
##### B:
```
save R6
[...]
load R6 with argument value
tail call C (uses R6 as argument)
```
#### C:
```
save R6
[...]
use R6 argument value
[...]
restore R6
return
```
Once C finally returns to A, the value in R6 will be the value it had on entry to C, not that on entry to B, which is what the code in A relies on.

The following patch fixes this by disabling tail calls in those cases where R6 is used as argument register, as well as in all cases where the RGCTX/IMT register is used.

Note that it might be possible to re-enable the latter cases by using a call-clobbered register as RGCTX/IMT register. One option might be `%r1`, which is also used by GCC as the static chain pointer for nested functions (which is a somewhat similar purpose). This would also match what x86_64 is doing: they likewise use the static chain register for RGCTX/IMT.

I haven't implemented this since there might be a problem with other intervening trampolines clobbering %r1 -- this would need careful review and possibly some reshuffling. I guess this can be done later as an optimization.

### 3. Crashes due to corrupted sigaltstack

When sigaltstack is enabled, the kernel provides the address of the alternate stack to signal handlers in the `uc_stack.ss_sp` field. More importantly, the kernel also *reads* out that field once the signal handler returns and updates its notion of what alternate stack is to be used for future signals! This is a problem with current Mono code, which writes the user-code stack pointer into `ss_sp` -- causing the kernel to attempt to use that part of the user stack as alternate stack for the next signal. If that then overlaps then regular stack (which is likely), the kernel will consider this value corrupted and deliver a SIGSEGV instead.

Looking into this, I'm not really sure why Mono (the s390x code only) even writes to `ss_sp` in the first place. This is apparently read out in just one place, where we actually want to know the user stack, so I guess we can just use r15 directly instead.

#### 4. Codegen problem with floating-point to integer conversions

The Mono back-end uses `cegbr`/`cdgbr` (64-bit int to float conversion instructions) even in the case where the source is actually a 32-bit integer. It really should be using `cefbr`/`cdfbr` in those cases, which is what the following patch implements.

Note that I'm a bit unclear about the intention of the original code here: there appears to be some effort made to hold 32-bit integers in 64-bit sign-extended form in registers, in which case the `cegbr`/`cdgbr` would probably be also correct (but still not really preferable). However, this doesn't seem to be done consistently. 

#### 5. Codegen problems with integer overflow detection

There were two separate bugs with properly detecting integer overflows. First of all, the `OP_IADD_OVF_UN` implementation used the 64-bit `algr` instead of the 32-bit `alr` instruction. While the (32-bit) numerical result is of course the same, the detected overflow is incorrect.

More problematic is the overflow detection for signed multiplication. The code seems to only verify whether the sign of the result matches the product of the signs of the inputs -- but this doesn't reliably detect overflow! While of course there must have been an overflow if the sign doesn't match, there can still be an overflow if the sign *does* match, for example in the case from the test suite: 1.000.000.000 * 10

Now, on recent machines we have hardware support to detect overflow: `msgrkc` and `msrkc`. While Mono was already using `msgrkc` (so the problem doesn't occur for 64-bit multiplication) it didn't use `msrkc`. The following patch just adds that case. Note that this still only fixes the problem on z14 on higher; the code for older machines really also ought to be fixed at some point.

#### 6. Codegen problems with interlocked add

Finally, there is a subtle problem with the code currently generated for the interlocked add primitives, if the machine supports the laa(g) instruction. Those handle the whole interlocked-add operation in hardware, so the only thing Mono codegen needs to handle in addition is the proper return value. The laa(g) instructions return the value the memory location had *before* the addition, while the Mono interlocked-add primitive is specified to return the value the memory location has *after* the addition.

To fix this discrepancy, the code generated by Mono will perform another load from the memory location immediately after the laa(g). This usually returns the correct value, but it creates a race condition: in between the laa(g) and the load, another thread might have changed the value! This violates the atomicity guarantee of the interlocked-add primitive.

To fix this, the following patch instead uses the (atomically loaded) result of the laa(g) instruction and then simply performs the original addition a second time in registers.

Co-authored-by: Neale Ferguson <neale@ubuntu20-x64-01.devlab.sinenomine.net>
  • Loading branch information
nealef and Neale Ferguson authored Feb 11, 2021
1 parent 610f2fc commit 6246c5e
Show file tree
Hide file tree
Showing 9 changed files with 70 additions and 43 deletions.
3 changes: 2 additions & 1 deletion src/mono/mono/arch/s390x/s390x-codegen.h
Original file line number Diff line number Diff line change
Expand Up @@ -191,7 +191,6 @@ typedef enum {
#define S390_FP s390_r11
#define S390_MINIMAL_STACK_SIZE 160
#define S390_REG_SAVE_OFFSET 48
#define S390_NONPARM_SAVE_OFFSET 56
#define S390_PARM_SAVE_OFFSET 16
#define S390_RET_ADDR_OFFSET 112
#define S390_FLOAT_SAVE_OFFSET 128
Expand Down Expand Up @@ -1213,6 +1212,7 @@ typedef struct {
#define s390_cdsg(c, r1, r2, b, d) S390_RSY_1(c, 0xeb3e, r1, r2, b, d)
#define s390_cdsy(c, r1, r2, b, d) S390_RSY_1(c, 0xeb31, r1, r2, b, d)
#define s390_cebr(c, r1, r2) S390_RRE(c, 0xb309, r1, r2)
#define s390_cefbr(c, r1, r2) S390_RRE(c, 0xb394, r1, r2)
#define s390_cegbr(c, r1, r2) S390_RRE(c, 0xb3a4, r1, r2)
#define s390_cfdbr(c, r1, m, r2) S390_RRF_2(c, 0xb399, r1, m, r2)
#define s390_cfebr(c, r1, m, r2) S390_RRF_2(c, 0xb398, r1, m, r2)
Expand Down Expand Up @@ -1417,6 +1417,7 @@ typedef struct {
#define s390_msgr(c, r1, r2) S390_RRE(c, 0xb90c, r1, r2)
#define s390_msgrkc(c, r1, r2, r3) S390_RRF_1(c, 0xb9ed, r1, r2, r3)
#define s390_msr(c, r1, r2) S390_RRE(c, 0xb252, r1, r2)
#define s390_msrkc(c, r1, r2, r3) S390_RRF_1(c, 0xb9fd, r1, r2, r3)
#define s390_mvc(c, l, b1, d1, b2, d2) S390_SS_1(c, 0xd2, l, b1, d1, b2, d2)
#define s390_mvcl(c, r1, r2) S390_RR(c, 0x0e, r1, r2)
#define s390_mvcle(c, r1, r3, d2, b2) S390_RS_1(c, 0xa8, r1, r3, d2, b2)
Expand Down
2 changes: 1 addition & 1 deletion src/mono/mono/mini/cpu-s390x.md
Original file line number Diff line number Diff line change
Expand Up @@ -237,7 +237,7 @@ sub_ovf_carry: dest:i src1:1 src2:i len:28
sub_ovf_un_carry: dest:i src1:1 src2:i len:12
subcc: dest:i src1:i src2:i len:12
tailcall: len:32 clob:c
tailcall_reg: len:32 clob:c
tailcall_reg: src1:b len:32 clob:c
tailcall_membase: src1:b len:32 clob:c

# Tailcall parameters are moved with one instruction per 256 bytes,
Expand Down
2 changes: 1 addition & 1 deletion src/mono/mono/mini/exceptions-s390x.c
Original file line number Diff line number Diff line change
Expand Up @@ -639,7 +639,7 @@ mono_arch_handle_altstack_exception (void *sigctx, MONO_SIG_HANDLER_INFO_TYPE *s
* requires allocation on the stack, as this wouldn't be encoded in unwind
* information for the caller frame.
*/
sp = (uintptr_t) (UCONTEXT_SP(uc));
sp = (uintptr_t) (UCONTEXT_REG_Rn(uc, 15));
sp = sp - S390_MINIMAL_STACK_SIZE;

mono_sigctx_to_monoctx (uc, uc_copy);
Expand Down
91 changes: 57 additions & 34 deletions src/mono/mono/mini/mini-s390x.c
Original file line number Diff line number Diff line change
Expand Up @@ -2692,7 +2692,7 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
case OP_IADD_OVF_UN:
case OP_S390_IADD_OVF_UN: {
CHECK_SRCDST_COM;
s390_algr (code, ins->dreg, src2);
s390_alr (code, ins->dreg, src2);
EMIT_COND_SYSTEM_EXCEPTION (S390_CC_CY, "OverflowException");
s390_llgfr (code, ins->dreg, ins->dreg);
}
Expand Down Expand Up @@ -3356,21 +3356,27 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
break;
case OP_IMUL_OVF: {
short int *o[2];
s390_ltr (code, s390_r1, ins->sreg1);
s390_jz (code, 0); CODEPTR(code, o[0]);
s390_ltr (code, s390_r0, ins->sreg2);
s390_jnz (code, 6);
s390_lhi (code, s390_r1, 0);
s390_j (code, 0); CODEPTR(code, o[1]);
s390_xr (code, s390_r0, s390_r1);
s390_msr (code, s390_r1, ins->sreg2);
s390_xr (code, s390_r0, s390_r1);
s390_srl (code, s390_r0, 0, 31);
s390_ltr (code, s390_r0, s390_r0);
EMIT_COND_SYSTEM_EXCEPTION (S390_CC_NZ, "OverflowException");
PTRSLOT (code, o[0]);
PTRSLOT (code, o[1]);
s390_lgfr (code, ins->dreg, s390_r1);
if (mono_hwcap_s390x_has_mie2) {
s390_msrkc (code, ins->dreg, ins->sreg1, ins->sreg2);
EMIT_COND_SYSTEM_EXCEPTION (S390_CC_OV, "OverflowException");
s390_lgfr (code, ins->dreg, ins->dreg);
} else {
s390_ltr (code, s390_r1, ins->sreg1);
s390_jz (code, 0); CODEPTR(code, o[0]);
s390_ltr (code, s390_r0, ins->sreg2);
s390_jnz (code, 6);
s390_lhi (code, s390_r1, 0);
s390_j (code, 0); CODEPTR(code, o[1]);
s390_xr (code, s390_r0, s390_r1);
s390_msr (code, s390_r1, ins->sreg2);
s390_xr (code, s390_r0, s390_r1);
s390_srl (code, s390_r0, 0, 31);
s390_ltr (code, s390_r0, s390_r0);
EMIT_COND_SYSTEM_EXCEPTION (S390_CC_NZ, "OverflowException");
PTRSLOT (code, o[0]);
PTRSLOT (code, o[1]);
s390_lgfr (code, ins->dreg, s390_r1);
}
}
break;
case OP_IMUL_OVF_UN: {
Expand Down Expand Up @@ -3516,21 +3522,21 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
s390_lgr (code, s390_r1, ins->sreg1);

/*
* If the IMT/RGCTX register is in use then don't restore over it
* We have to restore R6, so it cannot be used as argument register.
* This is ensured by mono_arch_tailcall_supported, but verify here.
*/
if ((call->used_iregs & (MONO_ARCH_RGCTX_REG << 1)) || (call->rgctx_reg))
s390_lgr (code, s390_r0, MONO_ARCH_RGCTX_REG);
g_assert (!(call->used_iregs & (1 << S390_LAST_ARG_REG)));

/*
* If R6 is used for a parameter then don't restore the other
* parameter registers are volatile
*/
if (call->used_iregs & (1 << 6))
s390_lmg (code, s390_r7, s390_r14, STK_BASE, S390_NONPARM_SAVE_OFFSET);
else
s390_lmg (code, s390_r6, s390_r14, STK_BASE, S390_REG_SAVE_OFFSET);
* Likewise for the IMT/RGCTX register
*/
g_assert (!(call->used_iregs & (1 << MONO_ARCH_RGCTX_REG)));
g_assert (!(call->rgctx_reg));

if ((call->used_iregs & (MONO_ARCH_RGCTX_REG << 1)) || (call->rgctx_reg))
s390_lgr (code, MONO_ARCH_RGCTX_REG, s390_r0);
/*
* Restore all general registers
*/
s390_lmg (code, s390_r6, s390_r14, STK_BASE, S390_REG_SAVE_OFFSET);

/*
* Restore any FP registers that have been altered
Expand Down Expand Up @@ -4147,17 +4153,21 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
}
}
break;
case OP_ICONV_TO_R4:
s390_cefbr (code, ins->dreg, ins->sreg1);
if (!cfg->r4fp)
s390_ldebr (code, ins->dreg, ins->dreg);
break;
case OP_LCONV_TO_R4:
case OP_ICONV_TO_R4: {
s390_cegbr (code, ins->dreg, ins->sreg1);
if (!cfg->r4fp)
s390_ldebr (code, ins->dreg, ins->dreg);
}
break;
case OP_ICONV_TO_R8:
s390_cdfbr (code, ins->dreg, ins->sreg1);
break;
case OP_LCONV_TO_R8:
case OP_ICONV_TO_R8: {
s390_cdgbr (code, ins->dreg, ins->sreg1);
}
break;
case OP_FCONV_TO_I1:
s390_cgdbr (code, ins->dreg, 5, ins->sreg1);
Expand Down Expand Up @@ -4581,7 +4591,12 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
case OP_ATOMIC_ADD_I8: {
if (mono_hwcap_s390x_has_ia) {
s390_laag(code, s390_r0, ins->sreg2, ins->inst_basereg, ins->inst_offset);
s390_lg (code, ins->dreg, 0, ins->inst_basereg, ins->inst_offset);
if (mono_hwcap_s390x_has_mlt) {
s390_agrk(code, ins->dreg, s390_r0, ins->sreg2);
} else {
s390_agr (code, s390_r0, ins->sreg2);
s390_lgr (code, ins->dreg, s390_r0);
}
} else {
s390_lgr (code, s390_r1, ins->sreg2);
s390_lg (code, s390_r0, 0, ins->inst_basereg, ins->inst_offset);
Expand All @@ -4602,7 +4617,8 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
case OP_ATOMIC_ADD_I4: {
if (mono_hwcap_s390x_has_ia) {
s390_laa (code, s390_r0, ins->sreg2, ins->inst_basereg, ins->inst_offset);
s390_lgf (code, ins->dreg, 0, ins->inst_basereg, ins->inst_offset);
s390_ar (code, s390_r0, ins->sreg2);
s390_lgfr(code, ins->dreg, s390_r0);
} else {
s390_lgfr(code, s390_r1, ins->sreg2);
s390_lgf (code, s390_r0, 0, ins->inst_basereg, ins->inst_offset);
Expand Down Expand Up @@ -6920,6 +6936,13 @@ mono_arch_tailcall_supported (MonoCompile *cfg, MonoMethodSignature *caller_sig,
for (int i = 0; res && i < callee_sig->param_count; ++i) {
switch(ainfo[i].regtype) {
case RegTypeGeneral :
// R6 is both used as argument register and call-saved
// This means we cannot use a tail call if R6 is needed
if (ainfo[i].reg == S390_LAST_ARG_REG)
res = FALSE;
else
res = TRUE;
break;
case RegTypeFP :
case RegTypeFPR4 :
case RegTypeStructByValInFP :
Expand Down
2 changes: 1 addition & 1 deletion src/mono/mono/mini/mini-s390x.h
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,7 @@ struct SeqPointInfo {

// Does the ABI have a volatile non-parameter register, so tailcall
// can pass context to generics or interfaces?
#define MONO_ARCH_HAVE_VOLATILE_NON_PARAM_REGISTER 1 // FIXME?
#define MONO_ARCH_HAVE_VOLATILE_NON_PARAM_REGISTER 0 // FIXME?

/*-----------------------------------------------*/
/* Macros used to generate instructions */
Expand Down
2 changes: 0 additions & 2 deletions src/mono/mono/mini/tramp-s390x.c
Original file line number Diff line number Diff line change
Expand Up @@ -183,8 +183,6 @@ mono_arch_create_sdb_trampoline (gboolean single_step, MonoTrampInfo **info, gbo
/* Initialize a MonoContext structure on the stack */
s390_stmg (code, s390_r0, s390_r14, STK_BASE, gr_offset);
s390_stg (code, s390_r1, 0, STK_BASE, sp_offset);
sp_offset = ctx_offset + G_STRUCT_OFFSET(MonoContext, uc_stack.ss_sp);
s390_stg (code, s390_r1, 0, STK_BASE, sp_offset);
s390_stg (code, s390_r14, 0, STK_BASE, ip_offset);

fp_offset = ctx_offset + G_STRUCT_OFFSET(MonoContext, uc_mcontext.fpregs.fprs);
Expand Down
1 change: 0 additions & 1 deletion src/mono/mono/utils/mono-context.h
Original file line number Diff line number Diff line change
Expand Up @@ -914,7 +914,6 @@ typedef struct ucontext MonoContext;
#define MONO_CONTEXT_SET_BP(ctx,bp) \
do { \
(ctx)->uc_mcontext.gregs[15] = (unsigned long)bp; \
(ctx)->uc_stack.ss_sp = (void*)bp; \
} while (0)

#define MONO_CONTEXT_GET_IP(ctx) (gpointer) (ctx)->uc_mcontext.psw.addr
Expand Down
1 change: 0 additions & 1 deletion src/mono/mono/utils/mono-sigcontext.h
Original file line number Diff line number Diff line change
Expand Up @@ -541,7 +541,6 @@ typedef struct ucontext
# define UCONTEXT_GREGS(ctx) (((ucontext_t *)(ctx))->uc_mcontext.gregs)
# define UCONTEXT_FREGS(ctx) (((ucontext_t *)(ctx))->uc_mcontext.fpregs->fprs)
# define UCONTEXT_REG_Rn(ctx, n) (((ucontext_t *)(ctx))->uc_mcontext.gregs[(n)])
# define UCONTEXT_SP(ctx) (((ucontext_t *)(ctx))->uc_stack.ss_sp)
# define UCONTEXT_IP(ctx) (((ucontext_t *)(ctx))->uc_mcontext.psw.addr)

#endif
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,14 @@ public partial class Array
private class RawData
{
public IntPtr Bounds;
public IntPtr Count;
#if MONO_BIG_ARRAYS
public ulong Count;
#else
public uint Count;
#if !ARCH_32
private uint _Pad;
#endif
#endif
public byte Data;
}

Expand Down

0 comments on commit 6246c5e

Please sign in to comment.