x64 bugfix: prevent load-op fusion of cmp because it could be emitted multiple times. #2576

cfallin · 2021-01-12T23:58:09Z

On x64, the new backend generates `cmp` instructions at their use-sites
when possible (when the `icmp` that generates a boolean is known) so that
the condition flows directly through flags rather than a materialized
boolean. E.g., both `bint` (boolean to int) and `select` (conditional
select) instruction lowerings invoke `emit_cmp()` to do so.
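To make the "compare at each use site" idea concrete, here is a minimal Rust sketch that builds illustrative x64-like sequences for the two consumers. The enum `FlagConsumer`, the function `lower_consumer`, the virtual register names, and the specific instructions (`sete`/`movzx` for `bint`, `mov`/`cmove` for `select`) are assumptions for illustration, not Cranelift's actual lowering output. The point is only that each flag consumer gets its own copy of the `cmp`.

```rust
/// Hypothetical consumers of the boolean produced by `v3 = icmp eq v0, v2`.
#[derive(Clone, Copy, Debug)]
enum FlagConsumer {
    Bint,   // v4 = bint.i64 v3
    Select, // v5 = select.i64 v3, v0, v1
}

/// Illustrative (not Cranelift's real) sequence for one consumer: the compare
/// is emitted right before the instruction that reads the flags, instead of
/// materializing the boolean `v3` in a register.
fn lower_consumer(c: FlagConsumer) -> Vec<&'static str> {
    // Every consumer starts with its own copy of the compare.
    let mut seq = vec!["cmp v0, v2"];
    match c {
        FlagConsumer::Bint => {
            seq.push("sete tmp8"); // tmp8 = 1 if ZF is set, else 0
            seq.push("movzx v4, tmp8"); // zero-extend to 64 bits
        }
        FlagConsumer::Select => {
            seq.push("mov v5, v1"); // start with the "false" operand
            seq.push("cmove v5, v0"); // replace with the "true" operand if ZF is set
        }
    }
    seq
}

fn main() {
    let mut code = Vec::new();
    for c in [FlagConsumer::Bint, FlagConsumer::Select] {
        code.extend(lower_consumer(c));
    }
    for line in &code {
        println!("{}", line);
    }
    let cmps = code.iter().filter(|l| l.starts_with("cmp ")).count();
    println!("cmp emitted {} times", cmps);
}
```

Because the flags are recomputed per consumer, a single `icmp` in the IR can turn into several `cmp` instructions in the output, which is the property the rest of this PR depends on.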

Load-op fusion in `emit_cmp()` nominally allowed `cmp` to use its `cmp reg, mem` form.

However, the mergeable-load condition (the load has only a single use) was not
adequately checked. Consider the sequence:

    v2 = load.i64 v1
    v3 = icmp eq v0, v2
    v4 = bint.i64 v3
    v5 = select.i64 v3, v0, v1

The load `v2` is used only by the `icmp` at `v3`. However, the `cmp` is
code-generated twice: once for the `bint` and once for the
`select`.

Prior to this fix, the above example would result in the load at `v2`
sinking into the `cmp` just above the `select` (lowering proceeds in
reverse program order, so the `select` is lowered first); we then emit
another `cmp` for the `bint`, but the load has already been used once,
so we do not allow merging. We thus (i) expect the register for `v2` to
contain the loaded value, but (ii) skip the codegen for the load because
it has been sunk. This results in a regalloc error (unexpected livein),
as the never-written register is upward-exposed to the entry point.
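The failure can be reproduced in a toy model. The sketch below is hypothetical: `Lowering`, `emit_cmp`, `emit_load`, `upward_exposed` and `lower_example` are invented names, and only the compares and the load are modeled (not the `bint`/`select` instructions themselves). It assumes, as described above, that the `select`'s compare is lowered before the `bint`'s and that the load is lowered last. With `allow_mem = true` the register for `v2` is read but never written; with the `reg, reg`-only rule it is defined by a separate load.

```rust
use std::collections::HashSet;

// Hypothetical toy model (not Cranelift's real data structures) of lowering
//   v2 = load.i64 v1
//   v3 = icmp eq v0, v2
//   v4 = bint.i64 v3
//   v5 = select.i64 v3, v0, v1
// Only the compares and the load are modeled.
struct Lowering {
    /// Emitted instructions, in the order the lowering produced them.
    insts: Vec<String>,
    /// Virtual registers written by an instruction (or live-in as block params).
    defs: HashSet<&'static str>,
    /// Virtual registers read by an instruction.
    uses: HashSet<&'static str>,
    /// Set once the single-use load has been merged into a `cmp`.
    load_sunk: bool,
}

impl Lowering {
    fn new() -> Self {
        Lowering {
            insts: Vec::new(),
            // v0 and v1 are block parameters, so they are legitimately live-in.
            defs: HashSet::from(["v0", "v1"]),
            uses: HashSet::new(),
            load_sunk: false,
        }
    }

    /// Lower `icmp eq v0, v2` at one flag consumer. `allow_mem` selects
    /// whether load-op fusion (`cmp reg, mem`) may be used.
    fn emit_cmp(&mut self, allow_mem: bool) {
        self.uses.insert("v0");
        if allow_mem && !self.load_sunk {
            // The load has a single use, so merge it here; it will not be
            // emitted on its own.
            self.load_sunk = true;
            self.uses.insert("v1");
            self.insts.push("cmp v0, qword [v1]".to_string());
        } else {
            // Assume the loaded value is already in v2's register.
            self.uses.insert("v2");
            self.insts.push("cmp v0, v2".to_string());
        }
    }

    /// Lower the load itself, unless it was already sunk into a `cmp`.
    fn emit_load(&mut self) {
        if !self.load_sunk {
            self.uses.insert("v1");
            self.defs.insert("v2");
            self.insts.push("mov v2, qword [v1]".to_string());
        }
    }

    /// Registers read but never written: the "unexpected livein" case.
    fn upward_exposed(&self) -> Vec<&'static str> {
        let mut regs: Vec<&'static str> = self.uses.difference(&self.defs).copied().collect();
        regs.sort();
        regs
    }
}

/// Lower the example bottom-up: `select`, then `bint`, then the load.
/// The `icmp` produces no code of its own; it is lowered at each use site.
fn lower_example(allow_mem: bool) -> Lowering {
    let mut l = Lowering::new();
    l.emit_cmp(allow_mem); // compare for `select` (v5)
    l.emit_cmp(allow_mem); // compare for `bint` (v4)
    l.emit_load(); // `load` (v2)
    l
}

fn main() {
    for allow_mem in [true, false] {
        let l = lower_example(allow_mem);
        println!("allow_mem={}: {:?}, live-ins {:?}", allow_mem, l.insts, l.upward_exposed());
    }
}
```

With `allow_mem = true` the model reports `v2` as upward-exposed and emits no separate load, mirroring the regalloc error; with `allow_mem = false` both compares read `v2` and the load that defines it is emitted.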

Because of this, `emit_cmp()` (and its FP equivalent) must accept only
the `reg, reg` form. We could get marginally better code by tracking
whether the `cmp` we are emitting comes from an `icmp`/`fcmp` with only
one use, but IMHO simplicity is the better rule here, where subtle
interactions occur.
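The more precise rule mentioned above (keep the memory form only when the compare is known to be emitted once) might look like the following predicate. This is a hypothetical sketch, not code from the PR: `may_merge_load_into_cmp` and its use-count parameters are invented, and it assumes that an `icmp`/`fcmp` whose result has a single use is lowered at exactly one site.

```rust
/// Hypothetical predicate: allow the `cmp reg, mem` form only when the merged
/// load is guaranteed to execute exactly once.
fn may_merge_load_into_cmp(load_uses: usize, cmp_result_uses: usize) -> bool {
    // The load must feed only this compare (the pre-existing check), and the
    // compare's boolean must have a single consumer, so the `cmp` is not
    // re-emitted at a second use site.
    load_uses == 1 && cmp_result_uses == 1
}

fn main() {
    // The example above: the load has one use, but `v3` feeds both `bint` and `select`.
    println!("{}", may_merge_load_into_cmp(1, 2)); // false
    // A compare with a single consumer could still use the memory form.
    println!("{}", may_merge_load_into_cmp(1, 1)); // true
}
```

The PR instead drops the memory form entirely, giving up this small code-quality win in exchange for a rule that cannot interact badly with compares being lowered more than once.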

cranelift/codegen/src/isa/x64/lower.rs



The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of its operands, which allows the compiler to merge a load with the compare instruction (`ucomiss` or `ucomisd`). Unfortunately, as we saw in bytecodealliance#2576 for the integer-compare case, this does not work with our lowering algorithm because compares can be lowered more than once (unlike all other instructions) to reproduce the flags where needed. Merging a load into an op that executes more than once is invalid in general (the two loads may observe different values, which violates the original program semantics because there was only one load originally). This does not result in a miscompilation, but instead will cause a panic at regalloc time because the register that should have been defined by the separate load is never written (the load is never emitted separately). I think this (very subtle, easy to miss) condition was unfortunately not ported over when we moved the logic in bytecodealliance#3682. The existing fcmp-of-load test in `cmp-mem-bug` (from bytecodealliance#2576) does not seem to trigger it, for a reason I haven't fully deduced. I just added the verbatim function body (happens to come from `clang.wasm`) that triggers the bug as a test. Discovered while bringing up regalloc2 support. It's pretty unlikely to hit by chance, which is why I think none of our fuzzing has hit it yet.


cfallin requested review from fitzgen and abrown January 12, 2021 23:58

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Jan 13, 2021

fitzgen approved these changes Jan 13, 2021


cranelift/codegen/src/isa/x64/lower.rs

cfallin force-pushed the cmp-load-bug branch from 64f5130 to 4638de6 January 13, 2021 17:49

cfallin merged commit 2b2f369 into bytecodealliance:main Jan 13, 2021

cfallin added a commit to bytecodealliance/lucet that referenced this pull request Jan 13, 2021

Update wasmtime dependency to include bytecodealliance/wasmtime#2576.

63e34f2

cfallin added a commit to bytecodealliance/lucet that referenced this pull request Jan 13, 2021

Merge pull request #624 from bytecodealliance/cfallin/update-wasmtime

049e296

Update wasmtime dependency to include bytecodealliance/wasmtime#2576.

cfallin mentioned this pull request Jan 27, 2021

Implement limiting WebAssembly execution with fuel #2611

Merged

cfallin mentioned this pull request May 7, 2021

Support IBM z/Architecture #2874

Merged

cfallin mentioned this pull request Mar 16, 2022

x64 backend: fix fpcmp to avoid load-op merging. #3934

Merged
