Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x64 bugfix: prevent load-op fusion of cmp because it could be emitted multiple times. #2576

Merged
merged 1 commit into from
Jan 13, 2021

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Jan 12, 2021

On x64, the new backend generates cmp instructions at their use-sites
when possible (when the icmp that generates a boolean is known) so that
the condition flows directly through flags rather than a materialized
boolean. E.g., both bint (boolean to int) and select (conditional
select) instruction lowerings invoke emit_cmp() to do so.

Load-op fusion in emit_cmp() nominally allowed cmp to use its cmp reg, mem form.

However, the mergeable-load condition (load has only single use) was not
adequately checked. Consider the sequence:

    v2 = load.i64 v1
    v3 = icmp eq v0, v2
    v4 = bint.i64 v3
    v5 = select.i64 v3, v0, v1

The load v2 is only used in the icmp at v3. However, the cmp will
be separately codegen'd twice, once for the bint and once for the
select.

Prior to this fix, the above example would result in the load at v2
sinking to the cmp just above the select; we then emit another cmp
for the bint, but the load has already been used once so we do not
allow merging. We thus (i) expect the register for v2 to contain the
loaded value, but (ii) skip the codegen for the load because it has been
sunk. This results in a regalloc error (unexpected livein) as the
unfilled register is upward-exposed to the entry point.

Because of this, we need to accept only the reg, reg form in
emit_cmp() (and the FP equivalent). We could get marginally better
code by tracking whether the cmp we are emitting comes from an
icmp/fcmp with only one use; but IMHO simplicity is a better rule
here when subtle interactions occur.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Jan 13, 2021
… multiple times.

On x64, the new backend generates `cmp` instructions at their use-sites
when possible (when the icmp that generates a boolean is known) so that
the condition flows directly through flags rather than a materialized
boolean. E.g., both `bint` (boolean to int) and `select` (conditional
select) instruction lowerings invoke `emit_cmp()` to do so.

Load-op fusion in `emit_cmp()` nominally allowed `cmp` to use its `cmp
reg, mem` form.

However, the mergeable-load condition (load has only single use) was not
adequately checked. Consider the sequence:

```
    v2 = load.i64 v1
    v3 = icmp eq v0, v2
    v4 = bint.i64 v3
    v5 = select.i64 v3, v0, v1
```

The load `v2` is only used in the `icmp` at `v3`. However, the cmp will
be separately codegen'd twice, once for the `bint` and once for the
`select`.

Prior to this fix, the above example would result in the load at `v2`
sinking to the `cmp` just above the `select`; we then emit another `cmp`
for the `bint`, but the load has already been used once so we do not
allow merging. We thus (i) expect the register for `v2` to contain the
loaded value, but (ii) skip the codegen for the load because it has been
sunk. This results in a regalloc error (unexpected livein) as the
unfilled register is upward-exposed to the entry point.

Because of this, we need to accept only the reg, reg form in
`emit_cmp()` (and the FP equivalent). We could get marginally better
code by tracking whether the `cmp` we are emitting comes from an
`icmp`/`fcmp` with only one use; but IMHO simplicity is a better rule
here when subtle interactions occur.
@cfallin cfallin merged commit 2b2f369 into bytecodealliance:main Jan 13, 2021
cfallin added a commit to bytecodealliance/lucet that referenced this pull request Jan 13, 2021
cfallin added a commit to bytecodealliance/lucet that referenced this pull request Jan 13, 2021
cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 16, 2022
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in bytecodealliance#2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in bytecodealliance#3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from bytecodealliance#2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
abrown pushed a commit that referenced this pull request Mar 16, 2022
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in #2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in #3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Apr 11, 2022
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in bytecodealliance#2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in bytecodealliance#3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from bytecodealliance#2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
cfallin added a commit that referenced this pull request Apr 11, 2022
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in #2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in #3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants