Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: mov not removed when lowering select #3744

Closed
abrown opened this issue Jan 29, 2022 · 6 comments · Fixed by #4095
Closed

Cranelift: mov not removed when lowering select #3744

abrown opened this issue Jan 29, 2022 · 6 comments · Fixed by #4095
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator

Comments

@abrown
Copy link
Contributor

abrown commented Jan 29, 2022

.clif Test Case

test compile
target x86_64

function %select_eq_f32(f32, f32) -> i32 {
block0(v0: f32, v1: f32):
    v2 = fcmp eq v0, v1
    v3 = iconst.i32 1
    v4 = iconst.i32 0
    v5 = select v2, v3, v4
    return v5
}

Steps to Reproduce

RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif

Expected Results

No extra movl instruction after the ucomiss comparison.

Actual Results

There is no need for instruction 7, movl %edi, %edi, but for some reason it is not being elided.

 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 4:   movl    $1, %edi
  Inst 5:   xorl    %esi, %esi
  Inst 6:   ucomiss %xmm0, %xmm1
  Inst 7:   movl    %edi, %edi
  Inst 8:   cmovnzl %esi, %edi; cmovpl  %esi, %edi
  Inst 9:   movq    %rdi, %rax
  Inst 10:   movq    %rbp, %rsp
  Inst 11:   popq    %rbp
  Inst 12:   ret
}}

Versions and Environment

Cranelift version or commit: this redundant instruction is present in the ISLE changes to select currently under review at #3682.

@abrown abrown added bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator labels Jan 29, 2022
@fitzgen
Copy link
Member

fitzgen commented Jan 31, 2022

As we discussed, this is regalloc failing to coalesce/elide the mov in this case. Funny because it is the "easiest" kind of move to elide. Should be fixed with regalloc2, or at least once we switch to regalloc2 we can make sure that it handles this and update regalloc2 if it doesn't.

@alexcrichton
Copy link
Member

The current output is:

Disassembly of 28 bytes:
   0:   55                      push    rbp
   1:   48 89 e5                mov     rbp, rsp
   4:   45 31 c0                xor     r8d, r8d
   7:   b8 01 00 00 00          mov     eax, 1
   c:   0f 2e c8                ucomiss xmm1, xmm0
   f:   41 0f 45 c0             cmovne  eax, r8d
  13:   41 0f 4a c0             cmovp   eax, r8d
  17:   48 89 ec                mov     rsp, rbp
  1a:   5d                      pop     rbp
  1b:   c3                      ret

which I think means that this is fixed. Is this something we'll want a test for though before closing?

@abrown
Copy link
Contributor Author

abrown commented May 3, 2022

Yeah, I guess we could add a CLIF compile test for this but I think that if we use the debug output it could be unwittingly overwritten in a BLESSed run and this issue could resurface. We can still use test compile instead of test compile precise-output, I think.

@cfallin
Copy link
Member

cfallin commented May 3, 2022

if we use the debug output it could be unwittingly overwritten in a BLESSed run and this issue could resurface

Indeed, this was one of my concerns with the auto-blessing (though to be clear it's a big convenience improvement overall, so I don't think it was a mistake). I think the right answer is probably to use non-autoblessed tests for cases where we're asserting a certain property about the output, and possibly be a bit looser in the matching on details we don't care about (prologue/epilogue, specific registers), then reduce the number of precise-output tests overall (and make heavier use of run-tests).

@abrown
Copy link
Contributor Author

abrown commented May 3, 2022

I can add a test compile later today if no one else wants to take this?

@cfallin
Copy link
Member

cfallin commented May 3, 2022

Go for it, happy to review, and thank you!

abrown added a commit to abrown/wasmtime that referenced this issue May 3, 2022
In bytecodealliance#3744, we identified that extra `mov` instructions were inserted in
between the `cmov` instructions that CLIF's `select` lowers to. The
switch to regalloc2 resolved this and this test checks that no
intervening `mov`s are inserted. Closes bytecodealliance#3744.
abrown added a commit to abrown/wasmtime that referenced this issue May 3, 2022
In bytecodealliance#3744, we identified that extra `mov` instructions were inserted in
between the `cmov` instructions that CLIF's `select` lowers to. The
switch to regalloc2 resolved this and this test checks that no
intervening `mov`s are inserted. Closes bytecodealliance#3744.
abrown added a commit that referenced this issue May 3, 2022
In #3744, we identified that extra `mov` instructions were inserted in
between the `cmov` instructions that CLIF's `select` lowers to. The
switch to regalloc2 resolved this and this test checks that no
intervening `mov`s are inserted. Closes #3744.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants