Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow loads to merge into other operations during instruction selection in MachInst backends #2340

Closed
cfallin opened this issue Oct 30, 2020 · 1 comment · Fixed by #2366
Closed
Assignees
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend.

Comments

@cfallin
Copy link
Member

cfallin commented Oct 30, 2020

In MachInst-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).

Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do add rax, [rbx], which is a combination load and add (in that order).

Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)

Consider the following test-case as an interesting example:

block1(v0: i32, v1: i32):
v2 = sload.i32 v0
v3 = sload.i32 v1
v4 = iadd.i32 v2, v0
v5 = iadd.i32 v3, v1

We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see v5 and perhaps reason that we can sink the load v3 because it does not move across any other memory ops. We would also like to do this for v2 into v4, and in this example we can, but only after we sink v3; the load producing v2 moves across where v3 used to be. Hence, a naïve coloring scheme won't necessarily work.

A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.

This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.

(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)

@cfallin
Copy link
Member Author

cfallin commented Oct 30, 2020

cc @akirilov-arm and @abrown: this should enable us to properly handle the load-merging that LoadSplat addresses in #2278.

@cfallin cfallin self-assigned this Oct 30, 2020
@cfallin cfallin added the cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. label Oct 30, 2020
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this issue Nov 3, 2020
…ons.

This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  WebAssembly/simd#237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to the obvious aarch64 insns.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (bytecodealliance#2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

There is no testcase in this commit, because that is a separate repo.  The
implementation has been tested, nevertheless.
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this issue Nov 3, 2020
…ons.

This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  WebAssembly/simd#237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to the obvious aarch64 insns.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (bytecodealliance#2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

There is no testcase in this commit, because that is a separate repo.  The
implementation has been tested, nevertheless.
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this issue Nov 4, 2020
…ons.

This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  WebAssembly/simd#237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (bytecodealliance#2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because that is a separate
repo.  The implementation has been tested, nevertheless.
julian-seward1 added a commit that referenced this issue Nov 4, 2020
…ons.

This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  WebAssembly/simd#237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (#2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because that is a separate
repo.  The implementation has been tested, nevertheless.
cfallin pushed a commit that referenced this issue Nov 30, 2020
…ons.

This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  WebAssembly/simd#237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (#2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because that is a separate
repo.  The implementation has been tested, nevertheless.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant