Allow loads to merge into other operations during instruction selection in MachInst backends #2340

cfallin · 2020-10-30T00:17:52Z

In MachInst-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).

Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do add rax, [rbx], which is a combination load and add (in that order).

Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)

Consider the following test-case as an interesting example:

block1(v0: i32, v1: i32):
v2 = sload.i32 v0
v3 = sload.i32 v1
v4 = iadd.i32 v2, v0
v5 = iadd.i32 v3, v1

We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see v5 and perhaps reason that we can sink the load v3 because it does not move across any other memory ops. We would also like to do this for v2 into v4, and in this example we can, but only after we sink v3; the load producing v2 moves across where v3 used to be. Hence, a naïve coloring scheme won't necessarily work.

A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.

This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.

(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)

The text was updated successfully, but these errors were encountered:

cfallin · 2020-10-30T00:19:44Z

cc @akirilov-arm and @abrown: this should enable us to properly handle the load-merging that LoadSplat addresses in #2278.

…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to the obvious aarch64 insns. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (bytecodealliance#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. There is no testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.

…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (bytecodealliance#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.

…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.

cfallin self-assigned this Oct 30, 2020

cfallin added the cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. label Oct 30, 2020

julian-seward1 mentioned this issue Nov 3, 2020

CL/aarch64: implement the wasm SIMD v128.load{32,64}_zero instructi… #2355

Merged

cfallin mentioned this issue Nov 5, 2020

MachInst lowering logic: allow effectful instructions to merge. #2366

Merged

cfallin closed this as completed in #2366 Nov 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Allow loads to merge into other operations during instruction selection in MachInst backends #2340

Allow loads to merge into other operations during instruction selection in MachInst backends #2340

cfallin commented Oct 30, 2020

cfallin commented Oct 30, 2020

Uh oh!

Allow loads to merge into other operations during instruction selection in MachInst backends #2340

Allow loads to merge into other operations during instruction selection in MachInst backends #2340

Comments

cfallin commented Oct 30, 2020

cfallin commented Oct 30, 2020

Uh oh!