-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow loads to merge into other operations during instruction selection in MachInst backends #2340
Labels
cranelift:area:machinst
Issues related to instruction selection and the new MachInst backend.
Comments
cc @akirilov-arm and @abrown: this should enable us to properly handle the load-merging that |
cfallin
added
the
cranelift:area:machinst
Issues related to instruction selection and the new MachInst backend.
label
Oct 30, 2020
julian-seward1
added a commit
to julian-seward1/wasmtime
that referenced
this issue
Nov 3, 2020
…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to the obvious aarch64 insns. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (bytecodealliance#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. There is no testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.
julian-seward1
added a commit
to julian-seward1/wasmtime
that referenced
this issue
Nov 3, 2020
…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to the obvious aarch64 insns. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (bytecodealliance#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. There is no testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.
julian-seward1
added a commit
to julian-seward1/wasmtime
that referenced
this issue
Nov 4, 2020
…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (bytecodealliance#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.
julian-seward1
added a commit
that referenced
this issue
Nov 4, 2020
…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.
cfallin
pushed a commit
that referenced
this issue
Nov 30, 2020
…ons. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions WebAssembly/simd#237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (#2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
cranelift:area:machinst
Issues related to instruction selection and the new MachInst backend.
In
MachInst
-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do
add rax, [rbx]
, which is a combination load and add (in that order).Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)
Consider the following test-case as an interesting example:
We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see
v5
and perhaps reason that we can sink the loadv3
because it does not move across any other memory ops. We would also like to do this forv2
intov4
, and in this example we can, but only after we sinkv3
; the load producingv2
moves across wherev3
used to be. Hence, a naïve coloring scheme won't necessarily work.A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.
This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.
(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)
The text was updated successfully, but these errors were encountered: