Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Add vconst lowerings #6324

Merged
merged 14 commits into from
May 9, 2023

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented May 2, 2023

👋 Hey,

This PR does a number of things but the main goal is to enable vconst loading from the constant pool on the RISC-V backend.

  • Cleanups to Store/Emit instructions
    • We now fallback to LoadAddr when we can't directly support the AMode
    • Moved the instruction encoding to the encoding.rs file
  • Add Constant Pool and MachLabel based AMode's
  • Cleanup the existing VecLoad/VecStore to avoid adding a 0 offset where possible
  • Add the vconst lowering

This is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.

@afonso360 afonso360 requested a review from a team as a code owner May 2, 2023 09:57
@afonso360 afonso360 requested review from jameysharp and removed request for a team May 2, 2023 09:57
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label May 2, 2023
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.

cranelift/codegen/src/isa/riscv64/inst/encode.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst/args.rs Outdated Show resolved Hide resolved
Comment on lines +1207 to +1219
sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelHi20);
let inst = Inst::Auipc {
rd,
imm: Imm20::from_bits(0),
};
inst.emit(&[], sink, emit_info, state);

// Emit an add to the address with a relocation.
// This later gets patched up with the correct offset.
sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelLo12I);
Inst::AluRRImm12 {
alu_op: AluOPRRI::Addi,
rd,
rs: rd.to_reg(),
imm12: Imm12::zero(),
}
.emit(&[], sink, emit_info, state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct if the constant ends up being roughly 2048 bytes away, modulo 4096? I'm wondering if the high 20 bits might be off by one sometimes since the two instructions are extracting their 20/12 bits from offsets which differ by 4 bytes.

Copy link
Contributor Author

@afonso360 afonso360 May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment reminded me that we discussed something almost exactly like this a while ago. And I found it:
#5951 (comment)

The Call Relocation in the JIT performs exactly the same arithmetic (It's the same pair of relocations as this). And it is different from this one, we only offset the Lo12 by 4 not the Hi20, which makes sense.

I'm also going to try to reproduce that off-by-one with a test, I get really confused by this relocation math.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got this right, but I'm never confident when this sort of relocation math is involved.

We had 2 issues here. The first one you pointed out, only the Lo12 relocation should have the 4byte offset. The second one was the Hi20 relocation, that needs to get a 0x800 offset so that it skips to the next page as soon as the offset goes out of range for Lo12 since it is signed.

cranelift/codegen/src/isa/riscv64/inst/mod.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst/args.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst/emit.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst/mod.rs Outdated Show resolved Hide resolved
@afonso360 afonso360 force-pushed the riscv-vec-vconst branch from eb0cdee to fabce61 Compare May 5, 2023 14:03
@afonso360
Copy link
Contributor Author

I rebased this on main, to get #6325 which had some cleanups that were useful for this PR (i.e. the unsigned_field_width function suggested above).

I've left the previous commits intact and only added from 1048da3 onwards.

afonso360 added a commit to afonso360/wasmtime that referenced this pull request May 5, 2023
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
afonso360 added a commit to afonso360/wasmtime that referenced this pull request May 5, 2023
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?

// We add 4 here since this relocation usually follows a PCRelHi20 relocation, at the previous
// instruction. So we need to account for the 4 byte difference in offsets there.
let lo12 = (offset + 4) as u32 & 0xFFF;
let insn = (insn & 0xFFFFF) | (lo12 << 20);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels backwards to me, but I think it's probably correct.

The instruction using Hi20 comes first, followed by the Lo12. So if the Hi20 label is at address x, then Lo12 is at address x+4.

Based on that, if we want to compute the same address for both labels, I expected that either Hi20 should use x+4, or Lo12 should use x-4.

Of the two instructions, the one which actually examines the program counter is auipc, so I think we need to compute the offset relative to that instruction. So I think you are correct that it's Lo12 that needs adjustment.

But I guess here we aren't given the address of the instruction, right? offset is the distance from this instruction to the address we want, or in other words, target_address - insn_address. So if insn_address is x-4, then offset is target_address - (x - 4), which is equivalent to (target_address - x) + 4.

And that's what you've implemented. If there's a way to make the comment more clear that'd be fantastic, but I'm at least reasonably convinced that this is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the additional comment, @afonso360!

Comment on lines +40 to +44
; .byte 0x57, 0x70, 0x04, 0xcc
; auipc t6, 0
; addi t6, t6, 0x14
; .byte 0x07, 0x85, 0x0f, 0x02
; ret
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're discovering some gaps in capstone :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately capstone doesn't recognize any V extension instructions 😞

Copy link
Member

@elliottt elliottt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, just want to confirm that these offsets are ignored on purpose though :)

Copy link
Member

@elliottt elliottt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you @afonso360!

@afonso360 afonso360 added this pull request to the merge queue May 9, 2023
Merged via the queue into bytecodealliance:main with commit b9e4474 May 9, 2023
@afonso360 afonso360 deleted the riscv-vec-vconst branch May 9, 2023 23:29
afonso360 added a commit to afonso360/wasmtime that referenced this pull request May 10, 2023
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
afonso360 added a commit that referenced this pull request May 16, 2023
* riscv64: Use Vector Regclass

* riscv64: Add assert to `Inst::Mov`

It isn't ready yet

* riscv64: Add SIMD vconst large test

This was meant to exercise the changes in #6324 but was failing in RISC-V due to some missing regalloc bits.

* riscv64: Restrict spill slot size

* riscv64: Mark v0 as preferred

* riscv64: Const compute clobbers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants