-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riscv64: Add vconst
lowerings
#6324
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.
sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelHi20); | ||
let inst = Inst::Auipc { | ||
rd, | ||
imm: Imm20::from_bits(0), | ||
}; | ||
inst.emit(&[], sink, emit_info, state); | ||
|
||
// Emit an add to the address with a relocation. | ||
// This later gets patched up with the correct offset. | ||
sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelLo12I); | ||
Inst::AluRRImm12 { | ||
alu_op: AluOPRRI::Addi, | ||
rd, | ||
rs: rd.to_reg(), | ||
imm12: Imm12::zero(), | ||
} | ||
.emit(&[], sink, emit_info, state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct if the constant ends up being roughly 2048 bytes away, modulo 4096? I'm wondering if the high 20 bits might be off by one sometimes since the two instructions are extracting their 20/12 bits from offsets which differ by 4 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment reminded me that we discussed something almost exactly like this a while ago. And I found it:
#5951 (comment)
The Call
Relocation in the JIT performs exactly the same arithmetic (It's the same pair of relocations as this). And it is different from this one, we only offset the Lo12 by 4 not the Hi20, which makes sense.
I'm also going to try to reproduce that off-by-one with a test, I get really confused by this relocation math.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got this right, but I'm never confident when this sort of relocation math is involved.
We had 2 issues here. The first one you pointed out, only the Lo12 relocation should have the 4byte offset. The second one was the Hi20 relocation, that needs to get a 0x800
offset so that it skips to the next page as soon as the offset goes out of range for Lo12 since it is signed.
eb0cdee
to
fabce61
Compare
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?
// We add 4 here since this relocation usually follows a PCRelHi20 relocation, at the previous | ||
// instruction. So we need to account for the 4 byte difference in offsets there. | ||
let lo12 = (offset + 4) as u32 & 0xFFF; | ||
let insn = (insn & 0xFFFFF) | (lo12 << 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels backwards to me, but I think it's probably correct.
The instruction using Hi20 comes first, followed by the Lo12. So if the Hi20 label is at address x
, then Lo12 is at address x+4
.
Based on that, if we want to compute the same address for both labels, I expected that either Hi20 should use x+4
, or Lo12 should use x-4
.
Of the two instructions, the one which actually examines the program counter is auipc
, so I think we need to compute the offset relative to that instruction. So I think you are correct that it's Lo12 that needs adjustment.
But I guess here we aren't given the address of the instruction, right? offset
is the distance from this instruction to the address we want, or in other words, target_address - insn_address
. So if insn_address
is x-4
, then offset
is target_address - (x - 4)
, which is equivalent to (target_address - x) + 4
.
And that's what you've implemented. If there's a way to make the comment more clear that'd be fantastic, but I'm at least reasonably convinced that this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional comment, @afonso360!
; .byte 0x57, 0x70, 0x04, 0xcc | ||
; auipc t6, 0 | ||
; addi t6, t6, 0x14 | ||
; .byte 0x07, 0x85, 0x0f, 0x02 | ||
; ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we're discovering some gaps in capstone :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, unfortunately capstone doesn't recognize any V extension instructions 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, just want to confirm that these offsets are ignored on purpose though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you @afonso360!
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
* riscv64: Use Vector Regclass * riscv64: Add assert to `Inst::Mov` It isn't ready yet * riscv64: Add SIMD vconst large test This was meant to exercise the changes in #6324 but was failing in RISC-V due to some missing regalloc bits. * riscv64: Restrict spill slot size * riscv64: Mark v0 as preferred * riscv64: Const compute clobbers
👋 Hey,
This PR does a number of things but the main goal is to enable
vconst
loading from the constant pool on the RISC-V backend.Store
/Emit
instructionsLoadAddr
when we can't directly support theAMode
encoding.rs
fileAMode
'sVecLoad
/VecStore
to avoid adding a 0 offset where possiblevconst
loweringThis is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.