-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multicore support for RISC-V #11418
Multicore support for RISC-V #11418
Conversation
asmcomp/riscv/emit.mlp
Outdated
let lbl_call_gc = new_label () in | ||
let n = -bytes in | ||
let offset = Domainstate.(idx_of_field Domain_young_limit) * 8 in | ||
if is_immediate n then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use emit_addimm
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks. Fixed.
Sure, no worries, there's no hurry :) |
Friendly ping. |
Hi @nojb, this is next on my list. |
I've started reviewing the PR (finally). I have managed to build the branch on I shall go over the memory model compilation first. [Update] The tests have now run to completion and the only failures do not seem to be connected to this PR but to my execution environment:
|
There is some discrepancy between compilation of the memory model mentioned in the original PR message and the implementation. The compilation mentioned in the original PR message is not completely right, but the implementation seems to do it more correctly, but not fully. Non-atomic loads ✅The compilation of non-atomic loads is just a plain load, and this looks correct. Non-atomic stores ❌The description above says that non-atomic stores are compiled to plain stores ( Firstly, we need to discriminate between the following:
For reference, this classification is similar to the one made for Arm64: #10972 (comment) (1) ✅ should be compiled to plan stores (2) ❌ needs to ensure that prior loads are ordered before the store. This is the reason for The current code sequence
at https://github.com/ocaml/ocaml/pull/11418/files#diff-c70b7c91c9069b1c55affc92395ef1abfe6f21442c81f845b65ac19a3141fae5R371 is too strong. Table A.4 from the RISC-V spec provides the mapping from Arm to RISC-V (reproduced below). According to this, the instruction sequence for (2) would be
In fact,
(3) ✅ is taken care of by (4) ✅ these operations do not follow the memory model and nothing to be done here. References
|
Thanks for the clear explanation @kayceesrk! I agree with it and have amended in consequence. I guess loads also need an adjustment along similar lines? |
The atomic load also can be simplified a bit. Atomic loadCurrently, the atomic load is
If you look at what
it is
See https://godbolt.org/z/cEjf95WzM. The fence
Atomic storeNothing to be done here as it goes through the C stub |
@@ -243,10 +253,9 @@ let emit_instr env i = | |||
assert (env.f.fun_prologue_required); | |||
let n = frame_size env in | |||
emit_stack_adjustment (-n); | |||
if n > 0 then cfi_adjust_cfa_offset n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for removing this line? It seems to do the right thing, even if CFI directives aren't fully right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. The line has been moved inside emit_stack_adjustment
(as it was missing from other places where this function was being used to adjust the stack pointer).
| Lop(Iload { memory_chunk = Word_int | Word_val; addressing_mode = Iindexed ofs; is_atomic } ) -> | ||
if is_atomic then | ||
` fence rw, rw\n`; | ||
` ld {emit_reg i.res.(0)}, {emit_int ofs}({emit_reg i.arg.(0)})\n`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to use a TAB instead of SPACE between fence
and rw
? Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was pointing out that the backticks are not left-aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the existing backend code, it seems that the backticks are aligned as normal OCaml code, so indented under an if
. But I agree it looks a bit odd. Instead I put it on the same line as the if
, similar to
Line 816 in 31ba6ed
if assignment then ` dmb ishld\n`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I was reading the code wrong 😑 . The fix looks good.
@@ -38,8 +38,7 @@ let word_addressed = false | |||
s2-s9 8-15 arguments/results (preserved by C) | |||
t2-t6 16-20 temporary | |||
s0 21 general purpose (preserved by C) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't s0 for the OCaml stack now and not fully general purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Anil, as far as I remember, we use s0
in only one spot as an auxiliary register to preserve the OCaml stack pointer (here: https://github.com/nojb/ocaml/blob/71b9ebae167c5574b25d18375df5c45715a0d787/asmcomp/riscv/emit.mlp#L326-L335) and to do that we mark s0
as being killed by noalloc
calls (here: https://github.com/nojb/ocaml/blob/71b9ebae167c5574b25d18375df5c45715a0d787/asmcomp/riscv/proc.ml#L244-L249). But as far as the register allocator is concerned I think that s0
is still treated as a general purpose.
Perhaps some trick could be used to avoid the need to reserve s0
in this way, but I couldn't figure it out so far...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, quite right, thanks.
Array.of_list(List.map phys_reg | ||
[0; 1; 2; 3; 4; 5; 6; 7; 8; 16; 17; 18; 19; 20; 22; | ||
[0; 1; 2; 3; 4; 5; 6; 7; 16; 17; 18; 19; 20; 21 (* s0 *); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity, could you tell us why 8
and 22
are gone and 21
added? A reply to this question here is fine. No need to add comments in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8
(=s2
) is removed: the comment intrunk
indicates that this register was included in this list because it is clobbered bycaml_c_call
. However this seems like a mistake, because this list specifies those registers killed bynoalloc
calls (basically caller-save ones), which do not go throughcaml_c_call
at all. So it looks like8
did not need to be here at all to begin with.21
(=s0
) is added: this is explained in the discussion [above].(Multicore support for RISC-V #11418 (comment))22
(=t0
) is removed: this one is now used in the runtime stubs as a general auxiliary register, so it is removed from the set of allocatable registers altogether (realized by loweringnum_available_registers.(0)
from 23 to 22).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me.
Thanks! I am updating Changes, and am wondering if I should move it to the 5.1 section :) @Octachron? |
This is just fixing a bug in 5.0.0~beta1 isn't it? :-) |
Yes, a new port should be moved to the next version at this point of the release. |
OK! Moved to "Working version" section. Planning to merge once CI passes. Thanks @kayceesrk for the review! |
Lines 34 to 47 in 762014f
No RISCV definition has been added. Is it because the fallback is adequate on RISCV, or because we forgot to extend this with a RISCV version? (I wish it was easier to tell. Idle thoughts:
|
We forgot to extend this with a RISC-V version.
Indeed, the PAUSE instruction seems to be the corresponding thing in RISC-V land, see p. 39 https://github.com/riscv/riscv-isa-manual/releases/download/draft-20221004-28b46de/riscv-spec.pdf if you prefer PDF. |
Agree with extending |
This PR adds Multicore support to the RISC-V backend, closely following what was done for Arm64. The memory model is implemented by the following sequences:
fence iorw, iorw; ld; fence iorw, iorw
fence iorw, ow; amoswap.d.aq
ld
sd
The first two sequences are what
gcc
emits foratomic_load
andatomic_store
. Perhaps this is too naïve, the experts will correct me :) For reference: the RISC-V memory model is called RVWMO (RISC-V Weak Memory Ordering), and an overview of it can be found in the official spec, Chapter 17.Beyond that, the PR consists in closely porting over the Arm64 changes to RISC-V. Some remarks (in no particular order):
amoswap.d.aq
instruction requires trivial addressing in the memory operand, this is enforced inasmcomp/riscv/selection.ml
.Pop_frame_pointer
macro inruntime/caml/stack.h
in the same way as in Arm64 so that the same stack walking logic can go through.t0
is reserved for the runtime/code generator; perhaps it could have been possible to do some assembly gymnastics to keept0
available for the register allocator, but it would have complicated matters and it would have increased the diff between the RISC-V and Arm64 backends.s0
is used for this purpose (and so is marked as "destroyed" for nonalloc calls, inProc.destroyed_at_c_noalloc_call
).caml_allocN
) was added.Assigning to @kayceesrk who expressed interest in looking at this PR, but as usual any and all feedback is warmly welcome.