Cranelift: Remove ArgumentPurpose::StructReturn
#4618
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "cranelift:area:x64", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
This is a good speedup for sure!
I'm not terribly fond of the externalization of the legalize step here -- it seems error-prone to require this at each callsite. Is there a way to avoid the clone in another way, perhaps by returning `Cow` from `ensure_struct_return_ptr_is_returned` and only cloning the sig if it's actually mutated?
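A minimal sketch of the `Cow` idea raised in this review comment. The `Signature`, `AbiParam`, and `ArgumentPurpose` types below are simplified, hypothetical stand-ins rather than Cranelift's real definitions, and the function body is only an assumption about how such a helper might look: it borrows the signature in the common case and clones it only when a `StructReturn` result actually has to be added.

```rust
use std::borrow::Cow;

// Hypothetical, simplified stand-ins for Cranelift's signature types.
#[derive(Clone, Debug, PartialEq)]
enum ArgumentPurpose {
    Normal,
    StructReturn,
}

#[derive(Clone, Debug, PartialEq)]
struct AbiParam {
    purpose: ArgumentPurpose,
}

#[derive(Clone, Debug, PartialEq)]
struct Signature {
    params: Vec<AbiParam>,
    returns: Vec<AbiParam>,
}

// True if any entry in the list is marked as a struct-return pointer.
fn has_sret(list: &[AbiParam]) -> bool {
    list.iter()
        .any(|p| p.purpose == ArgumentPurpose::StructReturn)
}

/// Borrows the signature when no change is needed. Clones it only when there
/// is a StructReturn parameter but no StructReturn result, and in that case
/// adds the missing result as the first return value.
fn ensure_struct_return_ptr_is_returned(sig: &Signature) -> Cow<'_, Signature> {
    if has_sret(&sig.params) && !has_sret(&sig.returns) {
        let mut owned = sig.clone();
        owned.returns.insert(
            0,
            AbiParam {
                purpose: ArgumentPurpose::StructReturn,
            },
        );
        Cow::Owned(owned)
    } else {
        Cow::Borrowed(sig)
    }
}
```

With this shape, callers that never use struct returns pay only for two scans of the parameter lists and never allocate; whether that is cheap enough in practice would need measuring.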
Returning a |
That said, I am investigating moving this legalization out of the mach inst code and into the actual legalizer pass we have, which should just modify the signatures in place, instead of doing any |
Alternately: give |
f099952
to
a180a41
Compare
I've rebased this on top of #4621 and rewritten it to do this signature legalization in-place in the legalization pass. This gives a nice phase separation and puts responsibility for legalization in the expected place. It is also an even larger speed up (against the remove-unused-fields PR) than the original PR was (against main):
However, I started looking into how Or if removing cc @bjorn3 |
2baf5ba
to
db76f7b
Compare
Updating this PR to be about removing We used to legalize multi-value returns to using struct-return pointers where
This isn't really useful for Cranelift embedders anymore since it doesn't even Finally, here are the Sightglass benchmark wins we get from this removal:
|
ABISig::from_func_sig
should take an already-canonicalized signatureArgumentPurpose::StructReturn
The SysV x86_64 ABI handles large struct returns by passing a pointer in a specific register, writing the return value to this buffer, and returning the pointer again in a different register. This can't be done as easily if StructReturn is removed. Currently, simply declaring an argument as StructReturn is enough, but with that removed you would have to pass it as the first argument and make sure to return it again as the first return value. I expect most people to forget that last step. In addition, architectures other than x86_64 may have different requirements (possibly not even expressible without StructReturn), which makes Cranelift IR less target independent and pushes the complexity of choosing the right approach onto Cranelift users.
We used to legalize multi-value returns to using struct-return pointers where callees would store result values into the given struct-return buffer and callers would provide the struct-return buffer when calling and then load the results out of it. We haven't done that for a while and just rely on the calling convention's normal method of returning multiple values now. The only special casing that `ArgumentPurpose::StructReturn` has now is:

1. We legalize signatures that have a `StructReturn` parameter but no `StructReturn` result to add the missing `StructReturn` result
2. We automatically insert a copy from a function's `StructReturn` argument to its `StructReturn` result value

This isn't really useful for Cranelift embedders anymore since it doesn't even handle putting the return values into the struct-return buffer or getting them out again, has maintenance and cruft overhead for Cranelift hackers, and the above signature legalization in (1) also imposes performance overhead on all Cranelift compiles regardless of whether they use struct returns or not. It's time we removed the vestigial `ArgumentPurpose::StructReturn`.

Finally, here are the Sightglass benchmark wins we get from this removal:

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 214956202.90 ± 31700992.96 (confidence = 99%)

  main.so is 0.91x to 0.94x faster than no-sret.so!
  no-sret.so is 1.07x to 1.09x faster than main.so!

  [2765571620 2866580329.79 3085702646] main.so
  [2396129997 2651624126.89 2923726602] no-sret.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 4176509.17 ± 2835408.05 (confidence = 99%)

  main.so is 0.95x to 0.99x faster than no-sret.so!
  no-sret.so is 1.01x to 1.05x faster than main.so!

  [115737735 133448206.82 149712338] main.so
  [108735836 129271697.65 166386156] no-sret.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [77356671 85735828.56 96331117] main.so
  [75824588 84176414.51 94308652] no-sret.so
```
db76f7b
to
53b8344
Compare
|
@bjorn3 I think that if Cranelift natively had struct types (as LLVM does), it would make more sense to worry about closely following the ABI for aggregate values. In such a hypothetical case, Cranelift would be responsible for everything related to the struct's handling, and the IR producer would simply create the struct and fill it in or consume it. However, CLIF is at a lower abstraction level: it provides building blocks, but does not actually have support for struct types. Sometimes these building blocks are "fundamental" and cannot be built from others. For example, args that need to be memcpy'd before a call are difficult to handle by generating CLIF instead. But I think that So then we ask why remove the building block: in this case, it is a small but not insignificant simplification in the ABI code, because it removes the only kind of "legalization" that we currently need to do to the signature. Having one signature that passes all the way through the pipeline, and then lowering ABI-handling code directly from that signature, is a nice simplicity + correctness win. And, performance as well, as @fitzgen shows here. So unless there is a reason it cannot be done at the CLIF level at all (please do speak up if this is actually the case and we've missed something), I think it makes sense to follow through with this. Does that make some sense at least? |
LLVM has struct types lowered to an sret argument at IR level too.
It has the semantics of "pass this argument in the right register and return it in the right register again" on x86_64. This is not the case on all architectures:

- On x86 there seems to be a difference between what works on x86_64 and what is actually done in C: https://godbolt.org/z/6dnEhK1vK
- Arm64 Windows uses x8 instead of x0 as the implicit return pointer: https://godbolt.org/z/fe95Y8Pdv
- Power has an extra

In other words, removing sret will make it impossible to correctly support those architectures in the future.
I think instead of implementing it as legalization, it should be a direct part of the abi handling code for the respective architecture. Especially as it differs between architectures. |
I have been debugging an abi incompatibility on aarch64 found in https://github.com/bjorn3/rustc_codegen_cranelift/pull/1255. Turns out sret arguments must be passed in the x8 register, while Cranelift currently passes it in x0 like a regular argument. It is not possible to pass a regular argument in x8, you need sret for this. Edit: Opened #4634 |
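To illustrate why the struct-return pointer is hard to express as an ordinary argument, here is a small sketch of per-architecture register assignment. Everything in it (`Arch`, `ArgumentPurpose`, `assign_int_arg_regs`, `sret_return_reg`) is a hypothetical model, not Cranelift's ABI code. It assumes the System V x86_64 convention (sret pointer passed as the first integer argument in `rdi` and returned in `rax`) and the AAPCS64 convention (sret pointer passed in `x8`, outside the normal `x0`..`x7` argument sequence). Stack arguments and other ABIs are not modeled.

```rust
// Hypothetical, simplified model; not Cranelift's actual ABI implementation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Arch {
    X86_64,  // System V calling convention assumed
    Aarch64, // AAPCS64 calling convention assumed
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ArgumentPurpose {
    Normal,
    StructReturn,
}

/// Assigns an integer register to each parameter, in order.
/// Returns `None` if the argument registers run out (stack args not modeled).
fn assign_int_arg_regs(
    arch: Arch,
    params: &[ArgumentPurpose],
) -> Option<Vec<&'static str>> {
    let regs: &[&'static str] = match arch {
        Arch::X86_64 => &["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
        Arch::Aarch64 => &["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"],
    };
    let mut next = 0;
    let mut out = Vec::new();
    for &purpose in params {
        if arch == Arch::Aarch64 && purpose == ArgumentPurpose::StructReturn {
            // The sret pointer lives in x8 and does not consume x0..x7.
            out.push("x8");
            continue;
        }
        // On x86_64 the sret pointer is treated like any other integer
        // argument here, so it lands in rdi when it comes first.
        out.push(*regs.get(next)?);
        next += 1;
    }
    Some(out)
}

/// Register in which the callee hands the sret pointer back, if the ABI
/// requires that at all.
fn sret_return_reg(arch: Arch) -> Option<&'static str> {
    match arch {
        Arch::X86_64 => Some("rax"),
        Arch::Aarch64 => None,
    }
}
```

In this model, the same two-parameter signature maps to `rdi`, `rsi` on x86_64 but to `x8`, `x0` on aarch64, which is a distinction a plain first argument cannot carry without some marker like `StructReturn`.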
@bjorn3 thanks for the additional details (and for hunting the linked bug!); I agree that given this new context, |
I had a branch lying around changing the return type of |
@jameysharp that could be a good change; I'd be curious about the performance. @fitzgen has some plans regarding interning/sharing |
I think it makes sense to remove ensure_struct_return_ptr_is_returned and instead directly handle sret in the abi impls. That is more flexible and should allow avoiding this allocation unconditionally. |
I agree that doing the legalization on-the-fly is probably best. I investigated making it Working on refactoring all of signature stuff right now. |
My trick was I didn't change any types except the return type of Here's my cow-sig branch in case anyone wants to look at it. I'm not currently proposing merging it since it sounds like we should do something different. So I haven't re-evaluated whether it has any performance impact. |