Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: Simplify leaf functions that do not use the stack #2960

Merged
merged 3 commits into from
Aug 27, 2021

Conversation

akirilov-arm
Copy link
Contributor

Leaf functions that do not use the stack (e.g. do not clobber any callee-saved registers) do not need a frame record; this has been discussed in issue #1148. I am not familiar with the ABIs of other architectures, so I don't know if it is safe to apply the same optimization, and that's why only the AArch64 backend does it.

@cfallin I'd appreciate any feedback on how these changes interact with unwinding; in particular, do we need an Inst::Unwind pseudo-instruction for the simple leaf functions we are optimizing?

cc @abrown @bnjbvr @uweigand

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Jun 2, 2021
@uweigand
Copy link
Member

uweigand commented Jun 2, 2021

@akirilov-arm I haven't looked into your patch in detail yet, but the s390x back-end already doesn't allocate any stack frame if it is not needed. The gen_prologue_frame_setup routine always does nothing because I'm handling all frame setup in gen_clobber_save, and there I detect the case by simply noting that the required stack size is zero - computed as

        let stack_size =
            outgoing_args_size as i32 + clobber_size as i32 + fixed_frame_storage_size as i32;

(outgoing_args_size will be zero in a leaf function, clobber_size is zero if no call-saved register needs to be saved, and fixed_frame_storage_size is zero if there's no local stack variables or spill slots.)

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for tackling this! It's been on the "nice-to-have" todo-list for a long time; I'm happy to see it finally taken care of.

I think that we probably want to address this on aarch64 and x64 at the same time; it seems that almost all the pieces are there, since the ABI implementation is largely shared, unless I am missing something. (If it turns out to be significantly more work, then of course we can get to it later!)

The only uncertainty I have regarding this optimization is how stack unwinding / backtraces are maintained; I see this was discussed some already in #1148. I think that if we have no unwind instructions, the default CFA definition is sufficient since we never adjust SP, at least on SysV platforms. The same should be true for Windows, I think. The metadata that allows a backtrace to map PC to the current (leaf) function should still be present. But we should verify this: could you take one of the tests that depends on unwinding and stackmaps, such as the GC smoketest, and check that it is testing both the no-frame and with-frame cases?

Thanks again for this!

cranelift/codegen/src/isa/s390x/abi.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/abi.rs Show resolved Hide resolved
@akirilov-arm
Copy link
Contributor Author

akirilov-arm commented Aug 23, 2021

I have enabled a couple of additional tests and made some minor changes, but I haven't checked how much work the x64 backend is going to require, which I am going to do next, so, please, do not merge yet.

@cfallin It turns out that there are already tests that use the unwinding information in a suitable way - in particular, traps::test_trap_trace mixes both functions to which the optimization applies and to which it doesn't in the call stack.

As for whether the defaults are sufficient in the AArch64 case - I checked the code that generated the DWARF Common Information Entry (CIE), and it sets both the Canonical Frame Address (CFA) and the return address correctly (among other things), so an empty Frame Description Entry (FDE) should be fine. In fact, I compiled a simple C function and looked at the unwinding information with readelf -wf - it was identical. Concerning Windows, I believe AArch64 is currently unsupported by Cranelift on that platform anyway.

@akirilov-arm akirilov-arm force-pushed the leaf_functions branch 3 times, most recently from 7a9345e to 50b11fe Compare August 24, 2021 10:37
@akirilov-arm akirilov-arm changed the title Cranelift AArch64: Simplify leaf functions that do not use the stack Cranelift: Simplify leaf functions that do not use the stack Aug 24, 2021
@akirilov-arm
Copy link
Contributor Author

@cfallin It looks like doing this optimization in the x64 backend fails on macOS. I am unable to work on this, so would it be acceptable to submit just the AArch64 part?


if flags.unwind_info() {
if flags.unwind_info() && setup_frame {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I have looked into this much but isn't there some way for the unwind info to still unwind leaf functions? Looking at the test where MacOS fails, I think this problem will also exist for aarch64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, doesn't the comment next to the code imply that unwinding on Apple silicon is broken irrespective of my changes, so any potential breakages introduced by my patch on that platform shouldn't be a blocker to progressing with the PR (and we don't test that configuration in CI anyway)?

On the other hand, AFAIK there are no known issues with unwinding on x86-64 macOS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unwinding on Apple silicon is broken

Right now, unwinding on M1 doesn't panic/crash, it is just materializing an incomplete backtrace, and I would be interested in at least keeping it not crashing. So I'd be happy to try your patch on Apple Silicon, fwiw! (Also it makes sense to me to get this optimization in for aarch64 only at start, and then trying to untangle what's going on on x64.)

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akirilov-arm this looks good, and it's fine to pass on x64 for now -- the infrastructure is now in place, in any case, so it should be much easier after all the work you've done here. Thanks!

As an aside, IMHO we should do something about the expected outputs in compile tests having to change with every cross-cutting change (like frame setup or regalloc or ...); but that's a thought for another issue :-) Sorry about the tedium of updating them here.

I do think we need to resolve the macOS unwinding question -- I'll leave it to @bnjbvr to test and give another r+ on this if it looks good (thanks!).

@akirilov-arm
Copy link
Contributor Author

@cfallin Thanks for the review! I just reverted the x64 changes, but left the commit in the history in case anyone else would like to use it as a starting point in the future.

@bnjbvr
Copy link
Member

bnjbvr commented Aug 26, 2021

Will check first thing tomorrow!

Leaf functions that do not use the stack (e.g. do not clobber any
callee-saved registers) do not need a frame record.

Copyright (c) 2021, Arm Limited.
…ack"

This reverts commit a531d78.

Copyright (c) 2021, Arm Limited.
@bnjbvr
Copy link
Member

bnjbvr commented Aug 27, 2021

Unfortunately can't check because of #3256 (wasmtime testing on aarch64-darwin is busted before your PR), so let's not block it on that, and merge this one; we can get back to it later. Thanks :)

@bnjbvr bnjbvr merged commit 7b98be1 into bytecodealliance:main Aug 27, 2021
@akirilov-arm akirilov-arm deleted the leaf_functions branch August 27, 2021 12:31
abrown added a commit to abrown/wasmtime that referenced this pull request Nov 30, 2022
Cranelift has had the ability for some time to identify leaf functions;
by Cranelift's definition, a leaf function is one that knows of no other
call signatures. bytecodealliance#1148 noted how it would be a good idea to avoid extra
frame setup work in leaf functions and bytecodealliance#2960 implemented this for
aarch64 and s390x. This improvement was not made for x64 due to some
test failures. This change avoids any frame setup for non-stack-using
leaf functions in x64.
abrown added a commit to abrown/wasmtime that referenced this pull request Nov 30, 2022
Cranelift has had the ability for some time to identify leaf functions;
by Cranelift's definition, a leaf function is one that knows of no other
call signatures. bytecodealliance#1148 noted how it would be a good idea to avoid extra
frame setup work in leaf functions and bytecodealliance#2960 implemented this for
aarch64 and s390x. This improvement was not made for x64 due to some
test failures. This change avoids any frame setup for non-stack-using
leaf functions in x64.
abrown added a commit to abrown/wasmtime that referenced this pull request Dec 1, 2022
Cranelift has had the ability for some time to identify leaf functions;
by Cranelift's definition, a leaf function is one that knows of no other
call signatures. bytecodealliance#1148 noted how it would be a good idea to avoid extra
frame setup work in leaf functions and bytecodealliance#2960 implemented this for
aarch64 and s390x. This improvement was not made for x64 due to some
test failures. This change avoids any frame setup for non-stack-using
leaf functions in x64.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants