Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift: Implement scalar fma on x86 #4460

Merged
merged 1 commit into from
Aug 3, 2022

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented Jul 16, 2022

👋 Hey,

This PR Implements fma for scalar values.

x86 does not have dedicated instructions for scalar fma, so we lower to a libcall which matches llvm's behaviour.

I tried to implement it for SIMD as well but got a bit lost, ill try again later.

We can't enable fma.clif runtests until we merge #4453.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:module isle Related to the ISLE domain-specific language labels Jul 16, 2022
@github-actions
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "cranelift:module", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@jameysharp
Copy link
Contributor

I keep trying to review this but finding too many things I don't understand about Cranelift yet. So I'll ask some questions instead. I think there are three main things happening in this PR:

  1. You've moved make_libcall_sig from isa::x64::lower to ir::libcall and renamed it to Libcall::signature. This seems related to a nearby comment in the existing code: "TODO avoid recreating signatures for every single Libcall function." But in doing so you lose access to the ir::Inst that the libcall came from, so you have to list the input and output types for every libcall in the signature method.

  2. You've plumbed a target_lexicon::Triple through a bunch of functions. I think this is because libcall_3_ret_1 needs it to call emit_vm_call but only has access to the x64 IsleContext, so you have to thread the triple through to lower_common in order to stash it there.

  3. You're exposing x64's emit_vm_call to ISLE via a new libcall_3_ret_1 helper, and using that to write lowering rules for fma.

Does that cover everything you did in this PR?

I would find this easier to review without change (1), or if you pass the ir::Inst to Libcall::signature so its implementation is identical to make_libcall_sig. But I'm guessing the way you have it now is better because it means this function can be reused by other target backends, right?

I'm curious if the target triple is reachable through one of the other fields that's already in the IsleContext, or if there's a way to get the right libcall calling convention in emit_vm_call without using the triple. I'm guessing no, or you probably would have used it, but it may be worth double-checking. If it's possible to get rid of the tedious changes for (2), that would make this patch a lot more readable.

Step (3) seems like it's simple enough that it can't be wrong, but I haven't read up on the details of ISLE yet.

The other thing I'd like to know is whether anybody is using the fma CLIF instruction. It looks to me like wasmtime does not use it, so I assume it's here for use by one of the other frontends, but it would be nice to verify that.

I think @cfallin or @fitzgen will need to review this but I hope my notes can at least help guide that review.

@afonso360
Copy link
Contributor Author

I keep trying to review this but finding too many things I don't understand about Cranelift yet.

Thanks for reviewing either way!

You've moved make_libcall_sig from isa::x64::lower to ir::libcall and renamed it to Libcall::signature. This seems related to a nearby comment in the existing code: "TODO avoid recreating signatures for every single Libcall function." But in doing so you lose access to the ir::Inst that the libcall came from, so you have to list the input and output types for every libcall in the signature method.

I think that comment still applies, we are still creating a new signature every time we lower fma even if we could cache (or precompute) that, since the signature isn't going to change.

Losing the Inst is a bit more deliberate, the previous code was essentially just copying the Inst signature into a LibCall signature, which doesen't seem very correct to me. I've centralized the signatures in LibCall since they should be pretty much equal across arches, but this may be wrong as well.

I think using Inst can also lead to a situation where we put a libcall in a rule, and use it in a different instruction lowering and then copy the wrong signature. An example would be calling nearest in the div instruction lowering and use the signature from div when emitting the libcall for nearest

I'm curious if the target triple is reachable through one of the other fields that's already in the IsleContext, or if there's a way to get the right libcall calling convention in emit_vm_call without using the triple. I'm guessing no, or you probably would have used it, but it may be worth double-checking. If it's possible to get rid of the tedious changes for (2), that would make this patch a lot more readable.

Yeah, I couldn't find it, but would appreciate a double check on that.

Does that cover everything you did in this PR?

Yeah pretty much.

The other thing I'd like to know is whether anybody is using the fma CLIF instruction. It looks to me like wasmtime does not use it, so I assume it's here for use by one of the other frontends, but it would be nice to verify that.

I don't think so, since it is not implemented right now, but at least rustc_codegen_cranelift side steps the issue by calling the libcall directly both for SIMD and the scalar version of this instruction.

Thanks!

@afonso360
Copy link
Contributor Author

@cfallin Would it be possible to review this? The Fma op isn't that important (we have a better implementation on #4539), but the libcall mechanism is something that we need to lower other i128 ops and this is sort of blocking that.

Changes since @jameysharp 's review:

  • Renamed libcall_3_ret_1 to libcall_3
  • Enabled FMA on fuzzer

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me, thanks! I think it will need a rebase after #4571 goes in but it should be minor.

I don't see an actual libcall implementation here -- I guess it will result in a panic if used today in e.g. the Wasmtime embedding (but that's fine since the cranelift-wasm frontend won't use the opcode)? Or is it already implemented in cranelift-jit?

| LibCall::ElfTlsGetAddr => unimplemented!(),
}

if call_conv.extends_baldrdash() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will become a compile error once my remove-Baldrdash PR merges (it's in-queue now), sorry!

Copy link
Contributor Author

@afonso360 afonso360 Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait for it to be merged and rebase (it looks like it has a CI error).

@afonso360
Copy link
Contributor Author

afonso360 commented Aug 2, 2022

I don't see an actual libcall implementation here -- I guess it will result in a panic if used today in e.g. the Wasmtime embedding (but that's fine since the cranelift-wasm frontend won't use the opcode)? Or is it already implemented in cranelift-jit?

Sorry, I don't quite understand understand.

We implement a libcall lowering for fma. We already have libcall's in the x86 backend, but not in isle yet (see ceil lowering without SSE42).

So this is already reachable from wasmtime (from the ceil instruction). cranelift-jit already supports handling relocations for these functions, so we should also be good there. And I assume that those instructions already work in wasmtime (without SSE42).

And #4453 allows us to test them with runtests. This does cause a Illegal Instruction if no relocations are performed which is the state of the current runtest suite (and why we don't enable the fma tests in this PR).

@cfallin
Copy link
Member

cfallin commented Aug 2, 2022

I guess I meant specifically for fma -- this PR adds two new libcalls, FmaF32 and FmaF64, and I didn't see implementations of them. I was wondering if these implementations were coming later, or in general what the plan for them is? (Or perhaps I've missed them somewhere, but grepping for FmaF32 in the existing tree isn't showing anything.)

@afonso360
Copy link
Contributor Author

afonso360 commented Aug 2, 2022

We don't implement them. In cranelift-jit we expect them to be provided when linking. We translate the FmaF{32,64} enum to the libcall name (fmaf and fma) in cranelift-module and from there the linker finds those functions for us. We search libc / msvcrt / whatever native runtime by default in cranelift-jit.

I don't know how this works in Wasmtime, if we always need to provide those functions or if we also get them from the system.

@cfallin
Copy link
Member

cfallin commented Aug 2, 2022

OK, that seems fine to me; just want to confirm that I wasn't missing something! It is at least a loud failure if someone tries to use the lowering in their own embedding and finds that the libcall isn't provided/implemented, so this shouldn't be a problem.

x86 does not have dedicated instructions for scalar FMA, lower
to a libcall which seems to be what llvm does.
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cfallin cfallin merged commit 709716b into bytecodealliance:main Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift:module cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants