Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Transitioning to the new backend by default #1936

Closed
10 of 14 tasks
bnjbvr opened this issue Jun 29, 2020 · 14 comments
Closed
10 of 14 tasks

RFC: Transitioning to the new backend by default #1936

bnjbvr opened this issue Jun 29, 2020 · 14 comments
Labels
cranelift Issues related to the Cranelift code generator

Comments

@bnjbvr
Copy link
Member

bnjbvr commented Jun 29, 2020

TL;DR: what would it take to deprecate the “old” backend in favor of the “new” backend for code generation?

Context

The compilation pipeline in Cranelift currently does instruction selection (through legalizations) before optimizing the intermediate representation (IR), applying register allocation on it, and then generating the machine code. From the point of view of Cranelift, these last steps can be seen as a “backend” that generates machine code for different target architectures.

The previous backend was a bit complicated to work with: it was using the DSL from the codegen/meta crate, with concepts hard to approach and explain (like Recipes), it generated Rust code that could get out of sync with the non-meta crate or contain compile errors, etc. (see also #1141). A decision was made to work on a new backend (sometimes referred to as the “machinst” backend). This was presented in #1174 and has landed since then, as an alternative backend (viz., in addition to the existing one).

As of today, the old backend supports generating machine code for (some subset of) RISC-V, x86 64 bits and 32 bits. The new backend supports generating machine code for aarch64, and has a work-in-progress backend for x86_64. The duplication of x86_64 in both the old and new backends imply that both backends move ahead in parallel. This makes it harder for the new backend to catch up with the old one as new features are being added, and can generate frustration as different teams with different priorities work on different backends.

The Mozilla Spidermonkey team has enough confidence in the new backend, which we consider to be pleasant to work with (developer ergonomics), fast enough for our use case (both compile-time and generated code throughput), and it has the potential for more compile-time and code quality optimizations in the long run, so we think it is a good time to start this discussion.

The proposal

We propose that at some point in the future, we entirely move away from the “old” backend (that is, remove it, as well as all the associated code in the meta language), and use the “new” backend, for all target architectures, and that the only possible way to implement a new target is to do it through the new backend. Notably, since x86 is the main target in the old backend, this means removing the old x86 backend.

Of course, this can’t be done until all the primarily involved stakeholders are satisfied with this idea and don’t have any strong objections in moving forward. This RFC is a first step at identifying what the acceptance criterias would be to make it possible to transition, and what a plan would be to make this realistic.

What this is not about: removing the entire meta language. This may or may not be done in the future (if we want to do it, then moving over to the new backend is a first step).

Acceptance criteria

Note that these criteria are not definite and could evolve over time, based on our discussions here.

  • Features: the new backend should support all the features that are effectively used by all the stakeholders, including all the WebAssembly (wasm) features that have been implemented so far in the old backend.
    • Target-independent features:
      • support wasm MVP features + lightweight extensions (mutable globals, bulk memory ops)
      • support wasm reftypes
      • support wasm multi-value
      • debugging support for generated code.
      • implement enough of x86_64 to support these features.
    • Performance: since the new backend came with its own instruction selection and with a new register allocation, its performance characteristics are likely to be different from those of the old backend.
      • CLIR compile time: the new backend should compile code as fast as or faster than the old backend, for a set of wasm benchmarks (to be determined).
      • generated code quality: the new backend should generate code that runs at least as fast as or faster than the code generated by the old backend, for a set of wasm benchmarks (to be determined).
  • Security and quality:
    • CL testing: pass all the existing CLIF tests
    • Major stakeholders/embedders pass tests
    • fuzzing should run for some time and fuzz bugs should be fixed
  • Stakeholders supported:
    • Wasmtime testing: pass all the existing wasmtime tests using Cranelift as the compiler
    • SpiderMonkey testing: pass all the existing SpiderMonkey tests using Cranelift as the compiler
    • Other Bytecode Alliance stakeholders give their “go” (see below).

Feel free to comment about other things that are important to you, and please explain why (if it is not obvious)! Good criteria tend to be objectively quantifiable, measurable and/or bimodal (done or not done).

Potential additions to this list

These are additions to the above list, and need to be discussed as a group:

  • enough support to not break cg_clif, a Rust backend initiative using Cranelift for code generation. It is hard to make a guess about the amount of work that will be required to keep cg_clif working, while it is our hope that most of it should be covered by our work, and the rest could be a community-supported effort.
  • porting the x86 32-bits platform. While most of the code could be reused between x86_64 and x86 32-bits, it may not be a primary target right now, and we might or might not want to block the transition for this.

Proposed planning

Step 1: agree on the proposal

This is the current step that’s being done as part of this issue. See below.

Step 2: get to a point where we can try the new backend in real-world settings

Once we get to a point where we can compile code for large wasm programs mostly using wasm MVP features, we’ll be able to do a performance analysis, comparing on the two axis presented above. This will give us confidence in how fast we can move forward with this plan, or if we should revisit some implementation decisions, and chase more performance first.

Step 3: finish implementation of remaining features

This means implementing all the Features mentioned in the above list of criteria, as well as passing tests from all the test suites. At this point, we could put up an official depreciation notice for the old backend, and encourage people to use the new backend in general.

Step 4: do a final approval and switch

Based on an evaluation of performance, as well as feedback from the different stakeholders, we can eventually decide to enable the new backend by default. Removal of the code supporting the old backend may or may not happen at the same time; deferring its removal for a short period of time allows to switch the default back to the old backend, in case of unexpected consequences.

Future work

There is future work that is going to be enabled by switching to the new backend. At this point, these are mostly ideas, and it is not the point of this issue to discuss the design / feasibility / interest aspects of these ideas.

  • Code removal in the meta language as well as in the codegen crate may lower the overall build time of Cranelift, see also https://github.com/bytecodealliance/cranelift/issues/1318 which shows that large functions in the encodings/recipes system take some time to compile (and they generate large functions too).
  • After removal of the old backend, since the instruction selection really happens at the MachInst IR (Vcode) level, then all the CLIF instructions which were present for the sole benefit of being available in the backend can be removed. This includes CLIF instructions that are target-specific (e.g. x86_udivmodx), as well as instructions which offer alternative operand modes (e.g. iadd_imm is an alternate operand mode for iadd, allowing to express an “int add with immediate” with two different CL instructions, making pattern-matching more complex).
  • Translating from wasm to target-independent Vcode directly (and then adapting the lowering machinery to use this) is something we would like to investigate. In an even longer horizon, we could get back to having a single IR container again (parameterized by instruction/opcode-space) and carry over optimizations onto it, while avoiding some of the pitfalls of the current CLIF design (such as performance impact of in-place editing).

Thoughts?

If you have any comments, questions, alternative proposals, objections, please feel free to write them down here. Note that we’re looking for consensus here, which is gained not when everybody agrees about all the details, but when nobody has strong objections anymore. So please carefully discuss objections and assume good intent from everyone involved in the process :-) Thanks!

@bnjbvr bnjbvr added the cranelift Issues related to the Cranelift code generator label Jun 29, 2020
@github-actions
Copy link

Subscribe to Label Action

cc @bnjbvr

This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

  • bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 29, 2020

enough support to not break cg_clif, a Rust backend initiative using Cranelift for code generation. It is hard to make a guess about the amount of work that will be required to keep cg_clif working, while it is our hope that most of it should be covered by our work, and the rest could be a community-supported effort.

Must have

The main thing missing in all new style backends is 128bit integer support. It is very very hard to support 128bit integers in cg_clif without Cranelift support and it is impossible to even compile libcore without 128bit integer support in cg_clif.

edit: TLS support is also essential for multi-threading support.

Nice to have

Other than that having System-V struct argument support (#1559 for the old style x86 backend) would be really nice. That PR is not yet merged though, but supporting proc-macros depends on it to prevent an abi incompatibility between cg_clif (the proc macro) and cg_llvm (the rustc loading the proc-macro).

as well as instructions which offer alternative operand modes (e.g. iadd_imm is an alternate operand mode for iadd, allowing to express an “int add with immediate” with two different CL instructions, making pattern-matching more complex).

Those are nice convenience functions. Maybe they could just be that functions that emit a iconst and an iadd, etc?

@julian-seward1
Copy link
Contributor

One possibility for 128 bit math is to add an i128 type to CLIR, and let the individual target instruction selectors lower them down to the relevant sequences of machine instructions. I know this isn't the "traditional" way that CL currently uses. There's an argument to be made that having condition codes exposed in the CLIR -- as required by the old-backend handling for multiword arithmetic -- is more trouble than it's worth. Exposed condition codes mean we have to have extra logic to ensure we don't put any flag-setting insns in between generation and use; they are non-register-allocatable, and we have to jump through hoops to try and avoid materialising them unnecessarily in registers.

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 29, 2020

One possibility for 128 bit math is to add an i128 type to CLIR, and let the individual target instruction selectors lower them down to the relevant sequences of machine instructions. I know this isn't the "traditional" way that CL currently uses.

There is already an i128 type. 128 bit math is currently handled is using legalizations though.

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 29, 2020

I forgot to add TLS support to the must have section. Added it now.

@fitzgen
Copy link
Member

fitzgen commented Jun 29, 2020

Thanks for writing this up @bnjbvr! Very thorough.

One thing not mentioned explicitly, and which Wasmtime relies upon, is support for the target's native calling conventions, as opposed to SpiderMonkey's.

@pchickey
Copy link
Contributor

pchickey commented Jul 8, 2020

Sorry that I forgot to respond on this. The Lucet team can start running tests against the new backend once x86_64 support is sufficient for Wasm MVP support.

The lucet-wasi-fuzz crate uses csmith and clang to explore a subset of Wasm programs. This approach has found some bugs in Lucet before but I don't believe it has ever discovered Cranelift bugs. We can apply it to the new backend and see if anything shakes out.

@cfallin
Copy link
Member

cfallin commented Nov 12, 2020

I'll be actively working on this; for tracking purposes, this depends on at least #2079, #2372, #2272, u128 support, and a solid round of testing.

Since the issue was created, we've got reftypes support in the Cranelift backend itself, though not in the wasmtime embedding; I'll look at this soon. We've also got unwind info; @yurydelendik, are you aware of any other missing debug-related bits in the new backend?

@yurydelendik
Copy link
Contributor

@yurydelendik, are you aware of any other missing debug-related bits in the new backend?

To generate DWARF's .debug_info, you will also need ValueLabelsRanges data and implement TODOs for unwind info in epilogues (when possible).

@abrown
Copy link
Contributor

abrown commented Mar 4, 2021

@cfallin, if we want to still take a look at a performance comparison here, the new sightglass CLI should get us almost to the place where we can easily compare both compile times and run times. It supports a bunch of metrics now--cycles, instructions retired, cache hits/misses, etc. I think we need to talk about the final few pieces: do we need more benchmarks? How to aggregate the results? Essentially, what do we want to see from the tool to make this decision?

@cfallin
Copy link
Member

cfallin commented Mar 4, 2021

@abrown, this could be useful yes! However I think the higher-level question is: what are the thresholds for performance and do we want to have a hard-gate on them, or is this simply a nice-to-have input? (The above issue describes it as the former, but I think it would be worth getting recent input from folks.) In general my tests have shown the new backend to be faster (compile time and runtime) but we've seen large inputs where that's not the case; will we hold back the transition until we can address such issues (by e.g. working on regalloc.rs more) or will we move forward given the general balance of benefits vs remaining issues?

@cfallin
Copy link
Member

cfallin commented Mar 4, 2021

Greetings all -- so we've been checking off the remaining boxes needed to make the transition, but I haven't done a good job of keeping this up to date. We now have almost full feature-completeness (modulo a few things below) and several stakeholders -- Lucet (bytecodealliance/lucet#646) and rustc_codegen_cranelift (bjorn3/rustc_codegen_cranelift#1127 and bjorn3/rustc_codegen_cranelift#1140) have already switched to the new x86-64 backend, and Wasmtime is largest remaining stakeholder on the old x86-64 backend by default, to my knowledge.

Remaining work: implementation

There are a few tasks remaining before we can switch Wasmtime's backend:

Switching the default backend for the cranelift-codegen crate is technically a separate decision but IMHO can and should happen at the same time. To do so:

  • We need to update filetests: there are a number of golden-output tests that should be updated, and there are many tests that make assertions about old-backend-specific results (e.g. encodings) or using old-backend-specific functionality (e.g. regalloc constraints on CLIF values). A first pass at this would be to just label all tests that rely on the old backend with a feature flag, like experimental_x64 tests have now, and continue running them with the old backend on CI for now. We could then look at which of these should be ported, if we lack any test coverage on the new backend.
  • There are probably other unit tests throughout the codebase that make some assumptions based on the old backend's output.

Making the switch

My proposed course of action is to (i) finish the Windows-specific implementation, then (ii) put together a PR that makes the switch and look at where CI shows issues. A single PR can switch the default for Wasmtime, cranelift-codegen, and fix up unit tests/filetests at the same time.

We should talk about what happens to the old backend as well, though that can be a separate discussion after the default-switch occurs. (I can create an issue to track this when the switch occurs.) Until we remove it, we can maintain the ability to select it with a non-default feature, and we can continue to run CI tests for it, just as we run tests for the new backend today.

To make the switch, though, we should ensure that we have the appropriate sign-offs and that there will be no unforeseen issues. IMHO we should allow some time to hear from the community and consider any concerns or overlooked issues. Hopefully maintaining the ability to use the old backend at least in the medium-term will address any severe concerns, but switching the default does still have an impact that we want to consider.

Open questions

So, all that said, a few questions for folks here:

  • Are there any other issues I've missed above that require implementation before we switch?
  • Is there general agreement that (i) we should switch the default for cranelfit-codegen at the same time as Wasmtime, (ii) maintain the ability to use the old backend under a feature flag, and (iii) test the old backend on CI at least for a while?
  • Are there major stakeholders we should ping before this happens?
  • It may be good to use the BA RFC process to get a formal sign-off on the switch; I would tend toward this, to ensure everyone's on board. Do folks agree? If so, I'll go ahead and draft something (it will be pretty short as most of the work has already happened!)

@bnjbvr
Copy link
Member Author

bnjbvr commented Mar 5, 2021

We should talk about what happens to the old backend as well, though that can be a separate discussion after the default-switch occurs.

+1; in the short run, I strongly think it would be nice to keep the old backend around, for a release or two, behind a switch, so that if some users run into crashes/issues with the new backend, they have a way to revert it locally, report back the issues, so we can fix it before a following release (and in the absolute worst case, revert to the old backend).

Are there any other issues I've missed above that require implementation before we switch?

It would be really nice to have some performance numbers before/after, just so people can consider if 1. there are speedups/slowdowns, 2. if there are slowdowns, how bad/acceptable they are and should require blocking the transition.

Is there general agreement that (i) we should switch the default for cranelfit-codegen at the same time as Wasmtime, (ii) maintain the ability to use the old backend under a feature flag, and (iii) test the old backend on CI at least for a while?

Doing the opposite (i.e. switching default backend in Cranelift and in Wasmtime at different times) would only be valuable if there were a lot of Cranelift embedders who could report back to us in the time window between the two switches. Since we've collaborated closely with the cg_clif backend, which is probably the second largest user of Cranelift, and there aren't many others, I tend towards switching both at the same time.

It may be good to use the BA RFC process to get a formal sign-off on the switch; I would tend toward this, to ensure everyone's on board. Do folks agree? If so, I'll go ahead and draft something (it will be pretty short as most of the work has already happened!)

+1, this issue predated the existence of RFCs in the Bytecode Alliance, so it even could be moved in the RFC repository as a background discussion.

@cfallin
Copy link
Member

cfallin commented Apr 5, 2021

For the record, this discussion was continued and merged as an RFC in bytecodealliance/rfcs#10.

And, as of today, it was fixed in #2718 :-) Closing!

@cfallin cfallin closed this as completed Apr 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

No branches or pull requests

8 participants