-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Transitioning to the new backend by default #1936
Comments
Subscribe to Label Actioncc @bnjbvr
This issue or pull request has been labeled: "cranelift"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Must haveThe main thing missing in all new style backends is 128bit integer support. It is very very hard to support 128bit integers in cg_clif without Cranelift support and it is impossible to even compile libcore without 128bit integer support in cg_clif. edit: TLS support is also essential for multi-threading support. Nice to haveOther than that having System-V struct argument support (#1559 for the old style x86 backend) would be really nice. That PR is not yet merged though, but supporting proc-macros depends on it to prevent an abi incompatibility between cg_clif (the proc macro) and cg_llvm (the rustc loading the proc-macro).
Those are nice convenience functions. Maybe they could just be that functions that emit a |
One possibility for 128 bit math is to add an i128 type to CLIR, and let the individual target instruction selectors lower them down to the relevant sequences of machine instructions. I know this isn't the "traditional" way that CL currently uses. There's an argument to be made that having condition codes exposed in the CLIR -- as required by the old-backend handling for multiword arithmetic -- is more trouble than it's worth. Exposed condition codes mean we have to have extra logic to ensure we don't put any flag-setting insns in between generation and use; they are non-register-allocatable, and we have to jump through hoops to try and avoid materialising them unnecessarily in registers. |
There is already an |
I forgot to add TLS support to the must have section. Added it now. |
Thanks for writing this up @bnjbvr! Very thorough. One thing not mentioned explicitly, and which Wasmtime relies upon, is support for the target's native calling conventions, as opposed to SpiderMonkey's. |
Sorry that I forgot to respond on this. The Lucet team can start running tests against the new backend once x86_64 support is sufficient for Wasm MVP support. The lucet-wasi-fuzz crate uses csmith and clang to explore a subset of Wasm programs. This approach has found some bugs in Lucet before but I don't believe it has ever discovered Cranelift bugs. We can apply it to the new backend and see if anything shakes out. |
I'll be actively working on this; for tracking purposes, this depends on at least #2079, #2372, #2272, Since the issue was created, we've got reftypes support in the Cranelift backend itself, though not in the |
To generate DWARF's .debug_info, you will also need |
@cfallin, if we want to still take a look at a performance comparison here, the new sightglass CLI should get us almost to the place where we can easily compare both compile times and run times. It supports a bunch of metrics now--cycles, instructions retired, cache hits/misses, etc. I think we need to talk about the final few pieces: do we need more benchmarks? How to aggregate the results? Essentially, what do we want to see from the tool to make this decision? |
@abrown, this could be useful yes! However I think the higher-level question is: what are the thresholds for performance and do we want to have a hard-gate on them, or is this simply a nice-to-have input? (The above issue describes it as the former, but I think it would be worth getting recent input from folks.) In general my tests have shown the new backend to be faster (compile time and runtime) but we've seen large inputs where that's not the case; will we hold back the transition until we can address such issues (by e.g. working on regalloc.rs more) or will we move forward given the general balance of benefits vs remaining issues? |
Greetings all -- so we've been checking off the remaining boxes needed to make the transition, but I haven't done a good job of keeping this up to date. We now have almost full feature-completeness (modulo a few things below) and several stakeholders -- Lucet (bytecodealliance/lucet#646) and rustc_codegen_cranelift (bjorn3/rustc_codegen_cranelift#1127 and bjorn3/rustc_codegen_cranelift#1140) have already switched to the new x86-64 backend, and Wasmtime is largest remaining stakeholder on the old x86-64 backend by default, to my knowledge. Remaining work: implementationThere are a few tasks remaining before we can switch Wasmtime's backend:
Switching the default backend for the
Making the switchMy proposed course of action is to (i) finish the Windows-specific implementation, then (ii) put together a PR that makes the switch and look at where CI shows issues. A single PR can switch the default for Wasmtime, cranelift-codegen, and fix up unit tests/filetests at the same time. We should talk about what happens to the old backend as well, though that can be a separate discussion after the default-switch occurs. (I can create an issue to track this when the switch occurs.) Until we remove it, we can maintain the ability to select it with a non-default feature, and we can continue to run CI tests for it, just as we run tests for the new backend today. To make the switch, though, we should ensure that we have the appropriate sign-offs and that there will be no unforeseen issues. IMHO we should allow some time to hear from the community and consider any concerns or overlooked issues. Hopefully maintaining the ability to use the old backend at least in the medium-term will address any severe concerns, but switching the default does still have an impact that we want to consider. Open questionsSo, all that said, a few questions for folks here:
|
+1; in the short run, I strongly think it would be nice to keep the old backend around, for a release or two, behind a switch, so that if some users run into crashes/issues with the new backend, they have a way to revert it locally, report back the issues, so we can fix it before a following release (and in the absolute worst case, revert to the old backend).
It would be really nice to have some performance numbers before/after, just so people can consider if 1. there are speedups/slowdowns, 2. if there are slowdowns, how bad/acceptable they are and should require blocking the transition.
Doing the opposite (i.e. switching default backend in Cranelift and in Wasmtime at different times) would only be valuable if there were a lot of Cranelift embedders who could report back to us in the time window between the two switches. Since we've collaborated closely with the cg_clif backend, which is probably the second largest user of Cranelift, and there aren't many others, I tend towards switching both at the same time.
+1, this issue predated the existence of RFCs in the Bytecode Alliance, so it even could be moved in the RFC repository as a background discussion. |
For the record, this discussion was continued and merged as an RFC in bytecodealliance/rfcs#10. And, as of today, it was fixed in #2718 :-) Closing! |
TL;DR: what would it take to deprecate the “old” backend in favor of the “new” backend for code generation?
Context
The compilation pipeline in Cranelift currently does instruction selection (through legalizations) before optimizing the intermediate representation (IR), applying register allocation on it, and then generating the machine code. From the point of view of Cranelift, these last steps can be seen as a “backend” that generates machine code for different target architectures.
The previous backend was a bit complicated to work with: it was using the DSL from the codegen/meta crate, with concepts hard to approach and explain (like Recipes), it generated Rust code that could get out of sync with the non-meta crate or contain compile errors, etc. (see also #1141). A decision was made to work on a new backend (sometimes referred to as the “machinst” backend). This was presented in #1174 and has landed since then, as an alternative backend (viz., in addition to the existing one).
As of today, the old backend supports generating machine code for (some subset of) RISC-V, x86 64 bits and 32 bits. The new backend supports generating machine code for aarch64, and has a work-in-progress backend for x86_64. The duplication of x86_64 in both the old and new backends imply that both backends move ahead in parallel. This makes it harder for the new backend to catch up with the old one as new features are being added, and can generate frustration as different teams with different priorities work on different backends.
The Mozilla Spidermonkey team has enough confidence in the new backend, which we consider to be pleasant to work with (developer ergonomics), fast enough for our use case (both compile-time and generated code throughput), and it has the potential for more compile-time and code quality optimizations in the long run, so we think it is a good time to start this discussion.
The proposal
We propose that at some point in the future, we entirely move away from the “old” backend (that is, remove it, as well as all the associated code in the meta language), and use the “new” backend, for all target architectures, and that the only possible way to implement a new target is to do it through the new backend. Notably, since x86 is the main target in the old backend, this means removing the old x86 backend.
Of course, this can’t be done until all the primarily involved stakeholders are satisfied with this idea and don’t have any strong objections in moving forward. This RFC is a first step at identifying what the acceptance criterias would be to make it possible to transition, and what a plan would be to make this realistic.
What this is not about: removing the entire meta language. This may or may not be done in the future (if we want to do it, then moving over to the new backend is a first step).
Acceptance criteria
Note that these criteria are not definite and could evolve over time, based on our discussions here.
Feel free to comment about other things that are important to you, and please explain why (if it is not obvious)! Good criteria tend to be objectively quantifiable, measurable and/or bimodal (done or not done).
Potential additions to this list
These are additions to the above list, and need to be discussed as a group:
Proposed planning
Step 1: agree on the proposal
This is the current step that’s being done as part of this issue. See below.
Step 2: get to a point where we can try the new backend in real-world settings
Once we get to a point where we can compile code for large wasm programs mostly using wasm MVP features, we’ll be able to do a performance analysis, comparing on the two axis presented above. This will give us confidence in how fast we can move forward with this plan, or if we should revisit some implementation decisions, and chase more performance first.
Step 3: finish implementation of remaining features
This means implementing all the Features mentioned in the above list of criteria, as well as passing tests from all the test suites. At this point, we could put up an official depreciation notice for the old backend, and encourage people to use the new backend in general.
Step 4: do a final approval and switch
Based on an evaluation of performance, as well as feedback from the different stakeholders, we can eventually decide to enable the new backend by default. Removal of the code supporting the old backend may or may not happen at the same time; deferring its removal for a short period of time allows to switch the default back to the old backend, in case of unexpected consequences.
Future work
There is future work that is going to be enabled by switching to the new backend. At this point, these are mostly ideas, and it is not the point of this issue to discuss the design / feasibility / interest aspects of these ideas.
Thoughts?
If you have any comments, questions, alternative proposals, objections, please feel free to write them down here. Note that we’re looking for consensus here, which is gained not when everybody agrees about all the details, but when nobody has strong objections anymore. So please carefully discuss objections and assume good intent from everyone involved in the process :-) Thanks!
The text was updated successfully, but these errors were encountered: