| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "WebAssembly targets and new on-by-default features" |
| 4 | +author: Alex Crichton |
| 5 | +--- |
| 6 | + |
| 7 | +The Rust compiler has [recently upgraded to using LLVM 19][llvm19] and this |
| 8 | +change accompanies some updates to WebAssembly targets of the Rust compiler. |
| 9 | +Nightly Rust, which will become Rust 1.82 on 2024-10-17, reflects all of these |
| 10 | +changes and can be used for testing. |
| 11 | + |
| 12 | +WebAssembly is an evolving standard where new features are being added over time |
| 13 | +through a [proposals process][proposals]. Once WebAssembly proposals reach |
| 14 | +maturity, get merged into the specification itself, get implemented in engines, |
| 15 | +and remain that way for quite some time, producer toolchains (e.g. LLVM) |
| 16 | +update to enable these proposals by default. In LLVM 19 this |
| 17 | +has happened with the [multi-value and reference-types proposals][llvmenable]. |
| 18 | +These are now enabled by default in LLVM, which transitively means that they're |
| 19 | +enabled by default for Rust as well. |
| 20 | + |
| 21 | +WebAssembly targets for Rust now [have improved |
| 22 | +documentation](https://github.com/rust-lang/rust/pull/128511) about WebAssembly |
| 23 | +features and disabling them, and this post is going to review these changes and |
| 24 | +go into depth about what's changing in LLVM. |
| 25 | + |
| 26 | +## Enabling Reference Types by Default |
| 27 | + |
| 28 | +The [reference-types proposal to |
| 29 | +WebAssembly](https://github.com/webAssembly/reference-types) introduced a few |
| 30 | +new concepts to WebAssembly, notably the `externref` type which is a |
| 31 | +host-defined GC resource that WebAssembly cannot access but can pass around. |
| 32 | +Rust does not have support for the WebAssembly `externref` type and LLVM 19 does |
| 33 | +not change that. WebAssembly modules produced from Rust will continue to not use |
| 34 | +the `externref` type nor have a means of being able to do so. |
| 35 | + |
| 36 | +Also included in the reference-types proposal, however, was the ability to have |
| 37 | +multiple WebAssembly tables in a single module. In the original version of the |
| 38 | +WebAssembly specification only a single table was allowed and this restriction |
| 39 | +was relaxed with the reference-types proposal. WebAssembly tables are used by |
| 40 | +LLVM and Rust to implement indirect function calls. For example function |
| 41 | +pointers in WebAssembly are actually table indices and indirect function calls |
| 42 | +are a WebAssembly `call_indirect` instruction with this table index. |
| 43 | + |
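To make the role of the table concrete, here is a minimal sketch of Rust code that calls through a function pointer. The names `double`, `negate`, and `apply` are hypothetical and exist only for illustration. On WebAssembly targets the value of `f` is a table index and, unless the optimizer manages to resolve the call statically, `f(x)` is expected to become a `call_indirect` instruction.

```rust
/// Two plain functions with the same signature. When their addresses are
/// taken on a WebAssembly target, each one occupies a slot in the function table.
fn double(x: i32) -> i32 {
    x * 2
}

fn negate(x: i32) -> i32 {
    -x
}

/// Calls through a function pointer. `#[inline(never)]` is only a hint used
/// here to make it more likely that the indirect call survives optimization.
#[inline(never)]
fn apply(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}
```

Calling `apply(double, 21)` and `apply(negate, 21)` goes through the same call site with two different table entries.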
| 44 | +With the reference-types proposal the binary encoding of `call_indirect` |
| 45 | +instructions was updated. Prior to the reference-types proposal `call_indirect` |
| 46 | +was encoded with a fixed zero byte in its instruction (required to be exactly |
| 47 | +0x00). This fixed zero byte was relaxed to a 32-bit [LEB] to indicate which |
| 48 | +table the `call_indirect` instruction was using. For those unfamiliar, [LEB] is a |
| 49 | +variable-length integer encoding in which smaller integers take fewer bytes. |
| 50 | +For example the integer 0 can be encoded as `0x00` with a [LEB]. |
| 51 | +[LEB]s additionally allow "overlong" encodings, so the integer 0 |
| 52 | +can also be encoded as `0x80 0x00`. |
| 53 | + |
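The encoding described above can be sketched in a few lines of Rust. This is an illustrative implementation of unsigned 32-bit LEB128, not code taken from LLVM or any engine, and the function names `leb128_encode` and `leb128_decode` are made up for this example. The decoder accepts overlong encodings of up to five bytes, which is what lets `0x00` and `0x80 0x00` both decode to 0.

```rust
/// Encode `value` as an unsigned LEB128 using the fewest bytes possible.
fn leb128_encode(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        // Each byte carries 7 bits of payload, least significant bits first.
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: the high "continuation" bit stays clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decode an unsigned 32-bit LEB128 from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or does not fit in 32 bits.
fn leb128_decode(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        let low = (byte & 0x7f) as u32;
        // The fifth byte may only contribute the top 4 bits of a u32.
        if i == 4 && low > 0x0f {
            return None;
        }
        result |= low << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

With this sketch, `leb128_encode(0)` yields the single byte `0x00`, while `leb128_decode` returns 0 for both `[0x00]` and the overlong `[0x80, 0x00]`, consuming one and two bytes respectively.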
| 54 | +LLVM's support of separate compilation of source code to a WebAssembly binary |
| 55 | +means that when an object file is emitted it does not know the final index of |
| 56 | +the table that is going to be used in the final binary. Before reference-types |
| 57 | +there was only one option, table 0, so `0x00` was always used when encoding |
| 58 | +`call_indirect` instructions. After reference-types, however, LLVM will emit an |
| 59 | +over-long [LEB] of the form `0x80 0x80 0x80 0x80 0x00` which is the maximal |
| 60 | +length of a 32-bit [LEB]. This [LEB] is then filled in by the linker with a |
| 61 | +relocation to the actual table index that is used by the final module. |
| 62 | + |
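The placeholder-then-relocate flow can be sketched as follows. This is a simplified model rather than LLVM's or LLD's actual code: the helper names are hypothetical, the type index is assumed to be emitted as a padded LEB as well, and the only fact taken from the WebAssembly binary format is that `call_indirect` is opcode `0x11` followed by a type index and then a table index.

```rust
/// Opcode of `call_indirect` in the WebAssembly binary format.
const CALL_INDIRECT: u8 = 0x11;

/// Encode `value` as an unsigned LEB128 padded to exactly 5 bytes, so that
/// any 32-bit index can later be written into the same space.
fn leb_u32_padded(value: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    let mut v = value;
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = (v & 0x7f) as u8;
        v >>= 7;
        // Every byte except the last has its continuation bit set.
        if i < 4 {
            *byte |= 0x80;
        }
    }
    out
}

/// Hypothetical "compiler" step: emit `call_indirect` with a placeholder
/// table index of 0. Returns the bytes plus the offset of the table-index
/// field, which stands in for a relocation entry.
fn emit_call_indirect(type_index: u32) -> (Vec<u8>, usize) {
    let mut bytes = vec![CALL_INDIRECT];
    bytes.extend_from_slice(&leb_u32_padded(type_index));
    let reloc_offset = bytes.len();
    bytes.extend_from_slice(&leb_u32_padded(0));
    (bytes, reloc_offset)
}

/// Hypothetical "linker" step: overwrite the 5-byte placeholder in place
/// with the table index chosen for the final module.
fn apply_relocation(bytes: &mut [u8], offset: usize, table_index: u32) {
    bytes[offset..offset + 5].copy_from_slice(&leb_u32_padded(table_index));
}
```

Because the field is always five bytes wide, the linker can patch it without moving any other bytes in the function body, which is the reason a padded encoding is used instead of the minimal one.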
| 63 | +Putting all of this together, LLVM 19 has reference-types enabled by default, |
| 64 | +which means that any WebAssembly module with an indirect |
| 65 | +function call (which is almost always the case for Rust code) will be encoded as a |
| 66 | +WebAssembly binary that cannot be decoded by engines and tooling that do not |
| 67 | +support the reference-types proposal. It is expected that this change will have |
| 68 | +a low impact due to the age of the reference-types proposal and breadth of |
| 69 | +implementation in engines. Given the multitude of WebAssembly engines, however, |
| 70 | +it's recommended that any WebAssembly users test out Nightly Rust and see if |
| 71 | +the produced module still runs on the engine of choice. |
| 72 | + |
| 73 | +### LLVM, Rust, and Multiple Tables |
| 74 | + |
| 75 | +One interesting point worth mentioning is that despite reference-types enabling |
| 76 | +multiple tables in WebAssembly modules this is not actually taken advantage of |
| 77 | +at this time by either LLVM or Rust. WebAssembly modules emitted will still have |
| 78 | +at most one table of functions. This means that the over-long 5-byte encoding of |
| 79 | +index 0 as `0x80 0x80 0x80 0x80 0x00` is not actually necessary at this time. |
| 80 | +LLD, LLVM's linker for WebAssembly, wants to process all [LEB] relocations in a |
| 81 | +similar manner which currently forces this 5-byte encoding of zero. For example |
| 82 | +when a function calls another function the `call` instruction encodes the target |
| 83 | +function index as a 5-byte [LEB] which is filled in by the linker. There is |
| 84 | +quite often more than one function so the 5-byte encoding enables all possible |
| 85 | +function indices to be encoded. |
| 86 | + |
| 87 | +In the future LLVM might start using multiple tables as well. For example LLVM |
| 88 | +may have a mode in the future where there's a table-per-function type instead of |
| 89 | +a single heterogeneous table. This can enable engines to implement |
| 90 | +`call_indirect` more efficiently. This is not implemented at this time, however. |
| 91 | + |
| 92 | +For users who want a minimally-sized WebAssembly module (e.g. if you're in a web |
| 93 | +context and sending bytes over the wire) it's recommended to use an optimization |
| 94 | +tool such as [`wasm-opt`] to shrink the size of the output of LLVM. Even before |
| 95 | +this change with reference-types it's recommended to do this as [`wasm-opt`] can |
| 96 | +typically optimize LLVM's default output even further. When optimizing a module |
| 97 | +through [`wasm-opt`] these 5-byte encodings of index 0 are all shrunk to a |
| 98 | +single byte. |
| 99 | + |
| 100 | +## Enabling Multi-Value by Default |
| 101 | + |
| 102 | +The second feature enabled by default in LLVM 19 is multi-value. The |
| 103 | +[multi-value proposal to WebAssembly][multi-value] enables, for example, functions to have |
| 104 | +more than one return value. WebAssembly instructions are |
| 105 | +also allowed to have more than one return value. This proposal |
| 106 | +is one of the first to get merged into the WebAssembly specification after the |
| 107 | +original MVP and has been implemented in many engines for quite some time. |
| 108 | + |
| 109 | +The consequences of enabling this feature by default in LLVM are more minor for |
| 110 | +Rust, however, than enabling reference-types by default. LLVM's default ABI for |
| 111 | +WebAssembly code is not changing even when multi-value is enabled. Additionally |
| 112 | +Rust's ABI is not changing either and continues to match LLVM's. Despite this, |
| 113 | +the change can still affect Nightly users of Rust. |
| 114 | + |
| 115 | +Rust for some time has supported an `extern "wasm"` ABI on Nightly, which was an |
| 116 | +experimental means of defining a function in Rust that |
| 117 | +returned multiple values (i.e. used the multi-value proposal). Due to |
| 118 | +infrastructural changes and refactorings in LLVM itself this feature of Rust has |
| 119 | +[been removed](https://github.com/rust-lang/rust/pull/127605) and is no longer |
| 120 | +supported on Nightly at all. As a result there is no longer any possible method |
| 121 | +of writing a function in Rust that returns multiple values at the WebAssembly |
| 122 | +function type level. |
| 123 | + |
| 124 | +In summary this change is expected to not affect any Rust code in the wild |
| 125 | +unless you were using the Nightly feature of `extern "wasm"` in which case |
| 126 | +you'll be forced to drop support for that and use `extern "C"` instead. |
| 127 | +Supporting WebAssembly multi-return functions in Rust is a broader topic than |
| 128 | +this post can cover, but at this time it's an area that's ripe for contribution |
| 129 | +from suitably motivated contributors. |
| 130 | + |
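For code that previously relied on `extern "wasm"`, one possible shape of the `extern "C"` replacement is to return a `#[repr(C)]` struct. This is only a sketch: the `DivRem` type and `div_rem` function are hypothetical, and, as described above, the result is not lowered to multiple return values in the WebAssembly function type.

```rust
/// A C-compatible aggregate standing in for "two return values".
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct DivRem {
    pub quotient: i32,
    pub remainder: i32,
}

/// Returns both results in one struct using the `extern "C"` ABI.
/// Panics if `b` is zero, like ordinary integer division in Rust.
pub extern "C" fn div_rem(a: i32, b: i32) -> DivRem {
    DivRem {
        quotient: a / b,
        remainder: a % b,
    }
}
```

Callers on the Rust side read the two fields as usual, for example `div_rem(7, 2).quotient`.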
| 131 | +## Enabling Future Proposals to WebAssembly |
| 132 | + |
| 133 | +This is not the first time that a WebAssembly proposal has gone from |
| 134 | +off-by-default to on-by-default in LLVM, nor will it be the last. For example |
| 135 | +LLVM already enables the [sign-extension proposal][sign-ext] by default which |
| 136 | +MVP WebAssembly did not have. It's expected that in the not-too-distant future |
| 137 | +the |
| 138 | +[nontrapping-fp-to-int](https://github.com/WebAssembly/nontrapping-float-to-int-conversions) |
| 139 | +proposal will be enabled by default. These changes are currently not made |
| 140 | +with strict criteria in mind (e.g. N engines must have this implemented for M |
| 141 | +years), and there may be breakage that happens. |
| 142 | + |
| 143 | +If you're using a WebAssembly engine that does not support the modules emitted |
| 144 | +by Nightly Rust and LLVM 19 then your options are: |
| 145 | + |
| 146 | +* Try seeing if the engine you're using has any updates available to it. You |
| 147 | + might be using an older version which didn't support a feature but a newer |
| 148 | + version supports the feature. |
| 149 | +* Open an issue to raise awareness that a change is causing breakage. This could |
| 150 | + either be done on your engine's repository, the Rust repository, or the |
| 151 | + WebAssembly |
| 152 | + [tool-conventions](https://github.com/WebAssembly/tool-conventions) |
| 153 | + repository. |
| 154 | +* Recompile your code with features disabled; more on this in the next section. |
| 155 | + |
| 156 | +The general assumption behind enabling new features by default is that it's a |
| 157 | +relatively hassle-free operation for end users while bringing performance |
| 158 | +benefits for everyone (e.g. nontrapping-fp-to-int will make float-to-int |
| 159 | +conversions more optimal). If updates end up causing hassle it's best to flag |
| 160 | +that early on so rollout plans can be adjusted if needed. |
| 161 | + |
| 162 | +## Disabling on-by-default WebAssembly proposals |
| 163 | + |
| 164 | +For a variety of reasons you might be motivated to disable on-by-default |
| 165 | +WebAssembly features: for example maybe your engine is difficult to update or |
| 166 | +doesn't support a new feature. Disabling on-by-default features is unfortunately |
| 167 | +not the easiest task. It is notably not sufficient to use |
| 168 | +`-Ctarget-feature=-foo` to disable features for just your own project's |
| 169 | +compilation because the Rust standard library, shipped in precompiled form, is |
| 170 | +compiled with these features enabled. |
| 171 | + |
| 172 | +To disable on-by-default WebAssembly proposals it's required that you use Cargo's |
| 173 | +[`-Zbuild-std`](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std) |
| 174 | +feature. For example: |
| 175 | + |
| 176 | +```shell |
| 177 | +$ export RUSTFLAGS=-Ctarget-cpu=mvp |
| 178 | +$ cargo +nightly build -Zbuild-std=panic_abort,std --target wasm32-unknown-unknown |
| 179 | +``` |
| 180 | + |
| 181 | +This will recompile the Rust standard library in addition to your own code with |
| 182 | +the "MVP CPU" which is LLVM's placeholder for all WebAssembly proposals |
| 183 | +disabled. This will disable sign-ext, reference-types, multi-value, etc. |
| 184 | + |
| 185 | +[llvm19]: https://github.com/rust-lang/rust/pull/127513 |
| 186 | +[proposals]: https://github.com/WebAssembly/proposals |
| 187 | +[llvmenable]: https://github.com/llvm/llvm-project/pull/80923 |
| 188 | +[LEB]: https://en.wikipedia.org/wiki/LEB128 |
| 189 | +[`wasm-opt`]: https://github.com/WebAssembly/binaryen |
| 190 | +[multi-value]: https://github.com/webAssembly/multi-value |
| 191 | +[sign-ext]: https://github.com/webAssembly/sign-extension-ops |