Cranelift: make CLIF behavior platform-independent w.r.t. endianness #3369

cfallin · 2021-09-17T22:09:09Z

Currently, CLIF has three kinds of endianness for memory loads and stores: big, little, and native. The meaning of a native-endian operation depends on the platform on which the CLIF executes.

The purpose of this three-option design, as we discussed in #2124, was to allow for convenience at the CLIF producer side: loads and stores that are meant to access platform-native values (such as pointers in a stack frame or data passed to and from code produced by other compilers) can simply use the "native" option, and the CLIF becomes parametric in endianness, working correctly on platforms of either endianness.

It appears that, in the discussion in #2124, we were initially (in two comments there) leaning toward a strict two-option (big/little), always-explicit endianness flag on memory ops, but it then became apparent that this would require some more plumbing so that the endianness is known upfront.

The new forcing function that we have, however, is the CLIF interpreter. Because we now have an interpreter that is platform-independent, it becomes important to define what result a given CLIF execution should provide. It seems very important that this should be the same result regardless of the platform we happen to be running on. Otherwise, if a CLIF program can have multiple results depending on platform, then many other endianness issues could occur at higher levels of the system.

In essence, we're late-binding endianness, after the CLIF is produced. In contrast, other compilers, such as LLVM, use a form of early-binding: e.g., the data layout that is a part of a program in LLVM IR specifies the endianness assumed by the IR.
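As a small illustration of what early binding looks like on the LLVM side: in an LLVM datalayout string, the specifications are separated by `-`, and a lone `e` or `E` specification selects little-endian or big-endian respectively. The sketch below reads that one field. The function name and the sample strings used with it are made up for illustration and are not taken from any real target.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Hypothetical helper: scan an LLVM-style datalayout string for the
/// endianness specification ("e" = little-endian, "E" = big-endian).
/// Returns None if the string carries no such specification.
fn datalayout_endianness(layout: &str) -> Option<Endianness> {
    layout.split('-').find_map(|spec| match spec {
        "e" => Some(Endianness::Little),
        "E" => Some(Endianness::Big),
        // Other specs (pointer sizes, alignments, mangling such as "m:e") are ignored.
        _ => None,
    })
}
```

The point is only that the byte order is fixed by the IR itself, so every consumer of the IR (compiler backend or interpreter) sees the same answer.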

In this issue I'm suggesting that we consider doing the same: it would provide well-defined CLIF semantics, and shouldn't impact the ergonomics of most CLIF producers, requiring a bit more info when creating a builder (target platform) but then using the target's native endianness where "native" would have been used before.

One alternative is to disallow (i.e., declare to be undefined behavior) any CLIF that has a native-endian load/store interact with another access in a way that exposes endian-dependent behavior, but that seems much more problematic, because many real programs do this (e.g., Rust compiled via cg_clif can perfectly legally store a u32 to memory and load its first byte). Another alternative is to bias the interpreter toward one endianness or another (e.g., the interpreter always behaves like a little-endian machine), but then the results differ between interpretation and native execution on opposite-endianness machines (e.g. big-endian), which is also undesirable.
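The "store a u32 and load its first byte" case can be sketched in plain Rust, outside of CLIF. The helper names here are invented for the example; the byte conversions are the standard `u32::to_le_bytes`, `u32::to_be_bytes`, and `u32::to_ne_bytes`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Model a 4-byte store of `value` with an explicit byte order, followed by
/// a 1-byte load from the lowest address of that same location.
fn first_byte_after_store(value: u32, order: Endianness) -> u8 {
    let mem: [u8; 4] = match order {
        Endianness::Little => value.to_le_bytes(),
        Endianness::Big => value.to_be_bytes(),
    };
    mem[0]
}

/// The same access pattern with a "native" store: the result depends on the
/// machine this code is compiled for.
fn first_byte_after_native_store(value: u32) -> u8 {
    value.to_ne_bytes()[0]
}
```

With an explicit byte order the result is the same everywhere; with the native variant, the same program yields the low byte on a little-endian host and the high byte on a big-endian one, which is exactly the ambiguity an interpreter would have to resolve.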

This is a continuation of the discussion in #3329; cc @uweigand @afonso360 @fitzgen and others. Thoughts?


cfallin · 2021-09-17T22:12:28Z

To make the proposal a bit more concrete, this would involve two changes:

1. Require either a specific target, or at least an endianness, when creating a function or instruction builder. (Probably "full target" rather than "endianness" as the latter is a low-level detail to most users; and probably when creating the function rather than a particular builder.)
2. Make the load/store metadata support only the two endianness options.
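A rough sketch of how these two changes could fit together, using entirely hypothetical types (`Target`, `RequestedOrder`, `MemOpMeta`, and `FunctionSketch` are not Cranelift's actual API): the producer may still ask for "whatever the target uses", but that request is resolved when the memory op is created, so the stored metadata only ever holds big or little.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Hypothetical stand-in for the target information supplied up front.
#[derive(Clone, Copy, Debug)]
struct Target {
    endianness: Endianness,
}

/// What a producer may request when emitting a load or store.
#[derive(Clone, Copy, Debug)]
enum RequestedOrder {
    Explicit(Endianness),
    /// Convenience option: resolved against the target immediately.
    TargetNative,
}

/// Metadata as stored in the IR: only the two concrete options remain.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MemOpMeta {
    endianness: Endianness,
}

/// Hypothetical function-under-construction that knows its target.
struct FunctionSketch {
    target: Target,
    mem_ops: Vec<MemOpMeta>,
}

impl FunctionSketch {
    fn new(target: Target) -> Self {
        FunctionSketch { target, mem_ops: Vec::new() }
    }

    /// Record a memory op, binding "native" to the target's byte order now
    /// rather than leaving it to be decided at execution time.
    fn push_mem_op(&mut self, order: RequestedOrder) -> MemOpMeta {
        let endianness = match order {
            RequestedOrder::Explicit(e) => e,
            RequestedOrder::TargetNative => self.target.endianness,
        };
        let meta = MemOpMeta { endianness };
        self.mem_ops.push(meta);
        meta
    }
}
```

Under this sketch, an interpreter reading `mem_ops` never needs to know which host it is running on, while a producer that previously wrote "native" only has to supply the target once.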

fitzgen · 2021-09-17T22:21:14Z

Those two changes sound great to me! 👍

bjorn3 · 2021-09-18T05:52:14Z

The clif ir has to be target dependent one way or another as you have to use the right pointer size.

uweigand · 2021-09-20T11:45:56Z

@cfallin maybe we can both make architecture features like byte order (or pointer size?) explicit in the IR and reduce the amount of changes to be introduced, by declaring global architecture properties just once in the IR, along the lines of how LLVM IR has a datalayout statement just once per file?

cfallin · 2021-09-20T15:09:15Z

@bjorn3:

> The clif ir has to be target dependent one way or another as you have to use the right pointer size.

Yes, that would probably be a part of the information too, like LLVM's DataLayout. (That said, varying pointer width is a different sort of nondeterminism concern than endianness because while changing endianness directly alters the semantics of loads/stores, changing pointer width just means that code with baked-in 32-bit-layout assumptions may overflow; but the semantics of each individual instruction are still well-defined. So from a "can the interpreter arrive at the one correct answer according to the semantics" perspective, it's not quite the same.)

@uweigand:

> declaring global architecture properties just once in the IR, along the lines of how LLVM IR has a datalayout statement just once per file?

Maybe, though I do like the aspect of CLIF that all attributes are per-function currently (which I suppose arose from the parallel-compilation-compatible design of keeping all IR data per function). Purely to keep to that principle I think it might be simpler to have a big_endian / little_endian attribute and a pointer32 / pointer64 attribute (among others?) on the function itself.

akirilov-arm added the cranelift Issues related to the Cranelift code generator label Oct 1, 2021

cfallin added cranelift:area:clif enhancement labels May 4, 2022

cfallin added this to Cranelift: general cleanup and improvements May 10, 2022

cfallin mentioned this issue Aug 1, 2022

Vector register endianness causing ABI incompatibility #4566

Closed

afonso360 mentioned this issue Feb 26, 2023

cranelift-interpreter: Implement big and little endian memory accesses #5881

Closed

cfallin mentioned this issue Feb 27, 2023

Implement the relaxed SIMD proposal #5892

Merged
