-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift: make CLIF behavior platform-independent w.r.t. endianness #3369
Comments
To make the proposal a bit more concrete, this would involve two changes:
|
Those two changes sound great to me! 👍 |
The clif ir has to be target dependent one way or another as you have to use the right pointer size. |
@cfallin maybe we can both make architecture features like byte order (or pointer size?) explicit in the IR and reduce the amount of changes to be introduced, by declaring global architecture properties just once in the IR, along the lines of how LLVM IR has a datalayout statement just once per file? |
Yes, that would probably be a part of the information too, like LLVM's
Maybe, though I do like the aspect of CLIF that all attributes are per-function currently (which I suppose arose from the parallel-compilation-compatible design of keeping all IR data per function). Purely to keep to that principle I think it might be simpler to have a |
Currently, CLIF has three kinds of endianness for memory loads and stores: big, little, and native. The meaning of a native-endian operation depends on the platform on which the CLIF executes.
The purpose of this three-option design, as we discussed in #2124, was to allow for convenience at the CLIF producer side: loads and stores that are meant to access platform-native values (such as pointers in a stack frame or data passed to and from code produced by other compilers) can simply use the "native" option, and the CLIF becomes parametric to endianness, working correctly on platforms of both endians.
It appears that, in the discussion in #2124, we initially (comment, comment) were leaning toward a strict two-option (big/little), always-explicit endianness flag on memory ops, but then it became apparent that this would require some more plumbing to know the endianness upfront.
The new forcing function that we have, however, is the CLIF interpreter. Because we now have an interpreter that is platform-independent, it becomes important to define what result a given CLIF execution should provide. It seems very important that this should be the same result regardless of the platform we happen to be running on. Otherwise, if a CLIF program can have multiple results depending on platform, then many other endianness issues could occur at higher levels of the system.
In essence, we're late-binding endianness, after the CLIF is produced. In contrast, other compilers, such as LLVM, use a form of early-binding: e.g., the data layout that is a part of a program in LLVM IR specifies the endianness assumed by the IR.
In this issue I'm suggesting that we consider doing the same: it would provide well-defined CLIF semantics, and shouldn't impact the ergonomics of most CLIF producers, requiring a bit more info when creating a builder (target platform) but then using the target's native endianness where "native" would have been used before.
One alternative is to disallow (i.e., declare to be undefined behavior) any CLIF that has a native-endian load/store interact with another access in a way that exposes endian-dependent behavior, but that seems much more problematic, because many real programs do this (e.g., Rust compiled via
cg_clif
can perfectly legally store au32
to memory and load its first byte). Another alternative is to bias the interpreter toward one endianness or another (e.g., the interpreter always behaves like a little-endian machine), but then the results differ between interpretation and native execution on opposite-endianness machines (e.g. big-endian), which is also undesirable.This is a continuation of the discussion in #3329; cc @uweigand @afonso360 @fitzgen and others. Thoughts?
The text was updated successfully, but these errors were encountered: