Cranelift: make CLIF behavior platform-independent w.r.t. endianness #3369

Open
cfallin opened this issue Sep 17, 2021 · 5 comments
Labels
cranelift:area:clif cranelift Issues related to the Cranelift code generator enhancement

Comments

@cfallin
Member

cfallin commented Sep 17, 2021

Currently, CLIF has three kinds of endianness for memory loads and stores: big, little, and native. The meaning of a native-endian operation depends on the platform on which the CLIF executes.

The purpose of this three-option design, as we discussed in #2124, was to allow for convenience at the CLIF producer side: loads and stores that are meant to access platform-native values (such as pointers in a stack frame or data passed to and from code produced by other compilers) can simply use the "native" option, and the CLIF becomes parametric in endianness, working correctly on platforms of either endianness.

It appears that, in the discussion in #2124, we were initially leaning toward a strict two-option (big/little), always-explicit endianness flag on memory ops, but then it became apparent that this would require some more plumbing to know the endianness upfront.

The new forcing function that we have, however, is the CLIF interpreter. Because we now have an interpreter that is platform-independent, it becomes important to define what result a given CLIF execution should provide. It seems very important that this should be the same result regardless of the platform we happen to be running on. Otherwise, if a CLIF program can have multiple results depending on platform, then many other endianness issues could occur at higher levels of the system.

In essence, we're late-binding endianness, after the CLIF is produced. In contrast, other compilers, such as LLVM, use a form of early-binding: e.g., the data layout that is a part of a program in LLVM IR specifies the endianness assumed by the IR.
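As a rough illustration of what early binding looks like on the LLVM side: a datalayout string is a list of `-`-separated specifications, in which `E` denotes big-endian and `e` denotes little-endian. The helper below is a hypothetical sketch (the function name and the sample strings in the usage note are assumptions made for illustration, not part of LLVM or Cranelift) showing that the byte order can be read off the IR itself without consulting the host.

```rust
/// Hypothetical helper: report the byte order stated in an LLVM-style
/// datalayout string. Returns `Some(true)` for big-endian (`E`),
/// `Some(false)` for little-endian (`e`), and `None` when the string
/// does not state a byte order explicitly.
fn datalayout_is_big_endian(layout: &str) -> Option<bool> {
    layout.split('-').find_map(|spec| match spec {
        "E" => Some(true),
        "e" => Some(false),
        _ => None,
    })
}
```

For example, `datalayout_is_big_endian("E-p:64:64")` yields `Some(true)` on every host, which is the property being proposed for CLIF.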

In this issue I'm suggesting that we consider doing the same: it would provide well-defined CLIF semantics, and shouldn't impact the ergonomics of most CLIF producers, requiring a bit more info when creating a builder (target platform) but then using the target's native endianness where "native" would have been used before.

One alternative is to disallow (i.e., declare to be undefined behavior) any CLIF that has a native-endian load/store interact with another access in a way that exposes endian-dependent behavior, but that seems much more problematic, because many real programs do this (e.g., Rust compiled via cg_clif can perfectly legally store a u32 to memory and load its first byte). Another alternative is to bias the interpreter toward one endianness or another (e.g., the interpreter always behaves like a little-endian machine), but then the results differ between interpretation and native execution on opposite-endianness machines (e.g. big-endian), which is also undesirable.
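The "store a u32 and load its first byte" case can be sketched in plain Rust, outside of Cranelift entirely. The function names here are made up for illustration. With an explicit byte order the result is fixed; with the native byte order it depends on the machine running the code, which is the behavior a platform-independent interpreter cannot pin down.

```rust
/// Store `value` as 4 bytes in the given byte order, then load the byte
/// at offset 0. The result is the same on every host.
fn first_byte_after_store(value: u32, big_endian: bool) -> u8 {
    let bytes: [u8; 4] = if big_endian {
        value.to_be_bytes()
    } else {
        value.to_le_bytes()
    };
    bytes[0]
}

/// The same operation with the host's native byte order. The result
/// depends on the platform the code happens to run on.
fn first_byte_native(value: u32) -> u8 {
    value.to_ne_bytes()[0]
}
```

For `0x1122_3344`, the explicit little-endian variant yields `0x44` and the big-endian variant yields `0x11`; the native variant yields one or the other depending on the host.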

This is a continuation of the discussion in #3329; cc @uweigand @afonso360 @fitzgen and others. Thoughts?

@cfallin
Member Author

cfallin commented Sep 17, 2021

To make the proposal a bit more concrete, this would involve two changes:

  • Require either a specific target, or at least an endianness, when creating a function or instruction builder. (Probably "full target" rather than "endianness" as the latter is a low-level detail to most users; and probably when creating the function rather than a particular builder.)
  • Make the load/store metadata support only the two endianness options.
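A minimal sketch of how these two changes could fit together, assuming hypothetical types that are not Cranelift's actual API (`Endianness`, `RequestedEndianness`, and `FunctionBuilderSketch` are all invented for this example): the producer may still ask for "native", but the builder resolves it against the target's byte order immediately, so the stored load/store metadata only ever holds one of two values.

```rust
/// The only two byte orders the IR metadata would carry (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// What a producer may request at the builder API level (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RequestedEndianness {
    Native,
    Explicit(Endianness),
}

/// Stand-in for a function or instruction builder that is told the
/// target's byte order up front.
struct FunctionBuilderSketch {
    target_endianness: Endianness,
}

impl FunctionBuilderSketch {
    fn new(target_endianness: Endianness) -> Self {
        Self { target_endianness }
    }

    /// Resolve a request to a concrete byte order at IR-construction time,
    /// so "native" never appears in the IR that is produced.
    fn resolve(&self, requested: RequestedEndianness) -> Endianness {
        match requested {
            RequestedEndianness::Native => self.target_endianness,
            RequestedEndianness::Explicit(e) => e,
        }
    }
}
```

A real builder would presumably take a full target description rather than a bare byte order, as suggested above; the bare enum just keeps the sketch short.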

@fitzgen
Member

fitzgen commented Sep 17, 2021

Those two changes sound great to me! 👍

@bjorn3
Contributor

bjorn3 commented Sep 18, 2021

The clif ir has to be target dependent one way or another as you have to use the right pointer size.

@uweigand
Member

@cfallin maybe we can both make architecture features like byte order (or pointer size?) explicit in the IR and reduce the number of changes to be introduced, by declaring global architecture properties just once in the IR, along the lines of how LLVM IR has a datalayout statement just once per file?

@cfallin
Member Author

cfallin commented Sep 20, 2021

@bjorn3:

> The clif ir has to be target dependent one way or another as you have to use the right pointer size.

Yes, that would probably be a part of the information too, like LLVM's DataLayout. (That said, varying pointer width is a different sort of nondeterminism concern than endianness because while changing endianness directly alters the semantics of loads/stores, changing pointer width just means that code with baked-in 32-bit-layout assumptions may overflow; but the semantics of each individual instruction are still well-defined. So from a "can the interpreter arrive at the one correct answer according to the semantics" perspective, it's not quite the same.)

@uweigand:

> declaring global architecture properties just once in the IR, along the lines of how LLVM IR has a datalayout statement just once per file?

Maybe, though I do like the aspect of CLIF that all attributes are per-function currently (which I suppose arose from the parallel-compilation-compatible design of keeping all IR data per function). Purely to keep to that principle I think it might be simpler to have a big_endian / little_endian attribute and a pointer32 / pointer64 attribute (among others?) on the function itself.
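To make the per-function idea a bit more concrete, here is a rough sketch under stated assumptions: `FunctionAttrs`, `ByteOrder`, and `PointerWidth` are hypothetical names, not existing Cranelift types. It shows how an interpreter could load a pointer-sized value using only the attributes attached to the function, never the host's byte order or pointer width.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ByteOrder {
    Little,
    Big,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PointerWidth {
    Bits32,
    Bits64,
}

/// Hypothetical per-function attributes, one set per function.
#[derive(Clone, Copy, Debug)]
struct FunctionAttrs {
    byte_order: ByteOrder,
    pointer_width: PointerWidth,
}

impl FunctionAttrs {
    /// Load a pointer-sized value from `memory` at `offset`, zero-extended
    /// to u64. Returns `None` if the access would run out of bounds.
    fn load_pointer(&self, memory: &[u8], offset: usize) -> Option<u64> {
        let size = match self.pointer_width {
            PointerWidth::Bits32 => 4,
            PointerWidth::Bits64 => 8,
        };
        let end = offset.checked_add(size)?;
        let bytes = memory.get(offset..end)?;
        let mut value: u64 = 0;
        match self.byte_order {
            // Big-endian: the most significant byte comes first in memory.
            ByteOrder::Big => {
                for &b in bytes {
                    value = (value << 8) | u64::from(b);
                }
            }
            // Little-endian: the most significant byte comes last in memory.
            ByteOrder::Little => {
                for &b in bytes.iter().rev() {
                    value = (value << 8) | u64::from(b);
                }
            }
        }
        Some(value)
    }
}
```

Because every input to `load_pointer` comes from the function's own attributes, the same function and memory contents give the same answer on any host.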
