-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A Formal C Memory Model Supporting Integer-Pointer Casts #143
Comments
My notes: First, keep in mind that most of the examples are equally valid when Second, in Rust we probably want to have the physical addresses only be legal for limited periods of time. In fact, that basically results in a variant of the ACA model where all mallocs are asserted.
It might not be that important for performance, but LLVM will happily do the optimization for us, so it has to be a part of our memory model. |
I need to read the paper more closely, but I wonder whether it was considered to do dataflow analysis on cast(ed) pointers---i.e. once a pointer is casted, only functions that get the cast-result integer or the original pointer are allowed/assumed to make arbitrary reads/writes to the block in which the pointer points. (On the Rust side of things, I've similar thought about doing dataflow analysis on "poisoned" unsafe values.) |
Memory models don't do dataflow. Also, any attempt at interesting analysis runs into the problem that optimizers will optimize |
Hello, I am an author of the paper, and would like to participate in this discussion. I am really glad that the Rust community is interested in the paper!
|
Thanks for coming to talk to us @jeehoonkang. We'll take a look at the relaxed-memory paper, but currently we have enough trouble getting the single-threaded semantics right. @orilahav's previous taming-acquire-release paper was excellent, so I hope this one is similarly good. First, the ACA and capability models are our currently very-informal-and-incomplete candidate models (see rust-lang/rust-memory-model#26 and rust-lang/rust-memory-model#28). It would be nice if someone in the know came to look and them. I don't like a "dataflow"-style model for the same reason as you. However, C has its odd "counterfactual" definition for restrict - a pointer expression
That's the entire point of the memory model: If you keep to hereditarily-safe Rust, you could just have the "fully symbolic" memory model. Correctly-written unsafe Rust is supposed to be UB-free, and we are trying to find a good enough definition of UB that programmers, sanitizers and optimizers can agree on. Also, in Rust, the widely-used raw pointers are pretty much interconvertible with integers (at the very least, int<->raw-ptr casts are safe and therefore must never be UB). The ACA model's idea is to handle raw pointers just like integers, which works moderately well. If we can tame LLVM (it's not much use having a memory model that your optimizer misoptimizes by design!), I think that we can have something that handles raw pointers and integers separately. |
I feel like we have to treat raw pointers and integers separately, because LLVM does. Yes, you can cast them, but for example LLVM feels free to reorder accesses to pointers it deems "noaliasing" even if, after casting them to integers, they compare equal. |
I'm a bit of an outsider, but I think it would be a shame if the rules for writing unsafe code ended up being compromised by the current LLVM (which I consider an implementation detail) at the expense of ease of understanding (for "unsafe" programmers) or possible future optimisations (for future Rust implementations, or updated LLVM). |
@arielb1 We have a formal model for "noalis" in LLVM that can be very well explained in our pointer-integer casting model (as long as our understanding of no alias is correct). I or Jeehoon will get back with more details tmr because it is midnight in Korea :) |
You can compare pointers even without casting them to integers. |
Our memory model does need to be compatible with LLVM's or it won't be worth much. |
Yes, of course. Are changes/enhancements to LLVM to support a better overall Rust memory model an option that would be considered? I understand that resources are limited for such work. |
@gilhur Can you tell us more about this formal model for noalias in LLVM? |
Just FYI, I took a stab at summarizing the "promising semantics" paper in #32, though I don't think I did it justice. =) Still digesting. |
Interesting SOPS'13 paper on analyzing optimization-unstable code : Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior (SOSP'13) by Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. |
@sanxiyn The formal model for noalias in LLVM essentially requires the notion of logical memory, which is the underlying model for CompCert and also for our ptr-int casting model. I am a bit hesitant to explain the idea here because we haven't published it yet. I will write a short tech report about it and give a link to it here. But please wait until the PLDI deadline in Nov because I am too busy until then. |
@gilhur I would like some information about your model for noalias/pointer aliasing in LLVM, especially for things relating to pointer equality and pointer comparisons. |
@Zoxc We are currently working on developing an LLVM model for handling pointers including casting between pointers and integers, and pointer equality/comparison. We also update the LLVM compiler accordingly including the alias analysis module. Probably we can make some written documents available within a few months. I will notify you here when it is out. |
Discussed in backlog bonanza - closing. Memory models have evolved a lot since 2016, it's not clear that this issue brings a lot of value |
Links: PDF, ACM
This is a memory model targeting C. The C specification itself has very loose rules that don't support a lot of common things people do in C programs, primarily around using pointers as integers (e.g., bitmasking, etc).
The key ideas of the model are as follows:
i
to a pointerp
, you have to assume that accesses throughp
could affect any block with a physical address, but you know they can't affect anything that doesn't have a physical address.Some examples from the paper:
Keeping local variables private (supported)
Consider this question:
In this model, the answer is no: the allocation of
a
is private, and at no point did the code do something like(int) &a
to convert it into a concrete block.Keeping local allocations private (supported)
Consider this question:
In this model, the answer is no: the allocation of
p
is private, and at no point did the code do something like(int) p
to convert it into a concrete block.The paper calls this "ownership transfer". I've avoided this term because it's so different from what Rust means by this term. It's more like "retaining" or "respecting" ownership (of
p
).The paper says that
clang -O2
does this kind of optimization, but notgcc -O2
, and claims therefore that it may not be that important.Dead Cast Elimination (unsupported in the model)
Consider this question (not from paper):
In this model, we cannot eliminate
(int) p
even if the result is not used because it has a side-effect of (potentially) assigningp
a concrete value.The paper addresses this by doing this optimization later in the pipeline. Basically at the point where we drop this model and convert to something more concrete.
Example limitation
Here is an example of an optimization prohibited by this approach.
The answer is no:
p
was made "concrete", and hence may be affected bybar()
. Interestingly, a slight reordering would enable the optimization:The text was updated successfully, but these errors were encountered: