Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Formal C Memory Model Supporting Integer-Pointer Casts #143

Closed
nikomatsakis opened this issue Sep 21, 2016 · 18 comments
Closed

A Formal C Memory Model Supporting Integer-Pointer Casts #143

nikomatsakis opened this issue Sep 21, 2016 · 18 comments
Labels
C-related-work Category: Discussion of related work (other languages, scientific publications, ...)

Comments

@nikomatsakis
Copy link
Contributor

Links: PDF, ACM

This is a memory model targeting C. The C specification itself has very loose rules that don't support a lot of common things people do in C programs, primarily around using pointers as integers (e.g., bitmasking, etc).

The key ideas of the model are as follows:

  • When you allocate memory, it is initially assigned a logical identifier and (conceptually) has no physical address (yet).
  • Once a pointer is cast to an integer, the memory block it points at is assigned a physical address.
  • Once a memory block has a physical address, then it is much less optimizable. For example, if you cast from an integer i to a pointer p, you have to assume that accesses through p could affect any block with a physical address, but you know they can't affect anything that doesn't have a physical address.

Some examples from the paper:

Keeping local variables private (supported)

Consider this question:

int f() {
  int a = 0;
  g(); // Can `g()` observe/affect value of `a`?
  return a;
}

In this model, the answer is no: the allocation of a is private, and at no point did the code do something like (int) &a to convert it into a concrete block.

Keeping local allocations private (supported)

Consider this question:

p = malloc();
*p = 123;
bar(); // Can this affect `*p`?
a = *p;
hash_put(h, p, a);

In this model, the answer is no: the allocation of p is private, and at no point did the code do something like (int) p to convert it into a concrete block.

The paper calls this "ownership transfer". I've avoided this term because it's so different from what Rust means by this term. It's more like "retaining" or "respecting" ownership (of p).

The paper says that clang -O2 does this kind of optimization, but not gcc -O2, and claims therefore that it may not be that important.

Dead Cast Elimination (unsupported in the model)

Consider this question (not from paper):

foo(int *p) {
    int x = (int) p; // Can we drop this unused value?
    println("%d", *p);
}

In this model, we cannot eliminate (int) p even if the result is not used because it has a side-effect of (potentially) assigning p a concrete value.

The paper addresses this by doing this optimization later in the pipeline. Basically at the point where we drop this model and convert to something more concrete.

Example limitation

Here is an example of an optimization prohibited by this approach.

p = malloc(1);
*p = 123;
b = (int) p;
bar();
a = *p;
hash_put(h, b, a); // Can we optimize this to `hash_put(h, b, 123)`?

The answer is no: p was made "concrete", and hence may be affected by bar(). Interestingly, a slight reordering would enable the optimization:

p = malloc(1);
*p = 123;
bar(); // `p` is not concrete **yet**...
b = (int) p;
a = *p;
hash_put(h, b, a); // ...hence no opportunity between cast and here to modify `*p`
@arielb1
Copy link

arielb1 commented Sep 21, 2016

My notes:

First, keep in mind that most of the examples are equally valid when alloca (i.e. a local) is used instead of malloc, and therefore are important even if we don't care about optimizing malloc-heavy code.

Second, in Rust we probably want to have the physical addresses only be legal for limited periods of time. In fact, that basically results in a variant of the ACA model where all mallocs are asserted.

The paper says that clang -O2 does this kind of optimization, but not gcc -O2, and claims therefore that it may not be that important.

It might not be that important for performance, but LLVM will happily do the optimization for us, so it has to be a part of our memory model.

@Ericson2314
Copy link

I need to read the paper more closely, but I wonder whether it was considered to do dataflow analysis on cast(ed) pointers---i.e. once a pointer is casted, only functions that get the cast-result integer or the original pointer are allowed/assumed to make arbitrary reads/writes to the block in which the pointer points.

(On the Rust side of things, I've similar thought about doing dataflow analysis on "poisoned" unsafe values.)

@arielb1
Copy link

arielb1 commented Sep 22, 2016

Memory models don't do dataflow. Also, any attempt at interesting analysis runs into the problem that optimizers will optimize x-x to 0.

@jeehoonkang
Copy link

jeehoonkang commented Sep 22, 2016

Hello, I am an author of the paper, and would like to participate in this discussion. I am really glad that the Rust community is interested in the paper!

  • @Ericson2314 @arielb1 I would like to add my two cents on the dataflow analysis. The main reason we didn't do dataflow analysis is its complexity: it is really hard to track the leakage of pointer-to-integer casting information. Even, we have to consider not only explicit but also implicit leakage. For e.g. consider the following code (assume a pointer is 32bit):

      int *p = malloc(1<<31 + 1);
      uintptr_t pi = (uintptr_t) p;
      *((int *)(1 <<31)) = 42; // undefined behavior?
    

    In this example, the information that 1 << 31 is a valid pointer is implicitly leaked to the store, even if the store is not directly related to the malloc and the pointer-to-integer cast. This example is really contrived and impractical, but it shows that it will be somewhat complicated to define the precise meaning of information leakage.

  • @arielb1 May I ask what is the ACA model? Google didn't give me useful information..

  • I would like to understand why the Rust community is interested in the casts between integers and pointers: isn't it (mem::transmute) contained in the unsafe Rust?

  • It is not directly related to this issue ;-) but I would like to advertise that recently my colleagues and I wrote a paper (under submission) on C/C++ relaxed-memory concurrency model: http://sf.snu.ac.kr/promise-concurrency/ Hope it also be interesting to the Rust community!

@arielb1
Copy link

arielb1 commented Sep 22, 2016

Thanks for coming to talk to us @jeehoonkang. We'll take a look at the relaxed-memory paper, but currently we have enough trouble getting the single-threaded semantics right. @orilahav's previous taming-acquire-release paper was excellent, so I hope this one is similarly good.

First, the ACA and capability models are our currently very-informal-and-incomplete candidate models (see rust-lang/rust-memory-model#26 and rust-lang/rust-memory-model#28). It would be nice if someone in the know came to look and them.

I don't like a "dataflow"-style model for the same reason as you. However, C has its odd "counterfactual" definition for restrict - a pointer expression E is said to be based on object P if ... modifying P ... would change the value of E - which is quite vague (did someone come up with some less-vague version?), and LLVM has its own set of vague noalias semantics, which we have to conform with.

I would like to understand why the Rust community is interested in the casts between integers and pointers: isn't it (mem::transmute) contained in the unsafe Rust?

That's the entire point of the memory model: If you keep to hereditarily-safe Rust, you could just have the "fully symbolic" memory model. Correctly-written unsafe Rust is supposed to be UB-free, and we are trying to find a good enough definition of UB that programmers, sanitizers and optimizers can agree on.

Also, in Rust, the widely-used raw pointers are pretty much interconvertible with integers (at the very least, int<->raw-ptr casts are safe and therefore must never be UB). The ACA model's idea is to handle raw pointers just like integers, which works moderately well.

If we can tame LLVM (it's not much use having a memory model that your optimizer misoptimizes by design!), I think that we can have something that handles raw pointers and integers separately.

@RalfJung
Copy link
Member

I feel like we have to treat raw pointers and integers separately, because LLVM does. Yes, you can cast them, but for example LLVM feels free to reorder accesses to pointers it deems "noaliasing" even if, after casting them to integers, they compare equal.

@jugglerchris
Copy link

I'm a bit of an outsider, but I think it would be a shame if the rules for writing unsafe code ended up being compromised by the current LLVM (which I consider an implementation detail) at the expense of ease of understanding (for "unsafe" programmers) or possible future optimisations (for future Rust implementations, or updated LLVM).

@gilhur
Copy link

gilhur commented Sep 22, 2016

@arielb1 We have a formal model for "noalis" in LLVM that can be very well explained in our pointer-integer casting model (as long as our understanding of no alias is correct). I or Jeehoon will get back with more details tmr because it is midnight in Korea :)

@arielb1
Copy link

arielb1 commented Sep 22, 2016

I feel like we have to treat raw pointers and integers separately, because LLVM does. Yes, you can cast them, but for example LLVM feels free to reorder accesses to pointers it deems "noaliasing" even if, after casting them to integers, they compare equal.

You can compare pointers even without casting them to integers.

@arielb1
Copy link

arielb1 commented Sep 22, 2016

I'm a bit of an outsider, but I think it would be a shame if the rules for writing unsafe code ended up being compromised by the current LLVM (which I consider an implementation detail) at the expense of ease of understanding (for "unsafe" programmers) or possible future optimisations (for future Rust implementations, or updated LLVM).

Our memory model does need to be compatible with LLVM's or it won't be worth much.

@jugglerchris
Copy link

Our memory model does need to be compatible with LLVM's or it won't be worth much.

Yes, of course. Are changes/enhancements to LLVM to support a better overall Rust memory model an option that would be considered? I understand that resources are limited for such work.

@sanxiyn
Copy link
Member

sanxiyn commented Sep 28, 2016

@gilhur Can you tell us more about this formal model for noalias in LLVM?

@nikomatsakis
Copy link
Contributor Author

Just FYI, I took a stab at summarizing the "promising semantics" paper in #32, though I don't think I did it justice. =) Still digesting.

@burdges
Copy link

burdges commented Sep 30, 2016

Interesting SOPS'13 paper on analyzing optimization-unstable code : Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior (SOSP'13) by Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama.

@gilhur
Copy link

gilhur commented Oct 4, 2016

@sanxiyn The formal model for noalias in LLVM essentially requires the notion of logical memory, which is the underlying model for CompCert and also for our ptr-int casting model. I am a bit hesitant to explain the idea here because we haven't published it yet. I will write a short tech report about it and give a link to it here. But please wait until the PLDI deadline in Nov because I am too busy until then.

@Zoxc
Copy link

Zoxc commented Jul 23, 2017

@gilhur I would like some information about your model for noalias/pointer aliasing in LLVM, especially for things relating to pointer equality and pointer comparisons.

@gilhur
Copy link

gilhur commented Jul 28, 2017

@Zoxc We are currently working on developing an LLVM model for handling pointers including casting between pointers and integers, and pointer equality/comparison. We also update the LLVM compiler accordingly including the alias analysis module. Probably we can make some written documents available within a few months. I will notify you here when it is out.

@RalfJung RalfJung transferred this issue from rust-lang/rust-memory-model Jun 18, 2019
@RalfJung RalfJung added the C-related-work Category: Discussion of related work (other languages, scientific publications, ...) label Jun 18, 2019
@JakobDegen
Copy link
Contributor

Discussed in backlog bonanza - closing. Memory models have evolved a lot since 2016, it's not clear that this issue brings a lot of value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-related-work Category: Discussion of related work (other languages, scientific publications, ...)
Projects
None yet
Development

No branches or pull requests