A Formal C Memory Model Supporting Integer-Pointer Casts #143

nikomatsakis · 2016-09-21T14:25:05Z

Links: PDF, ACM

This is a memory model targeting C. The C specification itself has very loose rules that don't support a lot of common things people do in C programs, primarily around using pointers as integers (e.g., bitmasking, etc).

The key ideas of the model are as follows:

When you allocate memory, it is initially assigned a logical identifier and (conceptually) has no physical address (yet).
Once a pointer is cast to an integer, the memory block it points at is assigned a physical address.
Once a memory block has a physical address, then it is much less optimizable. For example, if you cast from an integer i to a pointer p, you have to assume that accesses through p could affect any block with a physical address, but you know they can't affect anything that doesn't have a physical address.

Some examples from the paper:

Keeping local variables private (supported)

Consider this question:

int f() {
  int a = 0;
  g(); // Can `g()` observe/affect value of `a`?
  return a;
}

In this model, the answer is no: the allocation of a is private, and at no point did the code do something like (int) &a to convert it into a concrete block.

Keeping local allocations private (supported)

Consider this question:

p = malloc();
*p = 123;
bar(); // Can this affect `*p`?
a = *p;
hash_put(h, p, a);

In this model, the answer is no: the allocation of p is private, and at no point did the code do something like (int) p to convert it into a concrete block.

The paper calls this "ownership transfer". I've avoided this term because it's so different from what Rust means by this term. It's more like "retaining" or "respecting" ownership (of p).

The paper says that clang -O2 does this kind of optimization, but not gcc -O2, and claims therefore that it may not be that important.

Dead Cast Elimination (unsupported in the model)

Consider this question (not from paper):

foo(int *p) {
    int x = (int) p; // Can we drop this unused value?
    println("%d", *p);
}

In this model, we cannot eliminate (int) p even if the result is not used because it has a side-effect of (potentially) assigning p a concrete value.

The paper addresses this by doing this optimization later in the pipeline. Basically at the point where we drop this model and convert to something more concrete.

Example limitation

Here is an example of an optimization prohibited by this approach.

p = malloc(1);
*p = 123;
b = (int) p;
bar();
a = *p;
hash_put(h, b, a); // Can we optimize this to `hash_put(h, b, 123)`?

The answer is no: p was made "concrete", and hence may be affected by bar(). Interestingly, a slight reordering would enable the optimization:

p = malloc(1);
*p = 123;
bar(); // `p` is not concrete **yet**...
b = (int) p;
a = *p;
hash_put(h, b, a); // ...hence no opportunity between cast and here to modify `*p`

The text was updated successfully, but these errors were encountered:

arielb1 · 2016-09-21T20:13:50Z

My notes:

First, keep in mind that most of the examples are equally valid when alloca (i.e. a local) is used instead of malloc, and therefore are important even if we don't care about optimizing malloc-heavy code.

Second, in Rust we probably want to have the physical addresses only be legal for limited periods of time. In fact, that basically results in a variant of the ACA model where all mallocs are asserted.

The paper says that clang -O2 does this kind of optimization, but not gcc -O2, and claims therefore that it may not be that important.

It might not be that important for performance, but LLVM will happily do the optimization for us, so it has to be a part of our memory model.

Ericson2314 · 2016-09-21T23:18:47Z

I need to read the paper more closely, but I wonder whether it was considered to do dataflow analysis on cast(ed) pointers---i.e. once a pointer is casted, only functions that get the cast-result integer or the original pointer are allowed/assumed to make arbitrary reads/writes to the block in which the pointer points.

(On the Rust side of things, I've similar thought about doing dataflow analysis on "poisoned" unsafe values.)

arielb1 · 2016-09-22T09:10:21Z

Memory models don't do dataflow. Also, any attempt at interesting analysis runs into the problem that optimizers will optimize x-x to 0.

jeehoonkang · 2016-09-22T13:06:42Z

Hello, I am an author of the paper, and would like to participate in this discussion. I am really glad that the Rust community is interested in the paper!

@Ericson2314 @arielb1 I would like to add my two cents on the dataflow analysis. The main reason we didn't do dataflow analysis is its complexity: it is really hard to track the leakage of pointer-to-integer casting information. Even, we have to consider not only explicit but also implicit leakage. For e.g. consider the following code (assume a pointer is 32bit):
```
  int *p = malloc(1<<31 + 1);
  uintptr_t pi = (uintptr_t) p;
  *((int *)(1 <<31)) = 42; // undefined behavior?
```
In this example, the information that 1 << 31 is a valid pointer is implicitly leaked to the store, even if the store is not directly related to the malloc and the pointer-to-integer cast. This example is really contrived and impractical, but it shows that it will be somewhat complicated to define the precise meaning of information leakage.
@arielb1 May I ask what is the ACA model? Google didn't give me useful information..
I would like to understand why the Rust community is interested in the casts between integers and pointers: isn't it (mem::transmute) contained in the unsafe Rust?
It is not directly related to this issue ;-) but I would like to advertise that recently my colleagues and I wrote a paper (under submission) on C/C++ relaxed-memory concurrency model: http://sf.snu.ac.kr/promise-concurrency/ Hope it also be interesting to the Rust community!

arielb1 · 2016-09-22T13:45:08Z

Thanks for coming to talk to us @jeehoonkang. We'll take a look at the relaxed-memory paper, but currently we have enough trouble getting the single-threaded semantics right. @orilahav's previous taming-acquire-release paper was excellent, so I hope this one is similarly good.

First, the ACA and capability models are our currently very-informal-and-incomplete candidate models (see rust-lang/rust-memory-model#26 and rust-lang/rust-memory-model#28). It would be nice if someone in the know came to look and them.

I don't like a "dataflow"-style model for the same reason as you. However, C has its odd "counterfactual" definition for restrict - a pointer expression E is said to be based on object P if ... modifying P ... would change the value of E - which is quite vague (did someone come up with some less-vague version?), and LLVM has its own set of vague noalias semantics, which we have to conform with.

I would like to understand why the Rust community is interested in the casts between integers and pointers: isn't it (mem::transmute) contained in the unsafe Rust?

That's the entire point of the memory model: If you keep to hereditarily-safe Rust, you could just have the "fully symbolic" memory model. Correctly-written unsafe Rust is supposed to be UB-free, and we are trying to find a good enough definition of UB that programmers, sanitizers and optimizers can agree on.

Also, in Rust, the widely-used raw pointers are pretty much interconvertible with integers (at the very least, int<->raw-ptr casts are safe and therefore must never be UB). The ACA model's idea is to handle raw pointers just like integers, which works moderately well.

If we can tame LLVM (it's not much use having a memory model that your optimizer misoptimizes by design!), I think that we can have something that handles raw pointers and integers separately.

RalfJung · 2016-09-22T14:00:58Z

I feel like we have to treat raw pointers and integers separately, because LLVM does. Yes, you can cast them, but for example LLVM feels free to reorder accesses to pointers it deems "noaliasing" even if, after casting them to integers, they compare equal.

jugglerchris · 2016-09-22T14:15:00Z

I'm a bit of an outsider, but I think it would be a shame if the rules for writing unsafe code ended up being compromised by the current LLVM (which I consider an implementation detail) at the expense of ease of understanding (for "unsafe" programmers) or possible future optimisations (for future Rust implementations, or updated LLVM).

gilhur · 2016-09-22T14:37:37Z

@arielb1 We have a formal model for "noalis" in LLVM that can be very well explained in our pointer-integer casting model (as long as our understanding of no alias is correct). I or Jeehoon will get back with more details tmr because it is midnight in Korea :)

arielb1 · 2016-09-22T14:39:24Z

I feel like we have to treat raw pointers and integers separately, because LLVM does. Yes, you can cast them, but for example LLVM feels free to reorder accesses to pointers it deems "noaliasing" even if, after casting them to integers, they compare equal.

You can compare pointers even without casting them to integers.

arielb1 · 2016-09-22T14:41:29Z

I'm a bit of an outsider, but I think it would be a shame if the rules for writing unsafe code ended up being compromised by the current LLVM (which I consider an implementation detail) at the expense of ease of understanding (for "unsafe" programmers) or possible future optimisations (for future Rust implementations, or updated LLVM).

Our memory model does need to be compatible with LLVM's or it won't be worth much.

jugglerchris · 2016-09-22T15:17:18Z

Our memory model does need to be compatible with LLVM's or it won't be worth much.

Yes, of course. Are changes/enhancements to LLVM to support a better overall Rust memory model an option that would be considered? I understand that resources are limited for such work.

sanxiyn · 2016-09-28T08:43:14Z

@gilhur Can you tell us more about this formal model for noalias in LLVM?

nikomatsakis · 2016-09-29T14:50:37Z

Just FYI, I took a stab at summarizing the "promising semantics" paper in #32, though I don't think I did it justice. =) Still digesting.

burdges · 2016-09-30T15:26:01Z

Interesting SOPS'13 paper on analyzing optimization-unstable code : Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior (SOSP'13) by Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama.

gilhur · 2016-10-04T09:58:54Z

@sanxiyn The formal model for noalias in LLVM essentially requires the notion of logical memory, which is the underlying model for CompCert and also for our ptr-int casting model. I am a bit hesitant to explain the idea here because we haven't published it yet. I will write a short tech report about it and give a link to it here. But please wait until the PLDI deadline in Nov because I am too busy until then.

Zoxc · 2017-07-23T13:48:22Z

@gilhur I would like some information about your model for noalias/pointer aliasing in LLVM, especially for things relating to pointer equality and pointer comparisons.

gilhur · 2017-07-28T17:28:52Z

@Zoxc We are currently working on developing an LLVM model for handling pointers including casting between pointers and integers, and pointer equality/comparison. We also update the LLVM compiler accordingly including the alias analysis module. Probably we can make some written documents available within a few months. I will notify you here when it is out.

JakobDegen · 2023-04-25T16:43:16Z

Discussed in backlog bonanza - closing. Memory models have evolved a lot since 2016, it's not clear that this issue brings a lot of value

RalfJung transferred this issue from rust-lang/rust-memory-model Jun 18, 2019

RalfJung added the C-related-work Category: Discussion of related work (other languages, scientific publications, ...) label Jun 18, 2019

JakobDegen closed this as completed Apr 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A Formal C Memory Model Supporting Integer-Pointer Casts #143

A Formal C Memory Model Supporting Integer-Pointer Casts #143

nikomatsakis commented Sep 21, 2016

arielb1 commented Sep 21, 2016

Ericson2314 commented Sep 21, 2016

arielb1 commented Sep 22, 2016

jeehoonkang commented Sep 22, 2016 •

edited

Loading

arielb1 commented Sep 22, 2016 •

edited

Loading

RalfJung commented Sep 22, 2016

jugglerchris commented Sep 22, 2016

gilhur commented Sep 22, 2016

arielb1 commented Sep 22, 2016 •

edited

Loading

arielb1 commented Sep 22, 2016

jugglerchris commented Sep 22, 2016

sanxiyn commented Sep 28, 2016

nikomatsakis commented Sep 29, 2016

burdges commented Sep 30, 2016 •

edited

Loading

gilhur commented Oct 4, 2016

Zoxc commented Jul 23, 2017

gilhur commented Jul 28, 2017

JakobDegen commented Apr 25, 2023

A Formal C Memory Model Supporting Integer-Pointer Casts #143

A Formal C Memory Model Supporting Integer-Pointer Casts #143

Comments

nikomatsakis commented Sep 21, 2016

Keeping local variables private (supported)

Keeping local allocations private (supported)

Dead Cast Elimination (unsupported in the model)

Example limitation

arielb1 commented Sep 21, 2016

Ericson2314 commented Sep 21, 2016

arielb1 commented Sep 22, 2016

jeehoonkang commented Sep 22, 2016 • edited Loading

arielb1 commented Sep 22, 2016 • edited Loading

RalfJung commented Sep 22, 2016

jugglerchris commented Sep 22, 2016

gilhur commented Sep 22, 2016

arielb1 commented Sep 22, 2016 • edited Loading

arielb1 commented Sep 22, 2016

jugglerchris commented Sep 22, 2016

sanxiyn commented Sep 28, 2016

nikomatsakis commented Sep 29, 2016

burdges commented Sep 30, 2016 • edited Loading

gilhur commented Oct 4, 2016

Zoxc commented Jul 23, 2017

gilhur commented Jul 28, 2017

JakobDegen commented Apr 25, 2023

jeehoonkang commented Sep 22, 2016 •

edited

Loading

arielb1 commented Sep 22, 2016 •

edited

Loading

arielb1 commented Sep 22, 2016 •

edited

Loading

burdges commented Sep 30, 2016 •

edited

Loading