-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mixed-atomicity memory model: only relaxed reads #42
base: master
Are you sure you want to change the base?
Conversation
Thanks for working on this. Assuming that the definition below is descriptive,
I do not understand why the implementation section says
If relaxed reads are compiled exactly like non-atomic reads, why is it that a relaxed read on an atomic location guaranteed not to see older values? In particular, on Arm, we use the compilation scheme in Table 5(b), where the loads are compiled to plain loads. In the semantics, atomic locations only hold one/current value, and so the semantics does not even have a way to talk about older values. However, without any synchronization, why are we guaranteed to see this most recent write? |
I know very little about the compilation-scheme side of weak memory models, so it is very possible that (if I understand your point correctly) the proposed model cannot be implemented efficiently in practice. I can think of other models that are reasonably natural, and allow relaxed-read to observe older values. In these models, atomic locations carry histories which are timestamp-indexed sets of not just values, but (frontier, value) pairs. Would one of the informal description below sound reasonable to implement to you ?
(I can easily try to work out a formal description of either approaches, and see what happens.) Remarks:
|
Thanks for the update! This model makes more sense to me, but I agree with @kayceesrk that it would need a bit more implementation work than the RFC suggests. In particular, a program consisting entirely of So the options then are to weaken the model or strengthen the implementation until they line up. To decide between these, it might be helpful to look carefully at some intended uses of Atomic.get_relaxed to see what ordering requirements they actually have, and what the cost of fencing is.
Right, agreed, the various convenient properties of the current OCaml model are not strict requirements. It might end up being a good idea to break some of these properties to get better performance in some cases, but I want to make sure we don't do so by accident. Having said that, the property of coherence of atomics is generally considered an important one for writing programs - it's one of the few guarantees made even by C++ relaxed atomics. |
If we take the view that we want to compile relaxed reads as cheaply as possible (as in kcas library where A relaxed read may observe any (frontier, value) pair that is at or after its own frontier-timestamp for this location This is similar to non-atomic reads, where no frontier is updated. This might be acceptable as the uses of relaxed read are expected to be at places where you do cheap reads until you observe a value that you care about, and then you'd follow this with an atomic operation to synchronise. |
One property that your proposal does not have, if I understand correctly, is coherence as formulated by @stedolan: if I do a relaxed read at some newer timestamp, I don't learn anything new in my frontier, so if I do another relaxed read I may now observe an older value. (This is what I called an "amnesic behavior" above.) I am fine with this, I can write a precise description of the model to have something concrete to discuss, and I believe that current practitioners do indeed want to have the cheapest possible read rather than stronger guarantees, but @stedolan points out that coherence is an important property and that the proposed model (if I understand correctly) would be strictly weaker than C11 relaxed atomics. |
(If we don't update our frontier at all, I don't think we need to have a frontier at each timestamp, so I can simplify the presentation a bit: atomic locations are modelled by a (frontier, history) pair, where atomic reads always observe the latest value in history and synchronize the frontier, and non-atomic reads observe any value at their own frontier or later and don't synchronize anything.) |
If I understand correctly @stedolan’s comments, the underlying hardware is coherent, so observation of values with non-monotonic timestamps can only be due to compiler optimisations. Isn’t there a world where we have maximally cheap |
I think that we can find all existing uses of The use-cases that I understand well enough are those were a relaxed read follows, or is followed by, an unconditional compare-and-set operation. I think that those would work well in basically any model, in the sense that they should behave like an atomic read -- there is a discussions of this in the text of the RFC. For example: (in general kcas.ml is full of comments about why "fenceless_get" is safe, generally the justification is that we know there was a fence before or there will be one after.) Other use-cases that are more mysterious to me are when there is a conditional check on the relaxed-read return value, and an atomic write/cas only in one of the branches. For example, the following is part of a hashtable implementation that supports resizing concurrently with other operations: I don't understand the code enough to guess whether the caller makes any synchronization assumption about the result of this function. |
The second code example is indeed interesting in that no synchronization follows the relaxed read. I guess the reasoning would be that publication safety holds, even for non-atomics. So if the relaxed read returns
I would be ok with the amnesic behaviour as well.
I'm a bit confused here as Stephen suggested that coherence is broken by mixing atomic and relaxed writes to the same atomic location. Here we're choosing to only do relaxed reads. Given that non-atomic reads do exhibit amenesic behaviour, I would argue that it is ok for relaxed reads to do the same. Are we overlooking something @stedolan? |
|
I don't follow how dependency ordering leads to a cheap implementation of fenceless_get. Can you explain?
I think this would be useful. Unlike the memory model presented in the paper, implementation has to deal with 3 different cases for writes to non-atomic locations -- initialising writes, non-initialising writes to integer fields, and non-initialising writes to pointer fields (
Do you mean the case where
I am interested in thinking about how to formally specify this. Linux Kernel Memory Model has a
can be optimised to
which is not what was intended. ocaml/ocaml#12713 also has a similar flavour to the example above. There, the compiler had a bug -- full CSE on atomic locations led to more behaviours than what was permitted by the memory model. However, optimising the initial example by eliminating redundant load is permitted by the model3 but does not match with the intention. Given that the notion of a compiler barrier does not exist in the model today, and it is not possible currently to express even the intended semantics for atomics formally, should a compiler barrier for relaxed read be necessarily a pre-condition for including that feature in the compiler? Informally, we can say that the compiler will not optimise any operations on atomic locations, including relaxed read. Footnotes |
In that case the Also, the runtime / compiler does not currently expose a sequentially consistent atomic read from an array, which is why Also, note that every operation on the hash table begins with an |
Thanks (everyone and) @polytypic for your clarifications here and the other thread, this is very helpful. |
I am rather saying that, in low level terms, if you want to avoid all fences, then you should still be able to rely on dependency ordering (since you already do for memory safety), and conversely there is no other synchronization principle available (AFAIU).
This sort of case indeed, since the block initialized to Val_unit is not memory-safe in general. (We are also leaving aside the fact that the allocator itself is written in C, etc.).
This makes sense if it is not possible/desirable to state the liveness property formally (and even then the compiler barrier is more of an implementation detail than something in the model). For the record the informal liveness property in the C/C++ memory model is the following: "The implementation should make atomic stores visible to atomic loads, and atomic loads should observe atomic stores, within a reasonable amount of time." |
In my role as a scribe / operational semanticist, here is my attempt to translate the different models that I think are easy/natural to describe starting from the current operational semantics. "latest, no sync"As in the current PR: relaxed reads always access the sequentially-latest value, but they don't synchronize at all.
"older, sync"Relaxed reads may access an older value, but they synchronize with its writer (and all sequentially-before writers):
(Invariant: the frontiers are monotonically increasing in atomic histories. The same invariant also holds for the proposal below.) "older, loc sync only"Relaxed reads may access older values, and they only update the frontier for that one atomic location -- the minimal amount to rule out amnesic behaviors.
"older, no sync"Relaxed reads may access older values, with no synchronization at all. (This allows amnesic behaviors, even in absence of relaxed-write.)
|
My current understanding from your implementation-side discussions is as follows:
|
Thanks Gabriel. An important question here is which of these models can we actually implement. It may not be possible to have precisely the guarantees that are offered by these models in hardware and we're likely to be "at least as strong as" the operational model. From a performance perspective, I do think we need "older, no sync", which can be implemented as you have pointed out. My question/confusion about about coherence from an earlier message (#42 (comment)) still remains (I think):
|
We discussed this at the maintainer meeting today. @stedolan clarified that his remark on reading old values later is not a criticism of the proposal itself, but rather a remark to point out that it is weaker than relaxed reads in the C11 memory model. (So they are "very relaxed" reads in this sense.) The next step is for me to update the RFC with the "older, no sync" model, so that we have a clear document. Then someone who knows about this stuff should think about the whether the proposed implementation, of compiling those very-relaxed reads exactly like non-atomic reads, is in fact correct with respect to the memory model. |
Rendered.
This is a revision of the general model proposed in #41, where only non-atomic/relaxed reads are permitted on atomic locations, not relaxed writes. The model is simpler, and avoids the issues pointed out by @stedolan in #41 -- but it may still have problems!