The Mu memory model basically follows the C11 memory model with a few modifications to make it suitable for Mu.
Mu does not enforce any strong order, but trusts the client to correctly use the atomic and ordering mechanisms provided by Mu. Many choices of ordering, from relaxed to sequentially consistent, on each memory operation are given to the client. The client has the freedom to make its choice and has the responsibility to synchronise their multi-threaded programs.
The most restricted form of memory accesses are sequentially consistent. Mu guarantees that they are atomic. They follow the well-known acquire-release memory model and there is a total order of all such memory accesses in all threads.
A less-restricted form is acquire and release. It does not have the total order provided by sequential consistency, but atomicity and the synchronise-with relationship between release and acquire operations are provided.
The "consume" order exploits the fact that some processors with relaxed memory order can figure out the dependencies between load operations in the hardware and will not reorder them even without memory fences. It can achieve some synchronisation requirements more efficiently than the "acquire" order.
The "relaxed" order only guarantees atomicity but does not enforce any order. The most unrestricted form of memory access is not atomic. These operations allows the Mu implementation and the processor to maximise the throughput while relying on the programmer to correctly synchronise their programs.
The program order in Mu is a total order while the sequenced-before relationship in C is a partial order, because there are unspecified order of evaluations in C.
There is no "atomic type" in Mu. Only operations make a difference between atomic and non-atomic accesses. Using both atomic and non-atomic operations on the same memory location is an undefined behaviour in Mu.
The primitives for atomic accesses and fences are provided by the instruction set of Mu rather than the library. Mutex locks, however, have to be implemented on top of this memory model.
LLVM does not guarantee that all atomic writes to a memory location has a total
order unless using monotonic
or stronger memory order. It provides an
unordered
order which is atomic but not "monotonic". The unordered
order
is intended to support the Java memory model, but whether unordered
is
necessary and whether the relaxed
order in C11 or the monotonic
in LLVM
is suitable for Java is not yet known.
In LLVM, an atomic operation can be labelled singlethread
, in which case it
only synchronises with or participates in modification and seq_cst
total
orderings with other operations running in the same thread (for example, in
signal handlers). C11 provides atomic_signal_fence
for similar purposes.
- data value
- See Type System
- SSA variable, instruction and evaluation
- See Instruction Set
- memory, initial value, load, store, access and conflict
- See Mu and the Memory
- thread
- A thread is the unit of CPU scheduling. In this memory model, threads include but are not limited to Mu threads. See Threads and Stacks for the definition of Mu threads.
- stack, stack binding, stack unbinding, swap-stack
- See Threads and Stacks
- futex, futex_wait, futex_wake
- See Threads and Stacks
The following table is a approximate comparison and may not strictly apply.
C | Mu |
---|---|
value | data value |
expression | SSA variable |
object | memory location |
memory location | memory location of scalar type |
(N/A) | object |
read | load |
modify | store |
Operations include (but are not limited to) the following:
- load
- A memory load. May be atomic or not.
- store
- A memory store. May be atomic or not.
- atomic read-modify-write
- A load and (maybe conditionally) a store as one atomic action. It may contain both a load and a store operation, but may have special atomic properties.
- fence
- A fence introduces memory orders.
- stack binding
- Binding a thread to a stack.
- stack unbinding
- Unbinding a thread from a stack.
- swap-stack
- Unbinding a thread from a stack and bind that thread to another stack.
- futex wait
- Waiting on a memory location.
- futex wake
- Wake up threads waiting on a memory location.
- external operation
- Any other operation that may affect the state outside Mu.
NOTE: Unlike the Java Memory Model, Mu memory model does not contain locks.
Some instructions and API functions perform memory operations. Specifically,
- The
LOAD
instruction and theload
API function perform a load operation. - The
STORE
instruction and thestore
API function perform a store operation. - The
CMPXCHG
instruction and thecmpxchg
API function perform a compare-exchange operation, which is a kind of atomic read-modify-write operation. - The
ATOMICRMW
instruction and theatomicrmw
API function perform an atomic read-modify-write operation. - The
FENCE
instruction and thefence
API function are a fence. - A concrete implementation may have other ways to perform those instructions.
NOTE: Programs in other languages (e.g. native programs or any other language a Mu implementation can interface with) can synchronise with Mu in an implementation-specific way. But the implementation must guarantee that those programs perform those operations in a way compatible with Mu.
For example, there are more than one way to implement loads and stores of the SEQ_CST order (either put fences in the load or in the store). If the implementation interfaces with a C implementation (e.g. gcc+glibc+Linux+x86_64), then Mu should do the same thing as (or be compatible with) the C program.
Load, store, atomic read-modify-write operations and fences have memory orders, which are the following:
- NOT_ATOMIC
- RELAXED
- CONSUME
- ACQUIRE
- RELEASE
- ACQ_REL (acquire and release)
- SEQ_CST (sequentially consistent)
All accesses that are not NOT_ATOMIC are atomic. Using both non-atomic operations and atomic operations on the same memory location is an undefined behaviour.
- Load shall have NOT_ATOMIC, RELAXED, CONSUME, ACQUIRE or SEQ_CST order.
- Store shall have NOT_ATOMIC, RELAXED, RELEASE or SEQ_CST order.
- Compare-exchange shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST on success and RELAXED, ACQUIRE or SEQ_CST on failure.
- Other atomic read-modify-write operations shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order.
- Fence shall have ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order.
Order | LOAD | STORE | CMPXCHG(succ) | CMPXCHG(fail) | ATOMICRMW | FENCE |
---|---|---|---|---|---|---|
NOT_ATOMIC | yes | yes | no | no | no | no |
RELAXED | yes | yes | yes | yes | yes | no |
CONSUME | yes | no | no | no | no | no |
ACQUIRE | yes | no | yes | yes | yes | yes |
RELEASE | no | yes | yes | no | yes | yes |
ACQ_REL | no | no | yes | no | yes | yes |
SEQ_CST | yes | yes | yes | yes | yes | yes |
- A load operation with ACQUIRE, ACQ_REL or SEQ_CST order performs a acquire operation on its specified memory location.
- A load operation with CONSUME order performs a consume operation on its specified memory location.
- A store operation with RELEASE, ACQ_REL or SEQ_CST order performs a release operation on its specified memory location.
- A fence with ACQUIRE, ACQ_REL or SEQ_CST order is a acquire fence.
- A fence with RELEASE, ACQ_REL or SEQ_CST order is a release fence.
All evaluations performed by a Mu thread form a total order, in which the operations performed by each evaluation are sequenced before operations performed by its successor.
All operations performed by a Mu client via a particular client context of the API form a total order, in which each operation is sequenced before its successor.
Operations before a TRAP
or WATCHPOINT
are sequenced before the
operations in the trap handler. Operations in the trap handler are sequenced
before operations after that TRAP
or WATCHPOINT
.
The program order contains operations and their "sequenced before" relations.
NOTE: This means all Mu instructions plus all client operations done by the trap handler in a Mu thread still forms a total order.
In C, the program order is a partial order even in a single thread because of unspecified order of evaluations.
All atomic store operations on a particular memory location M occur in some particular total order, called the modification order of M. If A and B are atomic stores on memory location M, and A happens before B, then A shall precede B in the modification order of M.
NOTE: This is to say, the modification order is consistent with the happens before order.
NOTE: This reflects the mechanisms, including cache coherence, provided by some hardware that guarantees such a total order.
A release sequence headed by a release operation A on a memory location M is a maximal contiguous sub-sequence of atomic store operations in the modification order M, where the first operation is A and every subsequent operation either is performed by the same thread that performed the release or is an atomic read-modify-write operation.
NOTE: In Mu, when a memory location is accessed by both atomic and non-atomic operations, it is an undefined behaviour. So the release sequence only apply for memory locations only accessed by atomic operations.
NOTE: Intuitively, there is a invisible fence before a release store (which is sometimes actually implemented as this). Seeing a store in the release sequence should imply seeing stores before the invisible fence.
An evaluation A synchronises with another evaluation B if:
- A performs a release operation on memory location M, and, B performs an acquire operation on M, and, sees a value stored by an operation in the release sequence headed by A, or
- A is a release fence, and, B is an acquire fence, and, there exist atomic operations X and Y, both operating on some memory location M, such that A is sequenced before X, X store into M, Y is sequenced before B, and Y sees the value written by X or a value written by any store operation in the hypothetical release sequence X would head if it were a release operation, or
- A is a release fence, and, B is an atomic operation that performs an acquire operation on a memory location M, and, there exists an atomic operation X such that A is sequenced before X, X stores into M, and B sees the value written by X or a value written by any store operations in the hypothetical release sequence X would head if it were a release operation, or
- A is an atomic operation that performs a release operation on M, and, B is an acquire fence, and, there exists some atomic operation X on M such that X is sequenced before B and sees the value written by A or a value written by any side effect in the release sequence headed by A, or
- A is the creation of a thread and B is the beginning of the execution of the new thread.
- A is a futex wake operation and B is the next operation after the futex wait operation of the thread woken up by A.
NOTE: A thread can be created by the
NEWTHREAD
instruction or thenew_thread
API function.NOTE: Since there is no explicit heap memory management in Mu, the "synchronises with" relation in C involving
free
andrealloc
does not apply in Mu.NOTE: Mu only provides very primitive threading support. The "synchronises with" relations involving
call_once
andthrd_join
are not in the memory model, but can be implemented on a higher level.NOTE: The "synchronises with" relation between the futex wake and wait is necessary to ensure the visibility of values written by one thread to be visible immediately by the woken thread. If such relation does not exist, the woken thread may never see the memory change made by the other thread. For example:
// C pseudo code int shared_var = 42; atomic_int futex = 0; thread1 { shared_var = 43; futex = 1; // Op1 futex_wake(&futex); // Op2 } thread2 { while(futex == 0) { // Op4 futex_wait(&futex, 0); // Op3 } int local_var = shared_var; }If the "synchronises with" between Op2 and Op3 does not exist, then Op4 may never see the value written by Op1, and thread2 will loop indefinitely.
An evaluation A carries a dependency to another evaluation B, or B carries a dependency from A, if:
- the data value of A is used as a data argument of B unless:
- A is used in the
KEEPALIVE
clause of B, or - B is a
SELECT
instruction and A is itscond
argument or a is theiftrue
oriffalse
argument not selected bycond
, or - A is a comparing or
INSERTVALUE
instruction, or - B is a
@uvm.kill_dependency
,CALL
,EXTRACTVALUE
orCCALL
instruction, or
- A is used in the
- there is a store operation X such that A is sequenced before X and X is sequenced before B, and, X stores the value of A to a memory location M, and, B performs a load operation from M, or
- for some evaluation X, A carries a dependency to X and X carries a dependency to B.
NOTE: The "carries a dependency to" relation together with the "dependency-ordered before"" relation exploits the fact that some processors, notably ARM and POWER, will not reorder load operations if the address used in the later in the program order depends on the result of the earlier load. On such processors, the earlier load can be implemented as an ordinary load without fences and still has "consume" semantic.
NOTE: Processors including ARM and POWER only respects data dependency, not control dependency. The
SELECT
instruction and the comparing instruction are usually implemented by conditional moves or conditional flags, which would end up that the result is control-dependent on the argument rather than data dependent.NOTE: Operations involving
struct
types in Mu may be implemented as no-ops. Consider the following:.typedef @i64 = int<64> .const @I64_0 <@i64> = 0 .type @A = struct <@i64 @i64> .const @A_ZERO <@A> = {@I64_0 @I64_0} %v = LOAD CONSUME <@i64> %some_memory_location %x = INSERTVALUE <@A 0> @A_ZERO %v // {%v 0} %y = EXTRACTVALUE <@A 0> %x // %v %z = EXTRACTVALUE <@A 1> %x // 0Mu can alias
%y
with%v
in the machine code, but%z
is always a constant zero.NOTE: Dependencies may not always be carried across function calls. A function may return a constant and it is uncertain if any processor respect this order.
An evaluation A is dependency-ordered before another evaluation B if any of the following is true:
- A performs a release operation on a memory location M, and, in another thread, B performs a consume operation on M and sees a value stored by any store operations in the release sequence headed by A.
- For some evaluation X, A is dependency-ordered before X and X carries a dependency to B.
NOTE: The "dependency-ordered before" relation consists of a release/consume pair followed by zero or more "carries a dependency to" relations. If the consume sees the value of (or "later than") the release operation, then subsequent loads that depends on the consume operation should also see values stored before the release operation.
TODO: The "carries a dependency to" relation is not well-defined for the client since it may be written in a different language.
An evaluation A inter-thread happens before an evaluation B if A synchronises with B, A is dependency-ordered before B, or, for some evaluation X:
- A synchronises with X and X is sequenced before B,
- A is sequenced before X and X inter-thread happens before B, or
- A inter-thread happens before X and X inter-thread happens before B.
NOTE: This basically allows any concatenations of "synchronises with", "dependency-ordered before" and "sequenced before" relations, but disallows ending with a "dependency-ordered before" relation followed by a "sequenced before" relation. It is disallowed because the consume load in the "dependency-ordered before" relation only respects later loads that works with a location that depends on the consume load, not arbitrary loads sequenced after it. It is only disallowed in the end because the release operation in a "synchronises with" relation or a "dependency-ordered before" relation will force the order between it and any preceding operations.
NOTE: A sequence of purely "sequenced before" is not "inter-thread" and is also not allowed in the "inter-thread happens before" relation.
An evaluation A happens before an evaluation B if A is sequenced before B or A inter-thread happens before B.
A load operation B from a memory location M shall see the initial value of M, the value stored by a store operation A sequenced before B, or other permitted values defined later.
A visible store operation A to a memory location M with respect to a load operation B from M satisfies the conditions:
- A happens before B, and
- there is no other store operation X to M such that A happens before X and X happens before B.
A non-atomic load operation B from memory location M shall see the value stored by the visible store operation A.
NOTE: If there is ambiguity about which store operation is visible to a non-atomic load operation, then there is a data race and the behaviour is undefined.
The visible seqeunce of atomic store operations to a memory location M with respect to an atomic load operation B from M, is a maximal contiguous sub-sequence of atomic store operations in the modification order of M, where the first operation is visible with respect to B, and for every subsequent operation, it is not the case that B happens before it. The atomic load operation B sees the value stored by some atomic load operation in the visible sequence M. Furthermore, if an atomic load operation A from memory location M happens before an atomic load operation B from M, and A sees a value stored by an atomic store operation X, then the value B sees shall either equal the value seen by A, or be the value stored by an atomic store operation Y, where Y follows X in the modification order of M.
NOTE: This means, a load cannot see the value stored by an operation happens after it, or a store operation separated by another store in the happen-before relation. Furthermore, the later operation of two loads cannot see an earlier value than that seen by the first load.
The execution of a program contains a data race if it contains two conflicting non-atomic memory accesses in different threads, neither happens before the other. Any such data race results in undefined behaviour.
NOTE: Using both atomic and non-atomic accesses on the same memory location is already an undefined behaviour, whether in the same thread of not.
There shall be a single total order S on all SEQ_CST operations, consistent with the "happens before" order and modification orders for all affected memory locations, such that each SEQ_CST load operation B from memory location M sees one of the following values:
- the result of the last store operation A that precedes B in S, if it exists, or
- if A exists, the result of some store operation to M in the visible sequence of atomic store operations with respect to B that is not SEQ_CST and does not happen before A, or
- if A does not exist, the result of some store operation to M in the visible sequence of atomic store operations with respect to B that is not SEQ_CST.
For an atomic load operation B from a memory location M, if there is a SEQ_CST fence X sequenced before B, then B observes either the last SEQ_CST store operation of M preceding X in the total order S or a later store operation of M in its modification order.
For atomic operations A and B on a memory location M, where A stores into M and B loads from M, if there is a SEQ_CST fence X such that A is sequenced before X and B follows X in S, then B observes either the effect of A or a later store operation of M in its modification order.
For atomic operations A and B on a memory location M, where A stores into M and B loads from M, if there are SEQ_CST fences X and Y such that A is sequenced before X, Y is sequenced before B and X precedes Y in S, then B observes either the effect of A or a later store operation of M in its modification order.
Atomic read-modify-write operations shall always see the last value (in the modification order) stored before the store operation associated with the read-modify-write operation.
A swap-stack operation performs an unbinding operation followed by a binding operation. The former is sequenced before the latter.
In the evaluation of a TRAP
or WATCHPOINT
instruction, the implied stack
unbinding operation is sequenced before any operations performed by the client.
If the client chooses to return and rebind the stack, the stack binding
operation is sequenced after all operations performed by the client and the
implied stack unbinding operation.
Stack binding and unbinding operations are not atomic. If there is a pair of stack binding or unbinding operations on the same stack, but do not have a "happens before" relation, it has undefined behaviour.
The load operations performed by the @uvm.futex.wait
,
@uvm.futex.wait_timeout
and @uvm.futex.cmp_requeue
on the memory
location given by its argument are atomic.
The rules of memory access applies to functions as if
- a function were a memory location that holds a function version, and
- a creation of a frame for a function were an atomic load on that location of the RELAXED order, which sees a particular version, and
- a function definition or redefinition during the load of a bundle were an atomic store on that location of the RELAXED order, which stores a new version.
NOTE: A frame is created when:
- calling a function by the
CALL
orTAILCALL
instructions, or by native programs through exposed Mu functions, or- creating a new stack by the
@uvm.new_stack
instruction or thenew_stack
API, or- pushing a new frame by the
push_frame
API or the@uvm.meta.push_frame
instruction.
The order of definitions and redefinitions of a particular function is consistent with the order the bundles that contain the definitions are loaded.
NOTE: This means synchronisation operations must be used to guarantee other threads other than the one which loads a bundle see the most recent version of a function.
TODO