Memory Model

The Mu memory model basically follows the C11 memory model with a few modifications to make it suitable for Mu.

Overview

Mu does not enforce any strong order, but trusts the client to correctly use the atomic and ordering mechanisms provided by Mu. The client can choose an ordering, from relaxed to sequentially consistent, for each memory operation. The client has the freedom to make this choice, and the responsibility to synchronise its multi-threaded programs.

The most restricted form of memory access is sequentially consistent. Mu guarantees that such accesses are atomic. They follow the well-known acquire-release memory model, and there is a total order over all such memory accesses in all threads.

A less restricted form is acquire and release. It does not have the total order provided by sequential consistency, but atomicity and the "synchronises with" relationship between release and acquire operations are still provided.

The "consume" order exploits the fact that some processors with relaxed memory order can figure out the dependencies between load operations in the hardware and will not reorder them even without memory fences. It can achieve some synchronisation requirements more efficiently than the "acquire" order.

The "relaxed" order only guarantees atomicity but does not enforce any order. The most unrestricted form of memory access is not atomic. These operations allows the Mu implementation and the processor to maximise the throughput while relying on the programmer to correctly synchronise their programs.

Notable differences from C11

The program order in Mu is a total order, while the sequenced-before relationship in C is a partial order, because the order of evaluation of some expressions is unspecified in C.

There is no "atomic type" in Mu. Whether an access is atomic or not is determined solely by the operation. Using both atomic and non-atomic operations on the same memory location is undefined behaviour in Mu.

The primitives for atomic accesses and fences are provided by the instruction set of Mu rather than the library. Mutex locks, however, have to be implemented on top of this memory model.

Notable differences from LLVM

LLVM does not guarantee that all atomic writes to a memory location have a total order unless the monotonic or a stronger memory order is used. It provides an unordered order which is atomic but not "monotonic". The unordered order is intended to support the Java memory model, but whether unordered is necessary, and whether the relaxed order in C11 or the monotonic order in LLVM is suitable for Java, is not yet known.

In LLVM, an atomic operation can be labelled singlethread, in which case it only synchronises with or participates in modification and seq_cst total orderings with other operations running in the same thread (for example, in signal handlers). C11 provides atomic_signal_fence for similar purposes.

Concepts

data value
See Type System
SSA variable, instruction and evaluation
See Instruction Set
memory, initial value, load, store, access and conflict
See Mu and the Memory
thread
A thread is the unit of CPU scheduling. In this memory model, threads include but are not limited to Mu threads. See Threads and Stacks for the definition of Mu threads.
stack, stack binding, stack unbinding, swap-stack
See Threads and Stacks
futex, futex_wait, futex_wake
See Threads and Stacks

Comparison of Terminology

The following table is an approximate comparison and may not strictly apply.

C                  Mu
value              data value
expression         SSA variable
object             memory location
memory location    memory location of scalar type
(N/A)              object
read               load
modify             store

Operations

Operations include (but are not limited to) the following:

load
A memory load. May be atomic or not.
store
A memory store. May be atomic or not.
atomic read-modify-write
A load and (possibly conditionally) a store performed as one atomic action. It involves both a load and a store operation, but has additional atomicity guarantees.
fence
A fence introduces memory orders.
stack binding
Binding a thread to a stack.
stack unbinding
Unbinding a thread from a stack.
swap-stack
Unbinding a thread from a stack and binding that thread to another stack.
futex wait
Waiting on a memory location.
futex wake
Waking up threads waiting on a memory location.
external operation
Any other operation that may affect the state outside Mu.
NOTE: Unlike the Java Memory Model, the Mu memory model does not contain locks.

Memory Operations

Some instructions and API functions perform memory operations. Specifically,

  • The LOAD instruction and the load API function perform a load operation.
  • The STORE instruction and the store API function perform a store operation.
  • The CMPXCHG instruction and the cmpxchg API function perform a compare-exchange operation, which is a kind of atomic read-modify-write operation.
  • The ATOMICRMW instruction and the atomicrmw API function perform an atomic read-modify-write operation.
  • The FENCE instruction and the fence API function perform a fence operation.
  • A concrete implementation may have other ways to perform these operations.

NOTE: Programs in other languages (e.g. native programs or any other language a Mu implementation can interface with) can synchronise with Mu in an implementation-specific way. But the implementation must guarantee that those programs perform those operations in a way compatible with Mu.

For example, there is more than one way to implement loads and stores of the SEQ_CST order (the fence can be placed in the load or in the store). If the implementation interfaces with a C implementation (e.g. gcc+glibc+Linux+x86_64), then Mu should do the same thing as (or be compatible with) the C program.
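
NOTE: The following is a minimal sketch of the two fence-placement conventions, expressed with C11 atomics and assuming an x86-64-like target where the interesting fence sits between a store and a later load. It only illustrates the compatibility problem and is not a literal definition of SEQ_CST accesses:

// Hypothetical sketch of two fence-placement conventions for SEQ_CST.
#include <stdatomic.h>

atomic_int g;

// Convention A: the store carries the fence.
void store_seq_cst_a(int v) {
    atomic_store_explicit(&g, v, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);           // fence after the store
}
int load_seq_cst_a(void) {
    return atomic_load_explicit(&g, memory_order_acquire);
}

// Convention B: the load carries the fence.
void store_seq_cst_b(int v) {
    atomic_store_explicit(&g, v, memory_order_release);
}
int load_seq_cst_b(void) {
    atomic_thread_fence(memory_order_seq_cst);            // fence before the load
    return atomic_load_explicit(&g, memory_order_acquire);
}

// If one side uses convention-B stores while the other uses convention-A loads,
// no fence separates a store from a later load and sequential consistency can be
// lost. Hence Mu must follow the convention of the platform's C implementation.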

Load, store, atomic read-modify-write operations and fences have memory orders, which are the following:

  • NOT_ATOMIC
  • RELAXED
  • CONSUME
  • ACQUIRE
  • RELEASE
  • ACQ_REL (acquire and release)
  • SEQ_CST (sequentially consistent)

All accesses that are not NOT_ATOMIC are atomic. Using both non-atomic operations and atomic operations on the same memory location is undefined behaviour.

  • Load shall have NOT_ATOMIC, RELAXED, CONSUME, ACQUIRE or SEQ_CST order.
  • Store shall have NOT_ATOMIC, RELAXED, RELEASE or SEQ_CST order.
  • Compare-exchange shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST on success and RELAXED, ACQUIRE or SEQ_CST on failure.
  • Other atomic read-modify-write operations shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order.
  • Fence shall have ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order.
Order       LOAD  STORE  CMPXCHG(succ)  CMPXCHG(fail)  ATOMICRMW  FENCE
NOT_ATOMIC  yes   yes    no             no             no         no
RELAXED     yes   yes    yes            yes            yes        no
CONSUME     yes   no     no             no             no         no
ACQUIRE     yes   no     yes            yes            yes        yes
RELEASE     no    yes    yes            no             yes        yes
ACQ_REL     no    no     yes            no             yes        yes
SEQ_CST     yes   yes    yes            yes            yes        yes
  • A load operation with ACQUIRE, ACQ_REL or SEQ_CST order performs an acquire operation on its specified memory location.
  • A load operation with CONSUME order performs a consume operation on its specified memory location.
  • A store operation with RELEASE, ACQ_REL or SEQ_CST order performs a release operation on its specified memory location.
  • A fence with ACQUIRE, ACQ_REL or SEQ_CST order is an acquire fence.
  • A fence with RELEASE, ACQ_REL or SEQ_CST order is a release fence.
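
NOTE: The following sketch shows rough C11 analogues of these combinations. It is illustrative only; Mu performs them with instructions and API functions, not library calls, and the names are hypothetical:

// C11 sketch of permitted order/operation combinations.
#include <stdatomic.h>
#include <stdbool.h>

atomic_int a;

void combinations(void) {
    int v = atomic_load_explicit(&a, memory_order_acquire);      // LOAD ACQUIRE

    atomic_store_explicit(&a, v + 1, memory_order_release);      // STORE RELEASE

    int expected = 0;
    bool ok = atomic_compare_exchange_strong_explicit(           // CMPXCHG
        &a, &expected, 1,
        memory_order_acq_rel,                                    // success: ACQ_REL
        memory_order_acquire);                                   // failure: ACQUIRE
    (void)ok;

    atomic_fetch_add_explicit(&a, 1, memory_order_seq_cst);      // ATOMICRMW SEQ_CST

    atomic_thread_fence(memory_order_release);                   // FENCE RELEASE
}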

Orders

Program Order

All evaluations performed by a Mu thread form a total order, in which the operations performed by each evaluation are sequenced before operations performed by its successor.

All operations performed by a Mu client via a particular client context of the API form a total order, in which each operation is sequenced before its successor.

Operations before a TRAP or WATCHPOINT are sequenced before the operations in the trap handler. Operations in the trap handler are sequenced before operations after that TRAP or WATCHPOINT.

The program order contains operations and their "sequenced before" relations.

NOTE: This means that all Mu instructions plus all client operations done by the trap handler in a Mu thread still form a total order.

In C, by contrast, the program order is a partial order even within a single thread because the order of evaluation of some expressions is unspecified.
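
For example, in the following C fragment (with hypothetical functions f and g), the language does not specify which call is evaluated first, so their side effects are not related by "sequenced before"; the corresponding Mu evaluations would be totally ordered within their thread:

// C example of unspecified evaluation order.
int f(void);
int g(void);

int h(void) {
    return f() + g();   // f() and g() may be evaluated in either order
}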

Modification Order

All atomic store operations on a particular memory location M occur in some particular total order, called the modification order of M. If A and B are atomic stores on memory location M, and A happens before B, then A shall precede B in the modification order of M.

NOTE: This is to say, the modification order is consistent with the happens before order.

NOTE: This reflects the mechanisms, including cache coherence, provided by some hardware that guarantees such a total order.

A release sequence headed by a release operation A on a memory location M is a maximal contiguous sub-sequence of atomic store operations in the modification order of M, where the first operation is A and every subsequent operation either is performed by the same thread that performed the release or is an atomic read-modify-write operation.

NOTE: In Mu, accessing a memory location with both atomic and non-atomic operations is undefined behaviour, so the notion of a release sequence only applies to memory locations accessed exclusively by atomic operations.

NOTE: Intuitively, there is an invisible fence before a release store (which is sometimes how it is actually implemented). Seeing a store in the release sequence should imply seeing the stores before that invisible fence.
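
NOTE: The following C11 sketch (hypothetical names; the three functions are assumed to run in different threads) shows a release sequence being extended by a relaxed read-modify-write from another thread:

// thread_a's release store heads a release sequence; thread_b's relaxed
// fetch_add is part of it because it is an atomic read-modify-write.
#include <stdatomic.h>

int        data = 0;
atomic_int flag = 0;

void thread_a(void) {
    data = 1;
    atomic_store_explicit(&flag, 1, memory_order_release);      // heads the sequence
}

void thread_b(void) {
    atomic_fetch_add_explicit(&flag, 1, memory_order_relaxed);  // joins the sequence
}

void thread_c(void) {
    // If this acquire load sees 2 (the value written by thread_b), it still
    // synchronises with thread_a's release store, so data == 1 is visible.
    if (atomic_load_explicit(&flag, memory_order_acquire) == 2) {
        int v = data;
        (void)v;
    }
}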

The Synchronises With Relation

An evaluation A synchronises with another evaluation B if:

  • A performs a release operation on memory location M, and, B performs an acquire operation on M, and, sees a value stored by an operation in the release sequence headed by A, or
  • A is a release fence, and, B is an acquire fence, and, there exist atomic operations X and Y, both operating on some memory location M, such that A is sequenced before X, X stores into M, Y is sequenced before B, and Y sees the value written by X or a value written by any store operation in the hypothetical release sequence X would head if it were a release operation, or
  • A is a release fence, and, B is an atomic operation that performs an acquire operation on a memory location M, and, there exists an atomic operation X such that A is sequenced before X, X stores into M, and B sees the value written by X or a value written by any store operation in the hypothetical release sequence X would head if it were a release operation, or
  • A is an atomic operation that performs a release operation on M, and, B is an acquire fence, and, there exists some atomic operation X on M such that X is sequenced before B and sees the value written by A or a value written by any store operation in the release sequence headed by A, or
  • A is the creation of a thread and B is the beginning of the execution of the new thread, or
  • A is a futex wake operation and B is the next operation after the futex wait operation of the thread woken up by A.

NOTE: A thread can be created by the NEWTHREAD instruction or the new_thread API function.

NOTE: Since there is no explicit heap memory management in Mu, the "synchronises with" relation in C involving free and realloc does not apply in Mu.

NOTE: Mu only provides very primitive threading support. The "synchronises with" relations involving call_once and thrd_join are not in the memory model, but can be implemented on a higher level.

NOTE: The "synchronises with" relation between the futex wake and wait is necessary to ensure the visibility of values written by one thread to be visible immediately by the woken thread. If such relation does not exist, the woken thread may never see the memory change made by the other thread. For example:

// C pseudo code
int        shared_var = 42;
atomic_int futex      = 0;

thread1 {
    shared_var = 43;
    futex = 1;              // Op1
    futex_wake(&futex);     // Op2
}

thread2 {
    while(futex == 0) {         // Op4
        futex_wait(&futex, 0);  // Op3
    }
    int local_var = shared_var;
}

If the "synchronises with" between Op2 and Op3 does not exist, then Op4 may never see the value written by Op1, and thread2 will loop indefinitely.

Dependency

An evaluation A carries a dependency to another evaluation B, or B carries a dependency from A, if:

  • the data value of A is used as a data argument of B unless:
    • A is used in the KEEPALIVE clause of B, or
    • B is a SELECT instruction and A is its cond argument or A is the iftrue or iffalse argument not selected by cond, or
    • A is a comparing or INSERTVALUE instruction, or
    • B is a @uvm.kill_dependency, CALL, EXTRACTVALUE or CCALL instruction, or
  • there is a store operation X such that A is sequenced before X and X is sequenced before B, and, X stores the value of A to a memory location M, and, B performs a load operation from M, or
  • for some evaluation X, A carries a dependency to X and X carries a dependency to B.

NOTE: The "carries a dependency to" relation together with the "dependency-ordered before"" relation exploits the fact that some processors, notably ARM and POWER, will not reorder load operations if the address used in the later in the program order depends on the result of the earlier load. On such processors, the earlier load can be implemented as an ordinary load without fences and still has "consume" semantic.

NOTE: Processors such as ARM and POWER only respect data dependencies, not control dependencies. The SELECT instruction and the comparing instructions are usually implemented with conditional moves or condition flags, which makes the result control-dependent on the argument rather than data-dependent.
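
NOTE: The following C11 sketch (hypothetical names; most current C compilers actually promote memory_order_consume to memory_order_acquire) shows the typical pointer-publication pattern in which the dependency is carried through the loaded address:

#include <stdatomic.h>
#include <stddef.h>

typedef struct { int value; } node_t;

node_t            the_node;
_Atomic(node_t *) published = NULL;

void publisher(void) {
    the_node.value = 42;
    atomic_store_explicit(&published, &the_node, memory_order_release);
}

void consumer(void) {
    node_t *p = atomic_load_explicit(&published, memory_order_consume);
    if (p != NULL) {
        // The address of this load depends on p, so the dependency is carried
        // and the load sees 42, even without an acquire fence on processors
        // such as ARM and POWER.
        int v = p->value;
        (void)v;
    }
}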

NOTE: Operations involving struct types in Mu may be implemented as no-ops. Consider the following:

.typedef @i64 = int<64>
.const @I64_0 <@i64> = 0

.typedef @A = struct <@i64 @i64>
.const @A_ZERO <@A> = {@I64_0 @I64_0}

%v = LOAD CONSUME <@i64> %some_memory_location
%x = INSERTVALUE  <@A 0> @A_ZERO %v     // {%v 0}
%y = EXTRACTVALUE <@A 0> %x             // %v
%z = EXTRACTVALUE <@A 1> %x             // 0

Mu can alias %y with %v in the machine code, but %z is always a constant zero.

NOTE: Dependencies may not always be carried across function calls. A function may return a constant, and it is uncertain whether any processor respects such an order.

An evaluation A is dependency-ordered before another evaluation B if any of the following is true:

  • A performs a release operation on a memory location M, and, in another thread, B performs a consume operation on M and sees a value stored by any store operations in the release sequence headed by A.
  • For some evaluation X, A is dependency-ordered before X and X carries a dependency to B.
NOTE: The "dependency-ordered before" relation consists of a release/consume pair followed by zero or more "carries a dependency to" relations. If the consume sees the value of (or "later than") the release operation, then subsequent loads that depends on the consume operation should also see values stored before the release operation.
TODO: The "carries a dependency to" relation is not well-defined for the client since it may be written in a different language.

The Happens Before Relation

An evaluation A inter-thread happens before an evaluation B if A synchronises with B, A is dependency-ordered before B, or, for some evaluation X:

  • A synchronises with X and X is sequenced before B,
  • A is sequenced before X and X inter-thread happens before B, or
  • A inter-thread happens before X and X inter-thread happens before B.

NOTE: This basically allows any concatenation of "synchronises with", "dependency-ordered before" and "sequenced before" relations, but disallows ending with a "dependency-ordered before" relation followed by a "sequenced before" relation. That ending is disallowed because the consume load in the "dependency-ordered before" relation only orders later loads that work with a location depending on the consume load, not arbitrary loads sequenced after it. It is only the ending that is disallowed because the release operation in a "synchronises with" relation or a "dependency-ordered before" relation will force the order between it and any preceding operations.

NOTE: A chain consisting purely of "sequenced before" relations is not "inter-thread" and is therefore not included in the "inter-thread happens before" relation.

An evaluation A happens before an evaluation B if A is sequenced before B or A inter-thread happens before B.

Value Visibility

A load operation B from a memory location M shall see the initial value of M, the value stored by a store operation A sequenced before B, or other permitted values defined later.

A visible store operation A to a memory location M with respect to a load operation B from M satisfies the conditions:

  • A happens before B, and
  • there is no other store operation X to M such that A happens before X and X happens before B.

A non-atomic load operation B from memory location M shall see the value stored by the visible store operation A.

NOTE: If there is ambiguity about which store operation is visible to a non-atomic load operation, then there is a data race and the behaviour is undefined.

The visible sequence of atomic store operations to a memory location M, with respect to an atomic load operation B from M, is a maximal contiguous sub-sequence of atomic store operations in the modification order of M, where the first operation is visible with respect to B and, for every subsequent operation, it is not the case that B happens before it. The atomic load operation B sees the value stored by some atomic store operation in that visible sequence.

Furthermore, if an atomic load operation A from memory location M happens before an atomic load operation B from M, and A sees a value stored by an atomic store operation X, then the value seen by B shall either equal the value seen by A, or be the value stored by an atomic store operation Y, where Y follows X in the modification order of M.

NOTE: This means a load cannot see the value stored by an operation that happens after it, nor the value of a store operation that is separated from it by another store in the "happens before" relation. Furthermore, the later of two loads cannot see an earlier value than the one seen by the first load.

The execution of a program contains a data race if it contains two conflicting non-atomic memory accesses in different threads, neither of which happens before the other. Any such data race results in undefined behaviour.

NOTE: Using both atomic and non-atomic accesses on the same memory location is already undefined behaviour, whether they are in the same thread or not.
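
NOTE: A minimal example of a data race, sketched in C (hypothetical names; the two functions are assumed to run in different threads):

int counter = 0;   // plain, non-atomic memory location

void racer1(void) { counter = counter + 1; }
void racer2(void) { counter = counter + 1; }   // conflicts with racer1: undefined behaviour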

Special Rules for SEQ_CST

There shall be a single total order S on all SEQ_CST operations, consistent with the "happens before" order and modification orders for all affected memory locations, such that each SEQ_CST load operation B from memory location M sees one of the following values:

  • the result of the last SEQ_CST store operation A to M that precedes B in S, if such an operation exists, or
  • if A exists, the result of some store operation to M in the visible sequence of atomic store operations with respect to B that is not SEQ_CST and does not happen before A, or
  • if A does not exist, the result of some store operation to M in the visible sequence of atomic store operations with respect to B that is not SEQ_CST.

For an atomic load operation B from a memory location M, if there is a SEQ_CST fence X sequenced before B, then B observes either the last SEQ_CST store operation to M preceding X in the total order S, or a later store operation to M in its modification order.

For atomic operations A and B on a memory location M, where A stores into M and B loads from M, if there is a SEQ_CST fence X such that A is sequenced before X and B follows X in S, then B observes either the effect of A or a later store operation to M in its modification order.

For atomic operations A and B on a memory location M, where A stores into M and B loads from M, if there are SEQ_CST fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effect of A or a later store operation to M in its modification order.
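
NOTE: The classic store-buffering example below (a C11 sketch with hypothetical names; the two functions run in different threads) shows what the total order S buys: with SEQ_CST accesses the outcome r1 == 0 && r2 == 0 is forbidden, whereas with only RELEASE stores and ACQUIRE loads it would be allowed:

#include <stdatomic.h>

atomic_int x = 0, y = 0;
int r1, r2;

void thread1(void) {
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);
}

void thread2(void) {
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);
}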

Special Rules for Atomic Read-modify-write Operations

Atomic read-modify-write operations shall always see the last value (in the modification order) stored before the store operation associated with the read-modify-write operation.
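
NOTE: This is what makes atomic read-modify-write operations suitable for counters: because each increment sees the latest value in the modification order, no update can be lost, even with the RELAXED order. A C11 sketch with a hypothetical name:

#include <stdatomic.h>

atomic_int hits = 0;

void worker(void) {
    // N concurrent calls increase hits by exactly N, even with RELAXED order,
    // because each fetch_add reads the latest value in the modification order.
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}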

Special Rules for Stack Operations

A swap-stack operation performs an unbinding operation followed by a binding operation. The former is sequenced before the latter.

In the evaluation of a TRAP or WATCHPOINT instruction, the implied stack unbinding operation is sequenced before any operations performed by the client. If the client chooses to return and rebind the stack, the stack binding operation is sequenced after all operations performed by the client and the implied stack unbinding operation.

Stack binding and unbinding operations are not atomic. If a pair of stack binding or unbinding operations on the same stack do not have a "happens before" relation between them, the behaviour is undefined.

Special Rules for Futex

The load operations performed by @uvm.futex.wait, @uvm.futex.wait_timeout and @uvm.futex.cmp_requeue on the memory locations given by their arguments are atomic.

Special Rules for Functions and Function Redefinition

The rules of memory access apply to functions as if

  • a function were a memory location that holds a function version, and
  • the creation of a frame for a function were an atomic load of the RELAXED order on that location, which sees a particular version, and
  • a function definition or redefinition during the loading of a bundle were an atomic store of the RELAXED order on that location, which stores a new version.

NOTE: A frame is created when:

  1. calling a function by the CALL or TAILCALL instructions, or by native programs through exposed Mu functions, or
  2. creating a new stack by the @uvm.new_stack instruction or the new_stack API, or
  3. pushing a new frame by the push_frame API or the @uvm.meta.push_frame instruction.

The order of definitions and redefinitions of a particular function is consistent with the order in which the bundles containing those definitions are loaded.

NOTE: This means synchronisation operations must be used to guarantee that threads other than the one that loads a bundle see the most recent version of a function.
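
NOTE: The following C11 sketch illustrates the analogy (hypothetical names; in Mu the "location" and the "versions" are abstract, not actual pointers):

#include <stdatomic.h>

typedef void (*version_t)(void);

void version_1(void) { /* original definition */ }
void version_2(void) { /* definition loaded by a later bundle */ }

_Atomic(version_t) func = version_1;   // the function, viewed as a memory location

void load_bundle(void) {
    // redefinition: a RELAXED store of the new version
    atomic_store_explicit(&func, version_2, memory_order_relaxed);
}

void create_frame_and_call(void) {
    // frame creation: a RELAXED load that sees some version, not necessarily
    // the newest one unless extra synchronisation is used
    version_t v = atomic_load_explicit(&func, memory_order_relaxed);
    v();
}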

Out-of-thin-air or Speculative stores

TODO