Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory model document. #75790

Merged
merged 33 commits into from
Dec 16, 2022
Merged
Changes from 7 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
b1dc5ee
Created memory-model.md
VSadov Jun 16, 2022
1fa2ecc
addresses some comments
VSadov Jun 18, 2022
c4b4765
More details and samples.
VSadov Jun 18, 2022
b872aba
Fix trailing whitespaces.
VSadov Sep 17, 2022
aae251b
More trailing whitespace
VSadov Sep 17, 2022
f13605d
More trailing whitespace.
VSadov Sep 17, 2022
a8a1003
Apply suggestions from code review (typos)
VSadov Sep 17, 2022
b15a921
replaced references to CLR with ".NET runtime"
VSadov Sep 21, 2022
d554b05
Addressed some PR review feedback
VSadov Sep 22, 2022
38f0585
Moved to specs folder
VSadov Sep 22, 2022
fe0a65e
More addressing PR feedback
VSadov Sep 22, 2022
e8861ec
Volatile/Interlocked methods are atomic
VSadov Sep 22, 2022
5a25cab
Better notes about atomicity of pointers
VSadov Sep 22, 2022
be72e2e
Apply suggestions from code review
VSadov Sep 22, 2022
75adccb
Addressing more PR feedback
VSadov Sep 23, 2022
eec3b96
Updated singleton sample for more clarity.
VSadov Sep 23, 2022
aaeadfc
Trailing whitespace.
VSadov Sep 23, 2022
4cdac19
Move data dependent reads to general section
VSadov Sep 26, 2022
05694e3
Compat disambiguation note on object assignments.
VSadov Sep 26, 2022
804942f
Apply suggestions from code review
VSadov Sep 26, 2022
f1e13ed
No dots at title ends
VSadov Sep 26, 2022
d7c179c
"Data-dependent" spelled with dash consistently
VSadov Sep 26, 2022
615c434
Apply suggestions from code review
VSadov Sep 27, 2022
20b6b52
Apply suggestions from code review
VSadov Oct 1, 2022
5579343
Apply suggestions from code review
VSadov Nov 3, 2022
31431ed
order of object assignment and data-dependent memory accesses
VSadov Nov 3, 2022
8b23a88
Listed primitive types.
VSadov Dec 16, 2022
e02a5df
Briefly explained motivations for the treatment of memory access side…
VSadov Dec 16, 2022
280014d
Update docs/design/specs/Memory-model.md
VSadov Dec 16, 2022
608f45d
Link to the data-dependent accesses and compiler optimizations follow…
VSadov Dec 16, 2022
ed89b04
removed unnecessary `[`
VSadov Dec 16, 2022
136e637
Update docs/design/specs/Memory-model.md
jkotas Dec 16, 2022
34a074d
Trailing whitespace
VSadov Dec 16, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
248 changes: 248 additions & 0 deletions docs/design/coreclr/botr/memory-model.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,248 @@

# CLR memory model

## ECMA vs. CLR memory models.
ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being more strict. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test.

In the course of multiple releases CLR implementation settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases.
VSadov marked this conversation as resolved.
Show resolved Hide resolved

## Alignment
When managed by CLR runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about members of structs?

Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields are variables.

Regardless how you get something that has, for example, type int, it will be aligned accordingly. The runtime (i.e. JIT, GC, TypeSystem) will ensure alignment for all variables under runtime's control.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FieldOffsetAttribute could possibly introduce unaligned fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably, by using FieldOffsetAttribute you are not letting the runtime to manage locations of the fields.
I will add a note about FieldOffsetAttribute as something that may cause unaligned variables.


1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively.
8-byte variables are 8-byte aligned on 64 bit platforms.
Native-sized integer types and pointers have alignment that matches their size on the given platform.

## Atomic memory accesses.
Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to mention that reading and storing object references and unmanaged pointers is atomic?

Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object references are always aligned. We can mention that (if it is already not mentioned).
But unmanaged pointers are just numbers with dereference operation defined. They may not be aligned, then reading/storing is not necessarily atomic.

Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean the pointers themselves. When managed by the runtime the pointers and references are "properly aligned" and thus read/writes are atomic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the value of unmanaged pointer. Unmanaged pointer is not a primitive type, so the current wording does not cover it.

Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it may be worth mentioning this. It follows from the "properly aligned" but pointers/references is a very common scenario.


## Unmanaged memory access.
As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior.
**Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration.

Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as:
* `.unaligned` IL prefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unaligned.

* `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers.

These facilities ensure fault-free access to potentially unaligned locations, but do not ensure atomicity.

As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop results in Undefined Behavior.

## Sideeffects and optimizations of memory accesses.
CLR assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this forbid memory-mapped files mapped at multiple virtual addresses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime will assume that the memory is ordinary memory. If there is some additional semantics to the writes (as write to multimapped location will update other views), that is an Undefined Behavior. There can be no guarantees from the runtime on what will happen.

It may work as expected in simple cases, but the runtime would not help or prevent that.


As a consequence:
* Speculative writes are not allowed.
* Reads cannot be introduced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are "Speculative writes are not allowed" and "Reads cannot be introduced" really consequences of the above assumption, or additional guarantees made by the runtime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a design choice, but it follows from assuming that ordinary writes/reads can have observable sideeffects.

If you assume that a write may be visible from other threads (unless the location is known to be local to the current thread), then you can't replace

location = actualValue;

with

location = someSpeculativeValue;
if (speculative value was wrong)
{
   location = actualValue;
}

Memory models that require only singlethreaded correctness allow this (ex: ordinary writes in C++), which is often unexpected and may result in subtle bugs in multithreaded scenarios.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are basically saying that the jit cannot introduce a race, but is allowed to remove one?

Copy link
Member Author

@VSadov VSadov Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ordinary memory accesses - yes. A program cannot condition its correctness on a nondeterministic race, that is not guaranteed to ever happen, so we can remove races. Adding a race would be observable.

Copy link
Member Author

@VSadov VSadov Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there are rare scenarios when a program explicitly wants to observe races - like making multiple or repeated reads of the same location and checking if the value has changed.
In such case it needs to use Volatile.Read, or a proposed compiler fence (#75874)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads cannot be introduced

Does this mean the following RyuJit behavior is a bug?

[MethodImpl(MethodImplOptions.NoInlining)]
private static int Problem(int* p)
{
    var b = *p == 1;
    var c = b;
    if (b)
    {
        JitUse(c);
        return 2;
    }

    return 1;
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static void JitUse<T>(T arg) { }
IN000a: 000000 sub      rsp, 40

G_M51150_IG02:        ; offs=000004H, size=000DH, bbWeight=1    PerfScore 8.25, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, BB01 [0000], byref, isz

IN0001: 000004 xor      eax, eax
IN0002: 000006 cmp      dword ptr [rcx], 1 ; The original load
IN0003: 000009 sete     al
IN0004: 00000C cmp      dword ptr [rcx], 1 ; The duplicated load
IN0005: 00000F jne      SHORT G_M51150_IG05

G_M51150_IG03:        ; offs=000011H, size=000DH, bbWeight=0.50 PerfScore 1.75, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, BB02 [0001], byref

IN0006: 000011 mov      ecx, eax
IN0007: 000013 call     [RyuJitReproduction.Program:JitUse(bool)]
IN0008: 000019 mov      eax, 2

G_M51150_IG04:        ; offs=00001EH, size=0005H, bbWeight=0.50 PerfScore 0.62, epilog, nogc, extend

IN000b: 00001E add      rsp, 40
IN000c: 000022 ret      

G_M51150_IG05:        ; offs=000023H, size=0005H, bbWeight=0.50 PerfScore 0.12, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, BB03 [0002], gcvars, byref

IN0009: 000023 mov      eax, 1

G_M51150_IG06:        ; offs=000028H, size=0005H, bbWeight=0.50 PerfScore 0.62, epilog, nogc, extend

IN000d: 000028 add      rsp, 40
IN000e: 00002C ret      

(This particular example needs DOTNET_JitNoCSE=1 to be reproduced)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If DOTNET_JitNoCSE is a supported scenario, then this would seem like a bug.
"b == c" in the source and compiler introduced a race. @AndyAyersMS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JitNoCSE was just to create an easy and small sample; without it the compiler would have folded the duplicated load back, of course, that's not guaranteed in the general case.

Opened #75916 to track this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should avoid duplicating reads like this.

* Unused reads can be elided.
* Adjacent nonvolatile reads from the same location can be coalesced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are more aggressive than this today. If a program does ordinary reads of a field twice without any possibility of an intervening synchronization construct or aliasing write, the jit may do just one read. The reads to not have to be adjacent.

For example

   int a = x.f;
   int b = x.g;
   int c = x.f;

may be transformed to

   int a = x.f;
   int b = x.g;
   int c = a;

Copy link
Member Author

@VSadov VSadov Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but in a case of "without any possibility of an intervening synchronization construct or aliasing write" you can mentally move the reads around so that they are adjacent and then coalesce.

JIT reasons in different terms which are more convenient in the implementation, but I think in the end there is equivalency, and JIT conforms to the rules in this document. (modulo unintentional bugs, if there are any)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting #47261 here as that would seem to be one such bug.

Copy link
Member Author

@VSadov VSadov Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#47261 is actually ByDesign.
When the reads are unused, there can be no race. It is permitted to remove unused reads.

* Adjacent nonvolatile writes to the same location can be coalesced.

## Thread-local memory accesses.
It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses.

## Cross-thread access to local variables.
- There is no type-safe mechanism for accessing locations on one thread’s stack from another thread.
- Accessing managed references located on the stack of a different thread by the means of unsafe code will result in Undefined Behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious -- what is an example of UB one can observe today when passing pointers to locals around threads?

Copy link
Member Author

@VSadov VSadov Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an obvious hazard form the other stack getting shorter and then longer and recycling the location for something completely different.
In addition, the runtime (JIT and GC primarily) makes assumptions that the stack locations are not shared. That allows important optimizations for single-threaded access to objects, which is a common case. For example assigning to local variables is not reported to GC (does not perform GC-write-barriers) because assigning to a non-shared location cannot root an object from an older generation, cannot expose it to other threads prematurely, etc...

As an example that does not need a lot of stress to fail. The following would typically crash on ARM64 in under 1 second:

internal unsafe class Program
{
  private static void Main(string[] args)
  {
    Task.Run(M1);
    while (ptrStatic == null) { Thread.Sleep(1);}

    ref object otherThreadsLocal = ref Unsafe.AsRef<object>(ptrStatic);
    
    for(; ; )
    {
      System.Console.WriteLine(otherThreadsLocal); // can see corrupted incomplete objects
    }
  }

  static void* ptrStatic;

  static void M1()
  {   
    object local1 = 0;
    ptrStatic = Unsafe.AsPointer(ref local1);

    for (long i = 0; i < long.MaxValue; i++)
    {
        local1 = i; 
    }
  }
}


## Order of memory operations.
* **Ordinary memory accesses**
The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware.

* **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read.
Operations with acquire semantics:
- IL load instructions with `.volatile` prefix when instruction supports such prefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: volatile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention c# volatile keyword in here? Since most readers will commonly use that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically C# volatile belongs to C# spec, but it is common, so a few notes on that would be useful.

Adding "volatile" on a variable in C# is just a decoration that makes no difference to CLR, but it is a hint to C# compiler itself (and few other compilers) to emit reads and writes of that variable as volatile reads/writes - this is the part that CLR will honor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be great as an "aside" here, although it's not a CLR concept it is useful context

- `System.Threading.Volatile.Read`
- `System.Thread.VolatileRead`
- Acquiring a lock (`System.Threading.Monitor.Enter` or entering a synchronized method)

* **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable.
Operations with release semantics:
- IL store instructions with `.volatile` prefix when such prefix is supported
- `System.Threading.Volatile.Write`
- `System.Thread.VolatileWrite`
- Releasing a lock (`System.Threading.Monitor.Exit` or leaving a synchronized method)

* **.volatile initblk** has "release semantics" - the effects of `.volatile initblk` will not be observable earlier than the effects of preceeding reads and writes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does any of this map to C# that can be described?

Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think C# emits these.


* **.volatile cpblk** combines ordering semantics of a volatile read and write with respect to the read and written memory locations.
- The writes performed by `.volatile cpblk` will not be observable earlier than the effects of preceeding reads and writes.
- No read or write that is later in the program order may be speculatively executed before the reads performed by `.volatile cpblk`
- `cpblk` may be implemented as a sequence of reads and writes. The granularity and mutual order of such reads and writes is unspecified.

Note that volatile semantics does not by itself imply that operation is atomic or has any effect on how soon the operation is committed to the coherent memory. It only specifies the order of effects when they eventually become observable.

`.volatile` and `.unaligned` IL prefixes can be combined where both are permitted.

It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to omit volatile semantics when accessing such data.

* **Full-fence operations**
Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order.
Operations with full-fence semantics:
- `System.Thread.MemoryBarrier`
- `System.Threading.Interlocked` methods

## Process-wide barrier
Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe off-topic, but doesn't this also need compiler memory barriers to be effective?

e.g.

// Make sure that the compiler won't reorder the read with the above write by wrapping the read in no-inline method.
// RyuJIT won't reorder them today, but more advanced optimizers might. Regular Volatile.Read would be too big of
// a hammer because of it will result into memory barrier on ARM that we do not need here.
//
//
if (VolatileReadWithoutBarrier(ref _current) == entry)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Process-wide barrier is typically paired with a compiler barrier. A not-inlineable getter or a volatile read can work as such, but maybe it is time to have an official compiler barrier helper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: #75874


The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation.

## Synchronized methods
Synchronized methods have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to reference the actual attribute


## Object assignment
Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata.
Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danmoseley - here is somehting that could differ on Mono. Mono has a switch to turn this off. It is not the default behavior though, so I do not think it needs to be mentioned.

The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance.
The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads.

However, the ordering sideeffects of reference assignment should not be used for general ordering purposes because:
- ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler.
- an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads.

## Instance constructors
CLR does not specify any ordering effects to the instance constructors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment wherever CLR is used, is Mono relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Mono is the same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be changing CLR reference to .Net, as discussed a few pages above. This doc tries to be applicable to all common implementations of .Net
(we will treat disagreements with the final form of this doc as bugs, if such disagreements are found)


## Static constructors
All side effects of static constructor execution must happen before accessing any member of the type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must or will?

Copy link
Member Author

@VSadov VSadov Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precisely "will become observable not later than effects of accessing any member of the type..." or something like that.
Probably needs some re-wording.

Easy way to see it as if there is a lock that is taken when static constructor runs and releasing the lock is a release. The lock also guarantees that static constructor runs only once.
The implementation is not always a lock though, sometimes static constructors run at JIT time, NativeAOT may not run some static constructors at all and just use precomputed results embedded in the binary. The ordering result is the same.


## Hardware considerations
Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions.

* Naturally aligned reads and writes with sizes up to the platform pointer size are atomic.
That applies even for locations targeted by overlapping aligned reads and writes of different sizes.
**Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior some write or after some write. It will never be a mix of before/after bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - "prior to" or "before"


* The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic).
**Example:** when the same location is updated with values in ascending order (like 1,2,3,4,...), no observer will see a descending sequence.

* It may be possible for a thread to see its own writes before they appear to other cores (store buffer forwarding), as long as the single-thread consistency is not violated.

* The memory managed by the runtime is ordinary memory (not device register file or the like) and the only sideeffects of memory operations are storing and reading of values.

* It is possible to implement release consistency memory model.
Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations.

* Memory ordering honors data dependency
**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance.
(Some versions of Alpha processors did not support this, most current architectures do)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this relevant today?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but people keep bringing this up as a concern. - "I heard that some CPUs do not honor data dependency" - Yes, some flavors of Alphas had that issue, possibly a mistake in the design, no, noone does that again.


## Examples and common patterns:
The following examples work correctly on all supported CLR implementations regardless of the target OS or architecture.

* Constructing an instance and sharing with another thread is safe and does not require explicit fences.

```cs

static MyClass obj;

// thread #1
void ThreadFunc1()
{
while (true)
{
obj = new MyClass();
}
}

// thread #2
void ThreadFunc1()
{
while (true)
{
obj = null;
}
}

// thread #3
void ThreadFunc2()
{
MyClass localObj = obj;
if (localObj != null)
{
// accessing members of the local object is safe because
// - reads cannot be introduced, thus localObj cannot be re-read and become null
// - publishing assignment to obj will not become visible earlier than write operations in the MyClass constructor
// - indirect accesses via an instance are dependent reads, thus we will see results of constructor's writes
System.Console.WriteLine(localObj.ToString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localObj.ToString() might not be a good example, because .ToString() could possibly read static fields that were written by the constructor (without using volatile). Static field reads are not dependent reads, and do not have the ordering guarantee. Probably better to explicitly read some instance field here.

Copy link
Member Author

@VSadov VSadov Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example is expected to work correctly even if .ToString accesses string's static fields.
it would be highly inconvenient if it did not work as caller does not often know these details.

Static field reads are not dependent reads

The rule about static constructors above tells that the cctor runs before any member is accessed and it would be useless to guarantee that static constructor "runs before" accessing without implying that the access will see the complete results.

The mechanism will vary and often the results of the static constructors will be computed well in advance before the actual access (at JIT time or Load time).
As a last resort this will be a dependent read from the type-attached storage, and the type would be derived from the instance, so there is a dependency chain from the instance to the static data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean static constructor. Consider

sealed class C
{
    //There may only be one instance, so why not make every field static?
    static string name;

    private C()
    {
        name = "My Name";
    }

    public override string ToString()
    {
        return name;//should always return "My Name"
    }

    static readonly object lockObj = new object();
    static C c;
    
    //Thread 1 and 2
    public static void PrintSingleton()
    {
        if (c is null)
        {
            lock (lockObj)
            {
                c ??= new C();
            }
        }
        Console.WriteLine(c.ToString());
    }
}

There is no dependency between reading C.c in C.PrintSingleton() and reading C.name in C.ToString(). Both threads may not print "My Name".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is just a write to some shared memory from the constructor. It could be a static variable in another class or unmanaged memory. Constructor can do many things. It can also launch a thread that will write to statics or it can do c = this or lockObj = null.
Having a constructor with sideeffects other than constructing and initializing the instance would make the type a poor choice for the singleton pattern.

The sample here is not about that, but about guarantees provided by the runtime.
I will add a definition of a simple MyClass so there is no confusion about what constructor does.

}
}

```

* Singleton (using a lock)

```cs

private object _lock = new object();
Copy link
Contributor

@Tornhoof Tornhoof Sep 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private MyClass _inst;

public MyClass GetSingleton()
{
if (_inst == null)
{
lock (_lock)
{
// taking a lock is an acquire, the read of _inst will happen after taking the lock
// releasing a lock is a release, if another thread assigned _inst, the write will be observed no later than the release of the lock
// thus if another thread initialized the singleton, the current thread is guaranteed to see that here.

if (_inst == null)
{
_inst = new MyClass();
}
}
}

return _inst;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to use volatile read here to ensure side effects of the constructor, e.g. static field writes, are visible. LazyInitializer currently also uses volatile reads.

public static T EnsureInitialized<T>([NotNull] ref T? target, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory) where T : class =>
Volatile.Read(ref target!) ?? EnsureInitializedCore(ref target, ref syncLock, valueFactory);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for volatile reads. Any data written directly or indirectly prior to the publishing of the instance will be visible via the instance. (see comments about the statics in the previous comment)

}

```


* Singleton (using an interlocked operation)

```cs
private MyClass _inst;

public MyClass GetSingleton()
{
MyClass localInst = _inst;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to use volatile read here. See above for reasons.


if (localInst == null)
{
// unlike the example with the lock, we may construct multiple instances
// only one will "win" and become a unique singleton object
Interlocked.CompareExchange(ref _inst, new MyClass(), null);

// since Interlocked.CompareExchange is a full fence,
// we cannot possibly read null or some other spurious instance that is not the singleton
localInst = _inst;
}

return localInst;
}
```

* Communicating with another thread by checking a flag.

```cs
internal class Program
{
static bool flag;

static void Main(string[] args)
{
Task.Run(() => flag = true);

// the repeated read will eventually see that the value of 'flag' has changed,
// but the read must be Volatile to ensure all reads are not coalesced
// into one read prior entering the while loop.
while (!Volatile.Read(ref flag))
{
}

System.Console.WriteLine("done");
}
}

```