Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the behavior of the CLR when unsafe casts are used #6176

Closed
GSPP opened this issue Jun 18, 2016 · 12 comments
Closed

Document the behavior of the CLR when unsafe casts are used #6176

GSPP opened this issue Jun 18, 2016 · 12 comments
Labels
design-discussion Ongoing discussion about design without consensus documentation Documentation bug or enhancement, does not impact product or test code
Milestone

Comments

@GSPP
Copy link

GSPP commented Jun 18, 2016

In the System.Runtime.CompilerServices.Unsafe package there is a method that allows unsafe casts from anything to anything:

  .method public hidebysig static !!T As<class T>(object o) cil managed aggressiveinlining
  {
        .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
        .maxstack 1
        ldarg.0
        ret
  } // end of method Unsafe::As

This means that string s = F(); s.GetType(); can be SomeClass or string s = F(); (string)s can throw. In fact it might not throw because the C# compiler or the JIT might optimize that cast out.

Please document the usage rules for unsafe casts in general and document what exact "unsafe" operations are safe to perform on the CLR.

Note, that As<T> can be implemented right now using an unsafe union type e.g.

struct CastHelper<T1, T2> where T1, T2 : class
{ [FieldOffset(0)] T1 T1; [FieldOffset(0)] T2 T2; }

For that reason the question is relevant right now, even without the System.Runtime.CompilerServices.Unsafe package.

It should be documented as well what the CLR thinks about such unsafe structs. In particular I wonder what the optimizing JIT and the GC do with object references that point to an unexpected type.

@jkotas
Copy link
Member

jkotas commented Jun 19, 2016

These unsafe behaviors are not guaranteed to be the same accross runtimes. My description applies to CLR/CoreCLR.

GC does not care about the exact types, e.g. if type of local object reference variable is not compatible with what is actually stored in it, the GC will still track it fine. The lowest level constructs (load/store fields, call non-virtual non-generic method) work ok too - of course, you have to know what you are doing.

Most of the higher level constructs (virtual methods, etc.) do depend on exact types and can get confused:

  • Do not call virtual methods on the object references with mismatched types, or perform other similar operations that consult the actual type
  • Do not pass them to any framework or 3rd party libraries, because of they may do the above on your behalf

@buybackoff
Copy link

@jkotas Could you please confirm that the following example is "safe" in terms of GC stability, i.e. unsafe casting of an array KVP<DateTme,Decimal> to byte[] and then fixing the pointer to &bytes[0] and using it in unsafe context is "safe in unsafe context", that is it will not corrupt runtime.

In the code below, asBytes.Length is 2 and during debug it is still shown as KeyValuePair<DateTime, decimal>[], but then fixing and unsafe access works as expected. Is this correct usage of the new package? Any previous hacks did not succeed without this package.

using System.Runtime.CompilerServices.Unsafe;
[Test]
public unsafe void CouldUseNewUnsafePackage() {
    var dt = new KeyValuePair<DateTime, decimal>[2];
    dt[0] = new KeyValuePair<DateTime, decimal>(DateTime.UtcNow.Date, 123.456M);
    dt[1] = new KeyValuePair<DateTime, decimal>(DateTime.UtcNow.Date.AddDays(1), 789.101M);
    var obj = (object)dt;
    byte[] asBytes = Unsafe.As<byte[]>(obj);
    //Console.WriteLine(asBytes.Length); // prints 2
    fixed (byte* ptr = &asBytes[0]) {
        // reading this: https://github.com/dotnet/coreclr/issues/5870
        // it looks like we could fix byte[] and actually KeyValuePair<DateTime, decimal>[] will be fixed
        // because:
        // "GC does not care about the exact types, e.g. if type of local object 
        // reference variable is not compatible with what is actually stored in it, 
        // the GC will still track it fine."
        for (int i = 0; i < (8 + 16) * 2; i++) {
            Console.WriteLine(*(ptr + i));
        }
        var firstDate = *(DateTime*)ptr;
        Assert.AreEqual(DateTime.UtcNow.Date, firstDate);
        Console.WriteLine(firstDate);
        var firstDecimal = *(decimal*)(ptr + 8);
        Assert.AreEqual(123.456M, firstDecimal);
        Console.WriteLine(firstDecimal);
        var secondDate = *(DateTime*)(ptr + 8 + 16);
        Assert.AreEqual(DateTime.UtcNow.Date.AddDays(1), secondDate);
        Console.WriteLine(secondDate);
        var secondDecimal = *(decimal*)(ptr + 8 + 16 + 8);
        Assert.AreEqual(789.101M, secondDecimal);
        Console.WriteLine(secondDecimal);
    }
}

@jkotas
Copy link
Member

jkotas commented Aug 16, 2016

It will work in CoreCLR today, but unsafe casting of KeyValuePair[] to byte[] is a bit questionable. There is no guarantee that KeyValuePair[] and byte[] will have same in-memory layout. Some runtimes may insert different amount of padding before the start of the first element based on the array element type.

A bit more robust way would be to do the unsafe cast on the address of the first array element. It requires current preview version of C# compiler (https://github.com/dotnet/roslyn/) and the unsafe package (https://github.com/dotnet/corefx/tree/master/src/System.Runtime.CompilerServices.Unsafe).

var dt = new KeyValuePair<DateTime, decimal>[2];
ref byte asRefByte = ref Unsafe.As<KeyValuePair<DateTime, decimal>, byte>(ref dt[0]);
fixed (byte * ptr = &asRefByte)
{


}

@GSPP
Copy link
Author

GSPP commented Aug 16, 2016

There should be more formal documentation that can be used to clearly decide whether any given piece of code is safe or not. @jkotas your comments are helpful but not sufficient to decide on safety. For example, could the JIT optimize the following assuming that a and b do not alias?

class A { int X; }
class B { int X; }

var a = GetSomeA();
var b = GetSomeB();
a.X = 1;
b.X = 2; //Is b a different object? Who knows...

Console.WriteLine(a.X); //Prints: RefEquals(a, b) ? 2 : 1

If a and b can alias but the JIT assumes them to not alias based on type it might optimize this code to Console.WriteLine(1)!

There needs to be more formal and complete documentation on what effects unsafe operations can cause.

@jkotas
Copy link
Member

jkotas commented Aug 16, 2016

I do not believe that he JIT makes assumptions about no-aliasing today. There are number of variants of it, e.g.:

void foo(ref int x, ref float y)
{
    x = 1;
    y = 2.0;
    if (x == 1) // Can JIT assume that x could not have been modified by the above assignment of float?
    {
       ...
    }
}

I agree it would be nice to have more precise specification for valid unveriable code - that includes unsafe casts. There is some specification for it in ECMA-335, but it is not as precise as the specification for verifiable code. Ideally it should be added to ECMA-335 spec.

cc @dotnet/jit-contrib

@JosephTremoulet
Copy link
Contributor

Actually, if I compile this code with the current JIT:

    class A { public int x; }

    class B { public int y; }

    ...
        static int DoIt(A a, B b) {
            int x = a.x;
            b.y = 2;
            return x + a.x;
        }

then the codegen for DoIt does indeed assume that a.x and b.y don't alias:

; Assembly listing for method N.Foo:DoIt(ref,ref):int
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,   3  )     ref  ->  rcx
;  V01 arg1         [V01,T01] (  3,   3  )     ref  ->  rdx
;  V02 tmp0         [V02,T02] (  2,   4  )     int  ->  rcx
;# V03 OutArgs      [V03    ] (  1,   1  )  lclBlk ( 0) [rsp+0x00]
;  V04 cse0         [V04,T03] (  3,   3  )     int  ->  rax
;
; Lcl frame size = 0

G_M49388_IG01:

G_M49388_IG02:
       8B4108               mov      eax, dword ptr [rcx+8]
       8BC8                 mov      ecx, eax
       C7420802000000       mov      dword ptr [rdx+8], 2
       03C1                 add      eax, ecx

G_M49388_IG03:
       C3                   ret

; Total bytes of code 15, prolog size 0 for method N.Foo:DoIt(ref,ref):int
; ============================================================

It looks like the assumption the JIT makes to acheive this is that different class fields (i.e. with different field handles) will not alias each other so long as CORINFO_FLG_OVERLAPPING_FIELDS is not set for them.

As a more general rule, I'm used to trusting that fields/locals/args/elements whose type is a reference type point to either null or an instance of the class type indicated, but trusting basically nothing about byrefs or unmanaged pointers. But that's my mental model and is informed by other projects, I'm not sure to what extent the current JIT code base actually makes use of such an assumption (aside from the case mentioned above).

@GSPP
Copy link
Author

GSPP commented Aug 16, 2016

Looks like even BCL-internal code is treading on glass using these unsafe primitives.

And it would be a shame to lose optimizations based on aliasing. Also, even if a given optimization is not performed right now we probably don't want to commit ourselves to never perform it for compat reasons. Much stronger optimizers (e.g. LLVM/LLILC) are on the horizon.

@CarolEidt
Copy link
Contributor

I think it would require some serious analysis of JIT behavior and the current specification, plus some language-lawyering, to come up with the right specification for how the JIT should behave with regard to these types of unsafe casts. However, I think that it is imperative that we retain the following invariants, from the JIT's perspective:

  • Type T is never referenced as type U, unless T derives from U.
  • A field reference T.f is never used to reference a field U.g, where f and g are different fields.

A short list, not at all exhaustive list of implications for developers when using unsafe casts:

  • Ensure that any unsafe references do not escape the method, OR that the references are mutually exclusive (i.e. they do not overlap) and are held only by the current method (e.g. it was allocated by the current method).
  • If code violates type safety for managed types (including value types, but excluding unsafe pointers to native memory) it should be marked [MethodImpl(MethodImplOptions.NoOptimization)]

@DemiMarie
Copy link

@CarolEldt So I assume that unsafe downcasts of discriminated union types (after having already verified that the cast is safe) are okay?

@CarolEidt
Copy link
Contributor

@DemiMarie - Yes, at least from the JIT's perspective. Any value types that have overlapping fields are treated conservatively by the JIT.

@danmoseley
Copy link
Member

@CarolEidt would it make sense to close this now?

@CarolEidt
Copy link
Contributor

@danmosemsft - I'm not sure what is the correct way to handle this kind of issue. Ideally, it would be reassigned as a spec issue, and handled there, but we do not currently have an active standard working group, nor a mechanism for filing and tracking these kinds of issues. But, yes, I think this issue should be closed. I've made a note of it, for consideration in future spec work.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
design-discussion Ongoing discussion about design without consensus documentation Documentation bug or enhancement, does not impact product or test code
Projects
None yet
Development

No branches or pull requests

8 participants