Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabled ByReference<T> usage in ref-like types #3367

Closed

Conversation

Sergio0694
Copy link
Member

@Sergio0694 Sergio0694 commented Jun 27, 2020

PR Type

What kind of change does this PR introduce?

  • Feature
  • Optimization

What is the current behavior?

Since the ByReference<T> type is not public, we needed to hack our way around it by using a Span<T> and the MemoryMarshal.CreateSpan<T>(ref T, int) API. This worked, but was slower and used more memory (types needed to store an entire Span<T> instead of just a reference). It also introduced overhead when accessing the actual reference from the Span<T> field.

What is the new behavior?

Following an incredible discovery shared by @ltrzesniewski (here), we're now externally making the ByReference<T> type public through a fake assembly referenced by the .NET Core targets in the HighPerformance package. As a result, alll the ref-like types in the package (eg. Ref<T>) now use that behind the scenes, offering the best performance possible. 🎉

Why is this a BIG deal?

The ability to directly use ByReference<T> completely removes the overhead from our ref-like types (Ref<T>, ReadOnlyRef<T>, etc.). This is because if we just need a ByReference<T> field in our types, the whole type only has the size equal to a native integer on any given platform (a pointer size), which allows the whole type to be stored in a single register. This effectively makes these types a 1:1 replacement for ref variables as far as performance goes! 🚀

Consider the following simple example:

public static int Get(Ref<int> reference)
{
    return reference.Value;
}

This used to produce the following, before this PR:

Get(Ref<int>):
    488B01      mov      rax, bword ptr [rcx]
    8B00        mov      eax, dword ptr [rax]

; Total bytes of code 6, prolog size 0, PerfScore 5.60

Note how the reference needs to actually be loaded into a register first (as Ref<int> itself was 16 bytes large (!) on x64, so it could only be passed by reference as it couldn't fit into a single register). Then once the reference was loaded, the final value could be accessed. It worked fine, but it was really not ideal especially for tight loops. In fact the Ref<T> type was primarily meant to be used for convenience (ie. to have ref T fields in ref struct types), but it couldn't replace manual code with eg. Unsafe.Add in critical paths). Compare that with the update codegen, with the changes from this PR:

Get(Ref<int>):
    8B01        mov      eax, dword ptr [rcx]

; Total bytes of code 3, prolog size 0, PerfScore 3.30

Here you can see the code is virtually identical to the one produced by a native ref int value: the Ref<T> value fits in a single register (as it's just the IntPtr size now) and it's treated by the JIT exactly like any other ref T argument. Basically the Ref<T> type now has no overhead at all over native reference types! 🎉

Size reductions

As mentioned above, this greatly improves the efficiency and size of all the ref-like types in the package.
In particular, on x64:

  • Ref<T>: 16 bytes ---> 8 bytes
  • ReadOnlyRef<T>: 16 bytes ---> 8 bytes
  • NullableRef<T>: 16 bytes ---> 8 bytes
  • NullableReadOnlyRef<T>: 16 bytes ---> 8 bytes

The [ReadOnly]NullableRef<T> types we have the same size reduction: in this case we're also completely removing the need for a secondary field to track whether or not we do have a value (like in Nullable<T>), because we can simply derive that value by checking whether the wrapper reference is in fact a null reference. Thanks to @ltrzesniewski for the suggestion here 😄

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tested code with current supported SDKs
  • Pull Request has been submitted to the documentation repository instructions. Link: #396
  • Sample in sample app has been added / updated (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features) (if applicable)
  • Header has been added to all new source files (run build/UpdateHeaders.bat)
  • Contains NO breaking changes

@ghost
Copy link

ghost commented Jun 27, 2020

Thanks Sergio0694 for opening a Pull Request! The reviewers will test the PR and highlight if there is any conflict or changes required. If the PR is approved we will proceed to merge the pull request 🙌

@ghost ghost assigned Kyaa-dost Jun 27, 2020
Copy link
Contributor

@john-h-k john-h-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not a religious man, but if there is a god of code he will damn this to the deepest depths of hell.

That being said, LGTM!

@ltrzesniewski
Copy link

ltrzesniewski commented Jun 27, 2020

Here's an idea: you could even get rid of that hasValue field in NullableReadOnlyRef<T> by using Unsafe.AreSame to compare the ref with the null ref (unless you want the null ref to be a valid value).

Something like:

public bool HasValue => !Unsafe.AreSame(ref reference.Value, ref Unsafe.AsRef<T>(null))

@Sergio0694
Copy link
Member Author

Sergio0694 commented Jun 27, 2020

@ltrzesniewski Ahahah yeah I like it! Not sure why I hadn't thought about that before, thanks! 😄🚀

EDIT: done in 224a23c.

@Sergio0694 Sergio0694 marked this pull request as ready for review June 27, 2020 16:39
@Sergio0694
Copy link
Member Author

CI build is fine and all tests are passing, it actually worked! 🚀🚀🚀

Thank you @ltrzesniewski again so much for the help setting this up! 🍻

This is needed so that when disassembling the ByReference<T> type imported from the HighPerformance package installed from NuGet, the size of the struct is correctly reported (previously it would show up as 1 byte instead). This change has no actual influence on how the type works, it's purely to improve user experience when using tools such as Re#'s disassembler
@Sergio0694
Copy link
Member Author

Just to be 1000% sure, I've also tested this locally, installing the preview Microsoft.Toolkit.HighPerformance.7.0.0-build.67.g327f8f98c1.nupkg NuGet package downloaded from the CI artifacts, works great! 🚀🚀🚀

@ltrzesniewski
Copy link

I'm curious: how is the codegen better after removing [MethodImpl(MethodImplOptions.NoInlining)] on throw helper methods? In my experience, it's better if these methods are not inlined.

@Sergio0694
Copy link
Member Author

Hey @ltrzesniewski - you're absolutely correct that throw helper methods should not be inlined, yup!
There are a couple more things to consider here though (credits to @john-h-k as I wasn't aware of this at first):

  1. The JIT is already able to identify throw methods, as in, methods that literally just have a single throw expression.
  2. By default, the JIT always assumes that forward branches are taken, unless it can see that the branching path always throws. Unfortunately though, if your throw helper method has the [MethodImpl] attribute on it, the JIT can't see this anymore.

So basically, using [MethodImpl(MethodImplOptions.NoInlinining)] will still cause the throw helper method not to be inlined, but the caller will then assume that the faulting path is actually always taken, which can (and does) result in worse codegen.

Consider this simple example:

public ref struct ByRef
{
    private int foo;
    
    public int Foo
    {
        get
        {
            if (foo == 0)
            {
                ThrowException();
            }
            
            return foo;
        }
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowException()
    {
        throw new InvalidOperationException("Foo");
    }
}

This gives us the following:

ByRef.get_Foo()
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov rsi, rcx
    L0008: cmp dword ptr [rsi], 0
    L000b: jne short L0012
    L000d: call ByRef.ThrowException()
    L0012: mov eax, [rsi]
    L0014: add rsp, 0x20
    L0018: pop rsi
    L0019: ret

You can see how the correct path (that jne) is assumed non taken, and the linear code flow will just follow the original C# code 1:1 and call that throw helper. This is not what we want, as it means we'd always have one jump for a non-faulting execution.

This instead is the codegen when we remove that [MethodImpl] attribute:

ByRef.get_Foo()
    L0000: sub rsp, 0x28
    L0004: mov eax, [rcx]
    L0006: test eax, eax
    L0008: je short L000f
    L000a: add rsp, 0x28
    L000e: ret
    L000f: call ByRef.ThrowException()
    L0014: int3

You can see that the JIT can now correctly see that the method we invoke is indeed always throwing, so it switches the branch condition (we now have je so we only jump in the faulting case), and non-faulting executions will now have no jumps! 🚀

Also in this case the resulting assembly is also slightly more optimized, in particular the second version is not using rsi as an auxiliary register so the prologue is shorter too. This wouldn't matter once this method is inlined, but still using one less temporary register is always a good thing anyway. Also, more importantly, the first version had two memory accesses for some reason (that cmp dword ptr [rsi] and that following mov eax, [rsi] below), while the second version instead correctly only issues a single mov eax, [rcx] and then reuses the returned value from the local register. So yeah, much better codegen in general 🎉

Also you can see the faulting path now just issues an int3 after the call to our throw helper method, as it knows the method will never be able to correctly return after that function call. Hope this helps! 😄

@ltrzesniewski
Copy link

Thanks for the detailed explanation! 😄

@Sergio0694
Copy link
Member Author

You could, but the story gets very convoluted. It basically says you can only use this with confidence on .NET Framework.

Right, and yeah even then users could still end up with issues if they use it with incorrect arguments (as the reference can't be validated to be inside the given object). That makes perfect sense, I'll just add that as a general note then 👍

I do not see the similarity. This specifically talks about the empty array case.

Oh sorry, I misread that! I hadn't seen the note about an empty array and thought that was just a general disclaimer to basically say "this is extremely dangerous and we don't encourage you to use this for indexing, so use this at your own risk" 😄

Copy link
Member

@michael-hawker michael-hawker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sergio0694 can you just add a doc PR and link here? Then I think we'll be good to move forward with this update?

@@ -0,0 +1,26 @@
<Project Sdk="Microsoft.NET.Sdk">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a fake project/build thing for trickery and nothing that ends up in the final bits/package right?

Should we call that out in a comment above as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a note for that in the ByReference<T> file, as I thought that would've probably be the first file new devs would've tried to open when loading the project, as opposed to just double clicking the actual .csproj file in a text editor? 🤔

https://github.com/windows-toolkit/WindowsCommunityToolkit/blob/dc0d695531a9ecf3e4558939c3c6517a7c8f313b/System.Private.CoreLib/ByReference%7BT%7D.cs#L5-L11

I can move this in the .csproj file if you feel it's better, let me know! 😊

@Sergio0694
Copy link
Member Author

@michael-hawker Opened a doc PR with the update: MicrosoftDocs/WindowsCommunityToolkitDocs#396.

This type comes with a few caveats and should be used very carefully. On .NET Core runtimes, any use of Ref<T> may cause intermittent crashes or data corruptions, as Ref<T> is using internal implementation details of the .NET runtime in an unsupported way. Also, using this type can lead to runtime crashes if a Ref<T> instance is created with an invalid reference. In particular, you should only create a Ref<T> instance pointing to values that have a lifetime that is greater than that of the Ref<T> in use. Consider the following snippet [...]

@michael-hawker michael-hawker added for-review 📖 To evaluate and validate the Issues or PR reviewers wanted ✋ next preview ✈️ Label for marking what we want to include in the next preview release for developers to try. labels Oct 29, 2020
@michael-hawker michael-hawker added DO NOT MERGE ⚠️ and removed next preview ✈️ Label for marking what we want to include in the next preview release for developers to try. labels Nov 11, 2020
@michael-hawker
Copy link
Member

@Sergio0694 this is waiting on #3356 as well, eh?

@Sergio0694
Copy link
Member Author

Sergio0694 commented Nov 24, 2020

@michael-hawker Yup, waiting on that one to get merged so that I can properly setup the necessary attributes and code paths here when on the .NET 5 target so that the new ByReference<T> type doesn't fail to load at runtime and crash 😄

Will sync this PR and get it ready for review as soon as #3356 is merged!

@jkotas
Copy link

jkotas commented Nov 24, 2020

The documentation for this package is at https://docs.microsoft.com . It has a high chance of creating confusion with Microsoft customers that this package is supported by Microsoft and the .NET team.

Can we move the community toolkit documentation to some other place before this is merged?

@michael-hawker
Copy link
Member

@jkotas this package is supported by Microsoft, that's my role here. It's because of that support that we ship under the Microsoft.* package namespace on NuGet and ship our docs on docs.microsoft.com. All the doc urls clearly have 'community' in their path and the docs system points folks to our repo to file issues, just like the links from NuGet.

I don't think we've had cases of anyone in the past filing issues on the .NET repos for issues with the Toolkit specifically. If you've seen evidence contrary to this, please let me know.

@jkotas
Copy link

jkotas commented Nov 24, 2020

If this PR is merged, this package won't be supported by Microsoft anymore: If somebody calls Microsoft support that they are getting crash in GC, it will get to my team. If we trace the crash to this package, we will say that it is not a supported scenario.

@GrabYourPitchforks
Copy link

I guess the basic question is - what happens if a customer reports an issue and it's traced back to this? Presumably the resolution would be for the community toolkit to back out the ByReference<T> changes and to use the one-element Span<T> trick instead, then release a new version of the package and tell customers to upgrade?

@michael-hawker michael-hawker marked this pull request as draft November 24, 2020 21:59
@michael-hawker
Copy link
Member

Converting this back to draft until we can detail all the implications if we ship this.

In the meantime @Sergio0694 can you provide some metrics in your use-cases (or others like @ltrzesniewski's) on the perf benefits this provides? If anything even if we don't ship this PR, we can use it to help vet and aid in the proposed features for .NET 6.

@Sergio0694
Copy link
Member Author

Going to close this - the ref T fields feature is (almost?) officially in the .NET 6 and C# 10 milestone anyway, and the workaround we have right now for this is much safer and just a bit slower than the ByReference<T> hack. Folks that absolutely would need the better performance would likely just jump to .NET 6 as soon as it's out anyway. Don't think it's worth it to potentially break people accidentally using this version of the package on .NET 6 previews, and the friction with the .NET team that is rightfully worried about people potentially reporting issues just in case they stumbled upon one of those untested GC tracking scenarios.

Thank you @jkotas and @GrabYourPitchforks for the input and for your time, really appreciate it! 😊
And really happy to see this feature making its way into the language in the future (also thanks to Jared's work on this)!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Completed 🔥 DO NOT MERGE ⚠️ for-review 📖 To evaluate and validate the Issues or PR high-performance 🚂 Issues/PRs for the Microsoft.Toolkit.HighPerformance package optimization ☄ Performance or memory usage improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants