Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Extend Dictionary concurrent access detection to Remove() #18524

Merged
merged 2 commits into from
Jun 20, 2018
Merged

Extend Dictionary concurrent access detection to Remove() #18524

merged 2 commits into from
Jun 20, 2018

Conversation

MarcoRossignoli
Copy link
Member

@MarcoRossignoli MarcoRossignoli commented Jun 18, 2018

closes https://github.com/dotnet/corefx/issues/30023

I used Ben's idea dotnet/corefx@6df07c4

Do we need some tests?It's not easy to repro...but if you've some fast way to break(maybe some old code used for FindEntry) i could test in local.

/cc @danmosemsft @stephentoub @benaadams

NB. A few years ago i lost hours to spot this bug(dump/WinDbg) in a third-party lib, 100% cpu on FindEntry.

@danmoseley
Copy link
Member

If I wanted to test, I would use reflection in the test eg to add a loop in entries.

@MarcoRossignoli
Copy link
Member Author

If I wanted to test, I would use reflection in the test eg to add a loop in entries.

I could try to add a tests case for CI, what do you think?

@jkotas
Copy link
Member

jkotas commented Jun 18, 2018

@dotnet-bot test this please

@danmoseley
Copy link
Member

@MarcoRossignoli test would be great, it is a bit hacky to use reflection but we don't want to regress this as its such an impactful issue.would you mind adding the same test for the existing cases? Should be trivial when you have done one.

@MarcoRossignoli
Copy link
Member Author

would you mind adding the same test for the existing cases?

Sure!

@MarcoRossignoli
Copy link
Member Author

@danmosemsft do you wait tests to merge this?

@danmoseley
Copy link
Member

I think if you're able to write them now, there is no reason to merge this without it.

@MarcoRossignoli
Copy link
Member Author

I think if you're able to write them now, there is no reason to merge this without it.

Ok!Working on it!

@MarcoRossignoli MarcoRossignoli changed the title Extend Dictionary concurrent access detection to Remove() [WIP]Extend Dictionary concurrent access detection to Remove() Jun 18, 2018
@Wraith2
Copy link

Wraith2 commented Jun 18, 2018

Can I ask why the change from _entries to the local entries? I was looking at implementing RemoveAll and found both direct and local access in dictionary and wasn't sure which was preferable. Clearly local is better in this case but why?

@danmoseley
Copy link
Member

@Wraith2 I understand that it produces better code gen with the current JIT. We only do tricks like that where it's really hot, as it's not a good idea to code around the JIT in general. @AndyAyersMS is this correct, and do you expect the JIT to be able to do this itself in future?

@MarcoRossignoli
Copy link
Member Author

@danmosemsft @Wraith2 my idea is to be more close as possible to code style in other two methods with concurrent access detection FindEntry/TryInsert

@Wraith2
Copy link

Wraith2 commented Jun 18, 2018

@MarcoRossignoli understood, and it looks like the fix I'd put into my working version. I just wondered at the choice of change since I'm trying to feel my own way around what I should and shouldn't change.

TryInsert for example mixes a local for entries with direct access to buckets while FindEntry has locals for both, I wouldn't want to introduce performance degradations in something as central as Dictionary by not understanding the choice if there was one. It might be that it matters depending on the number of accesses or where they are etc. This seemed like a good place to ask.

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 18, 2018

This seemed like a good place to ask.

I agree...

@AndyAyersMS
Copy link
Member

In code like this where access to a class member like entries is interleaved with other computation, including calls:

    ... this.entries ...
    ... code that makes calls ...
    ... this.entries ... (probably unmodified from above)

the jit is going to assume that the this this.entries member can be reassigned by the calls and so the jit will likely reload it at each use. Caching the member in a local (when valid) makes it more obvious the value does not change. The jit will then try to keep the value in a register and can more readily propagate facts about the value (like non-nullness or bounds check state) that can enable other optimizations.

So generally speaking I would expect caching a frequently accessed member in a local would be a good idea perf wise, though it is very important to measure and be sure. Sometimes the jit has a lot of competition for things that can go in registers and adding one more "long lifetime" resource to the mix can have unexpected consequences.

Mixing locally cached and direct member access in one method is confusing (though sometimes necessary if the member value can indeed change). But I'd avoid mixing usage unless it's really needed for correctness, and if it really is necessary, make sure to comment on this in detail.

Unfortunately there's no easy way to catch if you mix usages by accident other than careful review of the source and perhaps the generated assembly.

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 18, 2018

@AndyAyersMS thank's a lot for explanation! So in this case cache values could improve perf because remove methods does not change entry array or rather when change we set some fields on a particular entry and return(we found entry to "remove"). If i understood well generally speaking coreclr will try to optimize "read-only loop code"...like while(true)... search.
EDIT: I'd like to go deeper...is there a way to obtain cold coreclr asm dump with applied rules log?Or it depends too much on live runtime context?

@AndyAyersMS
Copy link
Member

For Dictionary<K,T> methods the code the jit generates will depend a lot on K and T, the target ISA and ABI, and whether the code is prejitted or jitted. It will not depend on much else.

For some popular combinations of types the code for the methods will be prejitted ahead of time via crossgen. For other types the code will be jitted the first time the method is called.

To view the code the jit generates you can do a number of things:

  • Run the app under a debugger and break in when the method is called and disassemble. Make sure debugger does not disable jit optimizations
  • Run via BenchmarkDotNet and use the disassembly diagnoser
  • Build a Checked local CoreCLR and enable jit disassembly (by setting COMPlus_JitDisasm). See for instance Viewing Jit Dumps
  • Disassemble during crossgen of System.Private.CoreLib (by setting COMPlus_NgenDisasm)

@danmoseley
Copy link
Member

@MarcoRossignoli it would be interesting to run the perf tests for dictionary before and after your change. They might also suggest the effect of extracting locals in this way.

They are here -- I see only one Remove test, Remove_ValueType -- that's probably OK. You could add a similar one (even if you don't check it in) for the other overload.

https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/performance-tests.md explains running corefx perf tests - you already know how to patch CoreFX with CoreCLR - this time it would of course be release built bits.

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 19, 2018

@danmosemsft if i compile with my local clr bits(updated cleaned repos) i get compile error:

MembersMustExist : Member 'System.Runtime.Intrinsics.X86.Avx.Extract(System.Runtime.Intrinsics.Vector256<System.Int16>, System.Byte)'
...

all errors are on System.Runtime.Intrinsics.Experimental, is this expected?
This is not first time...other times worked well, normal corefx build works.

@jkotas
Copy link
Member

jkotas commented Jun 19, 2018

This happens as breaking changes propagate through the system. This one will fix itself up once dotnet/corefx#30497 is merge. You can ignore this.

To avoid this sort of build break, you would need to sync your coreclr repo back to the build that corefx is using: find the CoreCLR package version in corefx/dependencies.props and then find the hash that the package was built from at https://dotnet.myget.org/feed/dotnet-core/package/nuget/runtime.osx-x64.Microsoft.NETCore.Runtime.CoreCLR. We plan to have a better system for this.

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 19, 2018

This happens as breaking changes propagate through the system. This one will fix itself up once dotnet/corefx#30497 is merge. You can ignore this.

Ok i'll wait merge.

To avoid this sort of build break, you would need to sync your coreclr repo back to the build that corefx is using: find the CoreCLR package version in corefx/dependencies.props and then find the hash that the package was built from at https://dotnet.myget.org/feed/dotnet-core/package/nuget/runtime.osx-x64.Microsoft.NETCore.Runtime.CoreCLR. We plan to have a better system for this.

Understood..it means i need to wait to "test my local updates"...because on myget i can download only compiled package...no sources...isn't it?Or there is a way to sync your coreclr repo back at source level?

my props file(working on win):

<CoreClrCurrentRef>9a1f1f4afe718228ba1252b43a2cebb7165bacd1</CoreClrCurrentRef>
...
<MicrosoftNETCoreRuntimeCoreCLRPackageVersion>2.2.0-preview1-26616-03</MicrosoftNETCoreRuntimeCoreCLRPackageVersion>

As i said i can wait...but it would be great be able to "work" also in this case(until better way), i don't want to waste precious spare time 😃.
Don't worry if you don't have much time, i'll wait merge!

EDIT: Thank's a lot!

@jkotas
Copy link
Member

jkotas commented Jun 19, 2018

Or there is a way to sync your coreclr repo back at source level?

Here is what the exact steps would be in this case:

@MarcoRossignoli
Copy link
Member Author

Here is what the exact steps would be in this case:..

Great!Thank's a lot!

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 19, 2018

Tests code:

        [Benchmark]
        [InlineData(20000)]
        [InlineData(100000)]
        [InlineData(1000000)]
        public static void Remove_ValueType(long size)
        {
            Dictionary<long?, long?> collection = new Dictionary<long?, long?>();
            long?[] items;

            items = new long?[size * 10];

            for (long i = 0; i < size * 10; ++i)
            {
                items[i] = i;
                collection.Add(items[i], items[i]);
            }


            foreach (var iteration in Benchmark.Iterations)
                using (iteration.StartMeasurement())
                    for (long i = 1; i < size; ++i)
                        collection.Remove(items[i]);
        }

        [Benchmark]
        [InlineData(20000)]
        [InlineData(100000)]
        [InlineData(1000000)]
        public static void Remove_ValueType_Out(long size)
        {
            Dictionary<long?, long?> collection = new Dictionary<long?, long?>();
            long?[] items;

            items = new long?[size * 10];

            for (long i = 0; i < size * 10; ++i)
            {
                items[i] = i;
                collection.Add(items[i], items[i]);
            }


            foreach (var iteration in Benchmark.Iterations)
                using (iteration.StartMeasurement())
                    for (long i = 1; i < size; ++i)
                        collection.Remove(items[i], out long? value);
        }

Before

System.Collections.Performance.Tests.dll Metric Unit Iterations Average STDEV.S Min Max
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 100000) Duration msec 1000 0.800 0.225 0.707 2.285
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 1000000) Duration msec 1000 7.662 0.245 7.388 13.435
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 20000) Duration msec 1000 0.152 0.014 0.142 0.422
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 100000) Duration msec 1000 0.894 0.063 0.828 1.264
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 1000000) Duration msec 1000 8.994 0.203 8.704 10.738
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 20000) Duration msec 1000 0.179 0.018 0.166 0.347

After

System.Collections.Performance.Tests.dll Metric Unit Iterations Average STDEV.S Min Max
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 100000) Duration msec 1000 0.796 0.065 0.733 1.391
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 1000000) Duration msec 1000 7.965 0.213 7.641 11.305
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 20000) Duration msec 1000 0.160 0.020 0.147 0.341
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 100000) Duration msec 1000 0.877 0.056 0.816 1.260
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 1000000) Duration msec 1000 8.843 0.551 8.532 16.917
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 20000) Duration msec 1000 0.182 0.028 0.164 0.572

@AndyAyersMS @danmosemsft @jkotas these are perf test result...seems there is no great difference, let me know if code test code is correct(first time with "Microsoft xunit-performance framework".

What i've done:

  1. checkout master build release
  2. build corefx in release with my local clr bits
  3. buildandtest perf 3 times...checked that values are all similar
  4. checkout my clr branch with updates(this PR)
  5. build -skiptests -release -skipnative
  6. buildandtest perf 3 times...checked that values are all similar

Done some times.
Let me know if results make sense for you.

@MarcoRossignoli MarcoRossignoli changed the title [WIP]Extend Dictionary concurrent access detection to Remove() Extend Dictionary concurrent access detection to Remove() Jun 19, 2018
@MarcoRossignoli
Copy link
Member Author

@dotnet-bot test this please

@danmoseley
Copy link
Member

@MarcoRossignoli that sounds right as long as obviously you're patching with your coreclr bits each time. What I sometimes do is put in a Thread.Sleep(50) or something as a sanity-check to make sure I am successfully consuming the bits I expect (then re-measure without it!)

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 20, 2018

What I sometimes do is put in a Thread.Sleep(50) or something as a sanity-check to make sure I am successfully consuming the bits I expect (then re-measure without it!)

Yep sure i did it but omitted from list(Thread.Sleep(50))!

@MarcoRossignoli
Copy link
Member Author

MarcoRossignoli commented Jun 20, 2018

@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test please

EDIT: @jkotas are these CI tests different from others?I tried to re-run but it doesn't work.

@danmoseley
Copy link
Member

The ARM failures are
:[MaskLoad_ro.cmd_11848] JIT\HardwareIntrinsics\X86\Avx\MaskLoad_ro\MaskLoad_ro.cmd and
[MaskLoad_ro.cmd_11531] JIT\HardwareIntrinsics\X86\Avx\MaskLoad_ro\MaskLoad_ro.cmd

@dotnet/jit-contrib is there a way to access the test logs? It seems unlikely that a change in Dictionary could break this test although I don't see the failure in other PR's.

@CarolEidt
Copy link

Those tests have now been removed from the tests.lst files (#18569)

@danmoseley
Copy link
Member

Thanks @CarolEidt

@danmoseley danmoseley merged commit 1c0c5ac into dotnet:master Jun 20, 2018
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jun 20, 2018
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Jun 20, 2018
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Jun 20, 2018
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corefx that referenced this pull request Jun 20, 2018
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
@MarcoRossignoli MarcoRossignoli deleted the dicrem branch June 21, 2018 07:08
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend Dictionary concurrent access detection to Remove()
6 participants