Extend Dictionary concurrent access detection to Remove() #18524

MarcoRossignoli · 2018-06-18T10:28:13Z

closes https://github.com/dotnet/corefx/issues/30023

I used Ben's idea dotnet/corefx@6df07c4

Do we need some tests?It's not easy to repro...but if you've some fast way to break(maybe some old code used for FindEntry) i could test in local.

/cc @danmosemsft @stephentoub @benaadams

NB. A few years ago i lost hours to spot this bug(dump/WinDbg) in a third-party lib, 100% cpu on FindEntry.

danmoseley · 2018-06-18T14:27:26Z

If I wanted to test, I would use reflection in the test eg to add a loop in entries.

MarcoRossignoli · 2018-06-18T14:35:12Z

If I wanted to test, I would use reflection in the test eg to add a loop in entries.

I could try to add a tests case for CI, what do you think?

jkotas · 2018-06-18T14:59:15Z

@dotnet-bot test this please

danmoseley · 2018-06-18T15:16:08Z

@MarcoRossignoli test would be great, it is a bit hacky to use reflection but we don't want to regress this as its such an impactful issue.would you mind adding the same test for the existing cases? Should be trivial when you have done one.

MarcoRossignoli · 2018-06-18T15:46:00Z

would you mind adding the same test for the existing cases?

Sure!

MarcoRossignoli · 2018-06-18T17:16:28Z

@danmosemsft do you wait tests to merge this?

danmoseley · 2018-06-18T17:17:34Z

I think if you're able to write them now, there is no reason to merge this without it.

MarcoRossignoli · 2018-06-18T17:19:27Z

I think if you're able to write them now, there is no reason to merge this without it.

Ok!Working on it!

Wraith2 · 2018-06-18T18:55:48Z

Can I ask why the change from _entries to the local entries? I was looking at implementing RemoveAll and found both direct and local access in dictionary and wasn't sure which was preferable. Clearly local is better in this case but why?

danmoseley · 2018-06-18T19:45:42Z

@Wraith2 I understand that it produces better code gen with the current JIT. We only do tricks like that where it's really hot, as it's not a good idea to code around the JIT in general. @AndyAyersMS is this correct, and do you expect the JIT to be able to do this itself in future?

MarcoRossignoli · 2018-06-18T19:51:34Z

@danmosemsft @Wraith2 my idea is to be more close as possible to code style in other two methods with concurrent access detection FindEntry/TryInsert

Wraith2 · 2018-06-18T20:19:59Z

@MarcoRossignoli understood, and it looks like the fix I'd put into my working version. I just wondered at the choice of change since I'm trying to feel my own way around what I should and shouldn't change.

TryInsert for example mixes a local for entries with direct access to buckets while FindEntry has locals for both, I wouldn't want to introduce performance degradations in something as central as Dictionary by not understanding the choice if there was one. It might be that it matters depending on the number of accesses or where they are etc. This seemed like a good place to ask.

MarcoRossignoli · 2018-06-18T20:24:54Z

This seemed like a good place to ask.

I agree...

AndyAyersMS · 2018-06-18T21:02:34Z

In code like this where access to a class member like entries is interleaved with other computation, including calls:

    ... this.entries ...
    ... code that makes calls ...
    ... this.entries ... (probably unmodified from above)

the jit is going to assume that the this this.entries member can be reassigned by the calls and so the jit will likely reload it at each use. Caching the member in a local (when valid) makes it more obvious the value does not change. The jit will then try to keep the value in a register and can more readily propagate facts about the value (like non-nullness or bounds check state) that can enable other optimizations.

So generally speaking I would expect caching a frequently accessed member in a local would be a good idea perf wise, though it is very important to measure and be sure. Sometimes the jit has a lot of competition for things that can go in registers and adding one more "long lifetime" resource to the mix can have unexpected consequences.

Mixing locally cached and direct member access in one method is confusing (though sometimes necessary if the member value can indeed change). But I'd avoid mixing usage unless it's really needed for correctness, and if it really is necessary, make sure to comment on this in detail.

Unfortunately there's no easy way to catch if you mix usages by accident other than careful review of the source and perhaps the generated assembly.

MarcoRossignoli · 2018-06-18T22:11:32Z

@AndyAyersMS thank's a lot for explanation! So in this case cache values could improve perf because remove methods does not change entry array or rather when change we set some fields on a particular entry and return(we found entry to "remove"). If i understood well generally speaking coreclr will try to optimize "read-only loop code"...like while(true)... search.
EDIT: I'd like to go deeper...is there a way to obtain cold coreclr asm dump with applied rules log?Or it depends too much on live runtime context?

AndyAyersMS · 2018-06-18T22:39:51Z

For Dictionary<K,T> methods the code the jit generates will depend a lot on K and T, the target ISA and ABI, and whether the code is prejitted or jitted. It will not depend on much else.

For some popular combinations of types the code for the methods will be prejitted ahead of time via crossgen. For other types the code will be jitted the first time the method is called.

To view the code the jit generates you can do a number of things:

Run the app under a debugger and break in when the method is called and disassemble. Make sure debugger does not disable jit optimizations
Run via BenchmarkDotNet and use the disassembly diagnoser
Build a Checked local CoreCLR and enable jit disassembly (by setting COMPlus_JitDisasm). See for instance Viewing Jit Dumps
Disassemble during crossgen of System.Private.CoreLib (by setting COMPlus_NgenDisasm)

danmoseley · 2018-06-19T00:31:49Z

@MarcoRossignoli it would be interesting to run the perf tests for dictionary before and after your change. They might also suggest the effect of extracting locals in this way.

They are here -- I see only one Remove test, Remove_ValueType -- that's probably OK. You could add a similar one (even if you don't check it in) for the other overload.

https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/performance-tests.md explains running corefx perf tests - you already know how to patch CoreFX with CoreCLR - this time it would of course be release built bits.

MarcoRossignoli · 2018-06-19T10:30:20Z

@danmosemsft if i compile with my local clr bits(updated cleaned repos) i get compile error:

MembersMustExist : Member 'System.Runtime.Intrinsics.X86.Avx.Extract(System.Runtime.Intrinsics.Vector256<System.Int16>, System.Byte)'
...

all errors are on System.Runtime.Intrinsics.Experimental, is this expected?
This is not first time...other times worked well, normal corefx build works.

jkotas · 2018-06-19T11:47:08Z

This happens as breaking changes propagate through the system. This one will fix itself up once dotnet/corefx#30497 is merge. You can ignore this.

To avoid this sort of build break, you would need to sync your coreclr repo back to the build that corefx is using: find the CoreCLR package version in corefx/dependencies.props and then find the hash that the package was built from at https://dotnet.myget.org/feed/dotnet-core/package/nuget/runtime.osx-x64.Microsoft.NETCore.Runtime.CoreCLR. We plan to have a better system for this.

MarcoRossignoli · 2018-06-19T12:41:05Z

This happens as breaking changes propagate through the system. This one will fix itself up once dotnet/corefx#30497 is merge. You can ignore this.

Ok i'll wait merge.

To avoid this sort of build break, you would need to sync your coreclr repo back to the build that corefx is using: find the CoreCLR package version in corefx/dependencies.props and then find the hash that the package was built from at https://dotnet.myget.org/feed/dotnet-core/package/nuget/runtime.osx-x64.Microsoft.NETCore.Runtime.CoreCLR. We plan to have a better system for this.

Understood..it means i need to wait to "test my local updates"...because on myget i can download only compiled package...no sources...isn't it?Or there is a way to sync your coreclr repo back at source level?

my props file(working on win):

<CoreClrCurrentRef>9a1f1f4afe718228ba1252b43a2cebb7165bacd1</CoreClrCurrentRef>
...
<MicrosoftNETCoreRuntimeCoreCLRPackageVersion>2.2.0-preview1-26616-03</MicrosoftNETCoreRuntimeCoreCLRPackageVersion>

As i said i can wait...but it would be great be able to "work" also in this case(until better way), i don't want to waste precious spare time 😃.
Don't worry if you don't have much time, i'll wait merge!

EDIT: Thank's a lot!

jkotas · 2018-06-19T13:22:45Z

Or there is a way to sync your coreclr repo back at source level?

Here is what the exact steps would be in this case:

Find git hash in description of https://dotnet.myget.org/feed/dotnet-core/package/nuget/runtime.win-x64.Microsoft.NETCore.Runtime.CoreCLR/2.2.0-preview1-26616-03 . It is 45da72ca0f1493f793e30f39a7da0e59acb8bdbc
Rebase your change against this git hash. In your local coreclr, run git rebase 45da72ca0f1493f793e30f39a7da0e59acb8bdbc
Rebuild. Now, you have coreclr build that will work with your local corefx.

MarcoRossignoli · 2018-06-19T14:55:55Z

Here is what the exact steps would be in this case:..

Great!Thank's a lot!

MarcoRossignoli · 2018-06-19T20:21:03Z

Tests code:

        [Benchmark]
        [InlineData(20000)]
        [InlineData(100000)]
        [InlineData(1000000)]
        public static void Remove_ValueType(long size)
        {
            Dictionary<long?, long?> collection = new Dictionary<long?, long?>();
            long?[] items;

            items = new long?[size * 10];

            for (long i = 0; i < size * 10; ++i)
            {
                items[i] = i;
                collection.Add(items[i], items[i]);
            }


            foreach (var iteration in Benchmark.Iterations)
                using (iteration.StartMeasurement())
                    for (long i = 1; i < size; ++i)
                        collection.Remove(items[i]);
        }

        [Benchmark]
        [InlineData(20000)]
        [InlineData(100000)]
        [InlineData(1000000)]
        public static void Remove_ValueType_Out(long size)
        {
            Dictionary<long?, long?> collection = new Dictionary<long?, long?>();
            long?[] items;

            items = new long?[size * 10];

            for (long i = 0; i < size * 10; ++i)
            {
                items[i] = i;
                collection.Add(items[i], items[i]);
            }


            foreach (var iteration in Benchmark.Iterations)
                using (iteration.StartMeasurement())
                    for (long i = 1; i < size; ++i)
                        collection.Remove(items[i], out long? value);
        }

Before

System.Collections.Performance.Tests.dll	Metric	Unit	Iterations	Average	STDEV.S	Min	Max
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 100000)	Duration	msec	1000	0.800	0.225	0.707	2.285
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 1000000)	Duration	msec	1000	7.662	0.245	7.388	13.435
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 20000)	Duration	msec	1000	0.152	0.014	0.142	0.422
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 100000)	Duration	msec	1000	0.894	0.063	0.828	1.264
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 1000000)	Duration	msec	1000	8.994	0.203	8.704	10.738
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 20000)	Duration	msec	1000	0.179	0.018	0.166	0.347

After

System.Collections.Performance.Tests.dll	Metric	Unit	Iterations	Average	STDEV.S	Min	Max
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 100000)	Duration	msec	1000	0.796	0.065	0.733	1.391
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 1000000)	Duration	msec	1000	7.965	0.213	7.641	11.305
System.Collections.Tests.Perf_Dictionary.Remove_ValueType(size: 20000)	Duration	msec	1000	0.160	0.020	0.147	0.341
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 100000)	Duration	msec	1000	0.877	0.056	0.816	1.260
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 1000000)	Duration	msec	1000	8.843	0.551	8.532	16.917
System.Collections.Tests.Perf_Dictionary.Remove_ValueType_Out(size: 20000)	Duration	msec	1000	0.182	0.028	0.164	0.572

@AndyAyersMS @danmosemsft @jkotas these are perf test result...seems there is no great difference, let me know if code test code is correct(first time with "Microsoft xunit-performance framework".

What i've done:

checkout master build release
build corefx in release with my local clr bits
buildandtest perf 3 times...checked that values are all similar
checkout my clr branch with updates(this PR)
build -skiptests -release -skipnative
buildandtest perf 3 times...checked that values are all similar

Done some times.
Let me know if results make sense for you.

MarcoRossignoli · 2018-06-19T20:27:04Z

@dotnet-bot test this please

danmoseley · 2018-06-19T21:34:56Z

@MarcoRossignoli that sounds right as long as obviously you're patching with your coreclr bits each time. What I sometimes do is put in a Thread.Sleep(50) or something as a sanity-check to make sure I am successfully consuming the bits I expect (then re-measure without it!)

MarcoRossignoli · 2018-06-20T06:39:04Z

What I sometimes do is put in a Thread.Sleep(50) or something as a sanity-check to make sure I am successfully consuming the bits I expect (then re-measure without it!)

Yep sure i did it but omitted from list(Thread.Sleep(50))!

MarcoRossignoli · 2018-06-20T16:29:21Z

@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test please

EDIT: @jkotas are these CI tests different from others?I tried to re-run but it doesn't work.

danmoseley · 2018-06-20T18:32:35Z

The ARM failures are
:[MaskLoad_ro.cmd_11848] JIT\HardwareIntrinsics\X86\Avx\MaskLoad_ro\MaskLoad_ro.cmd and
[MaskLoad_ro.cmd_11531] JIT\HardwareIntrinsics\X86\Avx\MaskLoad_ro\MaskLoad_ro.cmd

@dotnet/jit-contrib is there a way to access the test logs? It seems unlikely that a change in Dictionary could break this test although I don't see the failure in other PR's.

CarolEidt · 2018-06-20T18:38:48Z

Those tests have now been removed from the tests.lst files (#18569)

danmoseley · 2018-06-20T19:19:02Z

Thanks @CarolEidt

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

Commit migrated from dotnet/coreclr@1c0c5ac

added concurrent access detection to Remove()

1f772e7

MarcoRossignoli changed the title ~~Extend Dictionary concurrent access detection to Remove()~~ [WIP]Extend Dictionary concurrent access detection to Remove() Jun 18, 2018

Merge branch 'master' into dicrem

7c33e68

MarcoRossignoli changed the title ~~[WIP]Extend Dictionary concurrent access detection to Remove()~~ Extend Dictionary concurrent access detection to Remove() Jun 19, 2018

danmoseley approved these changes Jun 20, 2018

View reviewed changes

danmoseley mentioned this pull request Jun 20, 2018

Add concurrent access detection tests to Dictionary<TKey, TValue> dotnet/corefx#30515

Merged

danmoseley merged commit 1c0c5ac into dotnet:master Jun 20, 2018

dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jun 20, 2018

added concurrent access detection to Remove() (dotnet/coreclr#18524)

2388d48

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Jun 20, 2018

added concurrent access detection to Remove() (dotnet/coreclr#18524)

ca4d35e

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

jkotas pushed a commit to dotnet/corert that referenced this pull request Jun 20, 2018

added concurrent access detection to Remove() (dotnet/coreclr#18524)

6a76847

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

jkotas pushed a commit to dotnet/corefx that referenced this pull request Jun 20, 2018

added concurrent access detection to Remove() (dotnet/coreclr#18524)

618ea91

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

MarcoRossignoli deleted the dicrem branch June 21, 2018 07:08

lewurm mentioned this pull request Feb 1, 2019

[2018-08] Bump corert mono/mono#12721

Merged

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

added concurrent access detection to Remove() (dotnet/coreclr#18524)

36e2634

Commit migrated from dotnet/coreclr@1c0c5ac

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend Dictionary concurrent access detection to Remove() #18524

Extend Dictionary concurrent access detection to Remove() #18524

MarcoRossignoli commented Jun 18, 2018 •

edited

Loading

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

jkotas commented Jun 18, 2018

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

Wraith2 commented Jun 18, 2018

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

Wraith2 commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018 •

edited

Loading

AndyAyersMS commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018 •

edited

Loading

AndyAyersMS commented Jun 18, 2018

danmoseley commented Jun 19, 2018

MarcoRossignoli commented Jun 19, 2018 •

edited

Loading

jkotas commented Jun 19, 2018 •

edited

Loading

MarcoRossignoli commented Jun 19, 2018 •

edited

Loading

jkotas commented Jun 19, 2018

MarcoRossignoli commented Jun 19, 2018

MarcoRossignoli commented Jun 19, 2018 •

edited

Loading

MarcoRossignoli commented Jun 19, 2018

danmoseley commented Jun 19, 2018

MarcoRossignoli commented Jun 20, 2018 •

edited

Loading

MarcoRossignoli commented Jun 20, 2018 •

edited

Loading

danmoseley commented Jun 20, 2018

CarolEidt commented Jun 20, 2018

danmoseley commented Jun 20, 2018

Extend Dictionary concurrent access detection to Remove() #18524

Extend Dictionary concurrent access detection to Remove() #18524

Conversation

MarcoRossignoli commented Jun 18, 2018 • edited Loading

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

jkotas commented Jun 18, 2018

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

Wraith2 commented Jun 18, 2018

danmoseley commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018

Wraith2 commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018 • edited Loading

AndyAyersMS commented Jun 18, 2018

MarcoRossignoli commented Jun 18, 2018 • edited Loading

AndyAyersMS commented Jun 18, 2018

danmoseley commented Jun 19, 2018

MarcoRossignoli commented Jun 19, 2018 • edited Loading

jkotas commented Jun 19, 2018 • edited Loading

MarcoRossignoli commented Jun 19, 2018 • edited Loading

jkotas commented Jun 19, 2018

MarcoRossignoli commented Jun 19, 2018

MarcoRossignoli commented Jun 19, 2018 • edited Loading

MarcoRossignoli commented Jun 19, 2018

danmoseley commented Jun 19, 2018

MarcoRossignoli commented Jun 20, 2018 • edited Loading

MarcoRossignoli commented Jun 20, 2018 • edited Loading

danmoseley commented Jun 20, 2018

CarolEidt commented Jun 20, 2018

danmoseley commented Jun 20, 2018

MarcoRossignoli commented Jun 18, 2018 •

edited

Loading

MarcoRossignoli commented Jun 18, 2018 •

edited

Loading

MarcoRossignoli commented Jun 18, 2018 •

edited

Loading

MarcoRossignoli commented Jun 19, 2018 •

edited

Loading

jkotas commented Jun 19, 2018 •

edited

Loading

MarcoRossignoli commented Jun 19, 2018 •

edited

Loading

MarcoRossignoli commented Jun 19, 2018 •

edited

Loading

MarcoRossignoli commented Jun 20, 2018 •

edited

Loading

MarcoRossignoli commented Jun 20, 2018 •

edited

Loading