This repository was archived by the owner on Jan 23, 2023. It is now read-only.

DictionarySlim backport improvements, retaining more entropy #22832

Closed
wants to merge 5 commits into from

MarcoRossignoli
Member

@MarcoRossignoli MarcoRossignoli commented Feb 25, 2019

contributes to https://github.com/dotnet/corefx/issues/33392

Use uint hashcode to retain more entropy

Before

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727535 Hz, Resolution=366.6314 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  Job-SOCKON : .NET Core f01bf9f8-609f-443f-b1f9-73654bd6fdd8 (CoreCLR 4.6.27523.0, CoreFX 4.7.19.12501), 64bit RyuJIT
  Job-LPCBGO : .NET Core f01bf9f8-609f-443f-b1f9-73654bd6fdd8 (CoreCLR 4.6.27523.0, CoreFX 4.7.19.12501), 64bit RyuJIT

Runtime=Core  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Namespace Type Method InvocationCount UnrollFactor Size Items Count Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
System.Collections CtorDefaultSize<Int32> Dictionary 1 16 ? ? ? 21.138 ns 0.5386 ns 0.4498 ns 21.034 ns 20.602 ns 22.054 ns 0.0171 - - 72 B
System.Collections CtorDefaultSize<String> Dictionary 1 16 ? ? ? 115.775 ns 2.1777 ns 2.0370 ns 115.759 ns 112.713 ns 119.546 ns 0.0169 - - 72 B
System.Collections.Concurrent IsEmpty<Int32> Dictionary 1 16 0 ? ? 167.668 ns 1.7925 ns 1.6767 ns 167.952 ns 164.903 ns 171.290 ns - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary 1 16 0 ? ? 175.245 ns 3.1680 ns 2.9634 ns 175.101 ns 170.252 ns 181.164 ns - - - -
System.Collections DictionaryMappingFunction Entropy 1 16 ? 500 ? 85,153.309 ns 967.4673 ns 904.9695 ns 85,026.696 ns 84,090.283 ns 87,115.608 ns 17.1371 0.3360 - 73160 B
System.Collections TryAddDefaultSize<Int32> Dictionary 1 16 ? ? 2048 151,853.548 ns 1,614.4616 ns 1,510.1684 ns 152,020.658 ns 149,280.358 ns 154,742.987 ns 24.5098 12.2549 - 154192 B
System.Collections TryAddDefaultSize<String> Dictionary 1 16 ? ? 2048 260,378.396 ns 3,333.4320 ns 3,118.0943 ns 260,265.627 ns 254,684.640 ns 266,279.359 ns 46.1066 40.9836 26.6393 215717 B
System.Collections TryAddGiventSize<Int32> Dictionary 1 16 ? ? 2048 62,035.232 ns 625.4011 ns 554.4018 ns 61,999.683 ns 61,072.372 ns 63,033.053 ns 7.4111 3.7055 - 46784 B
System.Collections TryAddGiventSize<String> Dictionary 1 16 ? ? 2048 147,364.413 ns 2,351.7824 ns 2,084.7938 ns 146,609.444 ns 145,433.525 ns 152,536.152 ns 10.5140 5.2570 - 65448 B
System.Collections AddDefaultSize<Int32> Dictionary 1 16 ? ? 2048 151,457.558 ns 2,898.9032 ns 2,711.6358 ns 150,197.879 ns 148,626.916 ns 156,803.193 ns 24.0385 12.0192 - 154192 B
System.Collections AddDefaultSize<String> Dictionary 1 16 ? ? 2048 258,826.523 ns 1,935.8032 ns 1,810.7515 ns 258,518.343 ns 255,039.852 ns 261,568.971 ns 46.1066 44.0574 26.6393 215722 B
System.Collections Remove<Int32> Dictionary 1000 1 2048 ? ? 62,786.129 ns 834.9818 ns 740.1896 ns 62,748.654 ns 61,766.815 ns 64,588.044 ns - - - -
System.Collections Remove<String> Dictionary 1000 1 2048 ? ? 133,691.245 ns 2,540.5734 ns 2,376.4540 ns 132,631.167 ns 130,215.066 ns 137,910.659 ns - - - -
System.Collections Clear<Int32> Dictionary 1000 1 2048 ? ? 8,307.190 ns 192.6903 ns 214.1748 ns 8,305.994 ns 7,901.233 ns 8,650.261 ns - - - -
System.Collections Clear<String> Dictionary 1000 1 2048 ? ? 10,672.075 ns 140.5965 ns 117.4045 ns 10,662.063 ns 10,496.712 ns 10,923.105 ns - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary 1 16 2048 ? ? 46,885.587 ns 552.1149 ns 516.4486 ns 46,941.644 ns 45,685.077 ns 47,682.228 ns - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary 1 16 2048 ? ? 109,510.313 ns 851.8671 ns 796.8370 ns 109,437.392 ns 108,226.451 ns 110,776.526 ns - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary 1 16 2048 ? ? 42,012.134 ns 350.9807 ns 328.3075 ns 42,010.042 ns 41,423.956 ns 42,699.365 ns - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary 1 16 2048 ? ? 109,269.743 ns 1,116.3895 ns 1,044.2715 ns 109,322.904 ns 108,074.619 ns 111,986.671 ns - - - -
System.Collections.Concurrent Count<Int32> Dictionary 1 16 2048 ? ? 21,219.886 ns 640.3695 ns 737.4505 ns 21,284.655 ns 19,958.709 ns 22,553.940 ns - - - -
System.Collections.Concurrent Count<String> Dictionary 1 16 2048 ? ? 20,212.199 ns 175.6823 ns 164.3333 ns 20,235.303 ns 19,916.917 ns 20,478.918 ns - - - -
System.Collections.Concurrent IsEmpty<Int32> Dictionary 1 16 2048 ? ? 2.983 ns 0.1841 ns 0.1722 ns 2.970 ns 2.709 ns 3.323 ns - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary 1 16 2048 ? ? 3.017 ns 0.1457 ns 0.1291 ns 2.981 ns 2.854 ns 3.276 ns - - - -
System.Collections AddGivenSize<Int32> Dictionary 1 16 2048 ? ? 63,489.339 ns 1,488.1789 ns 1,713.7890 ns 63,791.385 ns 60,512.448 ns 66,618.257 ns 7.4588 3.6008 - 46784 B
System.Collections AddGivenSize<String> Dictionary 1 16 2048 ? ? 145,403.806 ns 1,692.4725 ns 1,583.1399 ns 145,222.810 ns 143,469.853 ns 149,133.757 ns 10.4167 5.2083 - 65448 B
System.Collections CtorFromCollection<Int32> Dictionary 1 16 2048 ? ? 64,083.569 ns 707.7096 ns 661.9920 ns 64,151.889 ns 63,065.820 ns 65,028.320 ns 7.4897 3.6157 - 46784 B
System.Collections CtorFromCollection<String> Dictionary 1 16 2048 ? ? 150,999.660 ns 2,020.8942 ns 1,890.3456 ns 150,848.377 ns 147,695.729 ns 155,176.823 ns 10.4167 5.2083 - 65448 B
System.Collections CtorGivenSize<Int32> Dictionary 1 16 2048 ? ? 12,027.516 ns 209.0856 ns 195.5788 ns 12,082.695 ns 11,642.689 ns 12,331.592 ns 7.4472 3.7236 - 46784 B
System.Collections CtorGivenSize<String> Dictionary 1 16 2048 ? ? 23,727.781 ns 369.4205 ns 345.5561 ns 23,688.431 ns 23,077.184 ns 24,386.122 ns 10.3855 5.1460 - 65448 B
System.Collections IndexerSet<Int32> Dictionary 1 16 2048 ? ? 50,974.114 ns 941.5698 ns 880.7450 ns 50,592.549 ns 49,743.466 ns 52,757.452 ns - - - -
System.Collections IndexerSet<String> Dictionary 1 16 2048 ? ? 124,769.958 ns 1,494.9546 ns 1,398.3814 ns 124,507.891 ns 122,417.082 ns 127,556.236 ns - - - -
System.Collections IterateForEach<Int32> Dictionary 1 16 2048 ? ? 18,215.827 ns 203.7896 ns 190.6249 ns 18,195.956 ns 17,896.050 ns 18,554.976 ns - - - -
System.Collections IterateForEach<String> Dictionary 1 16 2048 ? ? 18,497.016 ns 190.6483 ns 178.3325 ns 18,461.196 ns 18,305.749 ns 18,943.396 ns - - - -
System.Collections.Tests Perf_Dictionary ContainsValue 1 16 ? 3000 ? 7,479,647.883 ns 84,846.0839 ns 79,365.0796 ns 7,503,163.724 ns 7,335,345.834 ns 7,575,527.598 ns - - - -
System.Collections DictionaryMappingFunction Entropy 1 16 ? 5000 ? 1,047,806.953 ns 12,991.8550 ns 12,152.5892 ns 1,049,529.001 ns 1,028,935.009 ns 1,069,438.614 ns 120.8333 120.8333 120.8333 673056 B
System.Collections DictionaryMappingFunction Entropy 1 16 ? 50000 ? 8,611,516.872 ns 160,790.0849 ns 157,917.3629 ns 8,648,810.877 ns 8,290,096.415 ns 8,892,345.790 ns 1062.5000 1031.2500 1031.2500 6037793 B
System.Collections DictionaryMappingFunction Entropy 1 16 ? 500000 ? 115,474,366.049 ns 2,256,936.6814 ns 2,216,613.6005 ns 115,663,043.738 ns 111,832,845.410 ns 118,933,395.905 ns 2500.0000 2500.0000 2500.0000 53888568 B
System.Collections DictionaryMappingFunction Entropy 1 16 ? 5000000 ? 1,060,218,842.288 ns 6,433,913.8710 ns 6,018,287.0268 ns 1,058,696,955.310 ns 1,048,741,079.400 ns 1,071,752,699.780 ns 4000.0000 4000.0000 4000.0000 471722104 B

After

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727535 Hz, Resolution=366.6314 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  Job-BXGUAA : .NET Core 87f9cb2e-5075-4cbd-ab6c-b5da46079c94 (CoreCLR 4.6.27523.0, CoreFX 4.7.19.12501), 64bit RyuJIT
  Job-KJQYMZ : .NET Core 87f9cb2e-5075-4cbd-ab6c-b5da46079c94 (CoreCLR 4.6.27523.0, CoreFX 4.7.19.12501), 64bit RyuJIT

Runtime=Core  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Namespace Type Method InvocationCount UnrollFactor Size Items Count Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
System.Collections CtorDefaultSize<Int32> Dictionary 1 16 ? ? ? 23.805 ns 0.2385 ns 0.2231 ns 23.704 ns 23.541 ns 24.239 ns 0.0171 - - 72 B
System.Collections CtorDefaultSize<String> Dictionary 1 16 ? ? ? 113.941 ns 0.9883 ns 0.9245 ns 113.855 ns 112.458 ns 115.495 ns 0.0170 - - 72 B
System.Collections.Concurrent IsEmpty<Int32> Dictionary 1 16 0 ? ? 170.869 ns 1.9216 ns 1.7975 ns 170.418 ns 166.628 ns 173.358 ns - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary 1 16 0 ? ? 179.837 ns 1.7561 ns 1.6427 ns 180.384 ns 176.214 ns 181.830 ns - - - -
System.Collections DictionaryMappingFunction Entropy 1 16 ? 500 ? 82,340.128 ns 978.2604 ns 915.0654 ns 82,026.180 ns 81,309.353 ns 83,891.009 ns 17.3429 0.3272 - 73160 B
System.Collections TryAddDefaultSize<Int32> Dictionary 1 16 ? ? 2048 150,153.371 ns 1,618.9973 ns 1,514.4110 ns 150,124.634 ns 147,893.717 ns 152,240.763 ns 24.1745 11.7925 - 154192 B
System.Collections TryAddDefaultSize<String> Dictionary 1 16 ? ? 2048 261,100.120 ns 1,934.8411 ns 1,809.8516 ns 261,193.832 ns 257,340.070 ns 264,332.737 ns 46.1066 39.9590 26.6393 215717 B
System.Collections TryAddGiventSize<Int32> Dictionary 1 16 ? ? 2048 63,706.533 ns 2,213.4287 ns 2,548.9878 ns 63,218.560 ns 60,307.853 ns 67,743.168 ns 7.2614 3.6307 - 46784 B
System.Collections TryAddGiventSize<String> Dictionary 1 16 ? ? 2048 142,825.956 ns 868.1599 ns 724.9528 ns 142,806.871 ns 141,892.376 ns 144,623.363 ns 10.2273 5.1136 - 65448 B
System.Collections AddDefaultSize<Int32> Dictionary 1 16 ? ? 2048 151,037.858 ns 2,873.2948 ns 3,074.3933 ns 150,303.520 ns 146,159.005 ns 155,900.099 ns 24.2718 12.1359 - 154192 B
System.Collections AddDefaultSize<String> Dictionary 1 16 ? ? 2048 267,313.615 ns 4,137.9096 ns 3,668.1491 ns 268,283.291 ns 259,293.248 ns 271,946.692 ns 45.5508 40.2542 26.4831 215720 B
System.Collections Remove<Int32> Dictionary 1000 1 2048 ? ? 62,601.610 ns 1,216.2933 ns 1,137.7215 ns 62,405.854 ns 60,909.264 ns 65,463.193 ns - - - -
System.Collections Remove<String> Dictionary 1000 1 2048 ? ? 128,512.905 ns 1,917.1579 ns 1,600.9137 ns 128,483.997 ns 126,116.292 ns 132,111.082 ns - - - -
System.Collections Clear<Int32> Dictionary 1000 1 2048 ? ? 8,145.743 ns 170.3186 ns 182.2390 ns 8,082.011 ns 7,897.778 ns 8,482.189 ns - - - -
System.Collections Clear<String> Dictionary 1000 1 2048 ? ? 10,767.409 ns 131.9702 ns 110.2011 ns 10,776.067 ns 10,551.322 ns 10,956.083 ns - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary 1 16 2048 ? ? 44,879.415 ns 333.7097 ns 278.6627 ns 44,902.335 ns 44,395.467 ns 45,463.019 ns - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary 1 16 2048 ? ? 110,064.587 ns 1,119.4218 ns 1,047.1078 ns 110,064.819 ns 107,313.029 ns 111,583.653 ns - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary 1 16 2048 ? ? 41,055.418 ns 778.8534 ns 728.5399 ns 40,913.175 ns 40,131.967 ns 42,263.798 ns - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary 1 16 2048 ? ? 109,705.450 ns 997.1031 ns 932.6908 ns 109,501.575 ns 108,489.520 ns 111,074.399 ns - - - -
System.Collections.Concurrent Count<Int32> Dictionary 1 16 2048 ? ? 20,438.584 ns 196.3024 ns 183.6214 ns 20,467.289 ns 20,101.198 ns 20,709.797 ns - - - -
System.Collections.Concurrent Count<String> Dictionary 1 16 2048 ? ? 20,203.623 ns 205.4737 ns 192.2002 ns 20,219.996 ns 19,945.734 ns 20,448.163 ns - - - -
System.Collections.Concurrent IsEmpty<Int32> Dictionary 1 16 2048 ? ? 2.866 ns 0.1173 ns 0.1097 ns 2.858 ns 2.676 ns 3.075 ns - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary 1 16 2048 ? ? 3.092 ns 0.1544 ns 0.1445 ns 3.113 ns 2.887 ns 3.399 ns - - - -
System.Collections AddGivenSize<Int32> Dictionary 1 16 2048 ? ? 60,760.020 ns 894.4473 ns 836.6666 ns 60,721.554 ns 59,326.054 ns 62,369.815 ns 7.4234 3.5920 - 46784 B
System.Collections AddGivenSize<String> Dictionary 1 16 2048 ? ? 143,719.715 ns 1,504.3490 ns 1,407.1690 ns 143,787.977 ns 141,109.049 ns 146,574.665 ns 10.1351 5.0676 - 65448 B
System.Collections CtorFromCollection<Int32> Dictionary 1 16 2048 ? ? 62,795.952 ns 444.3428 ns 415.6385 ns 62,860.373 ns 61,932.473 ns 63,377.986 ns 7.4111 3.7055 - 46784 B
System.Collections CtorFromCollection<String> Dictionary 1 16 2048 ? ? 152,208.271 ns 1,459.9787 ns 1,365.6649 ns 152,484.915 ns 149,151.577 ns 154,605.696 ns 10.4167 5.2083 - 65448 B
System.Collections CtorGivenSize<Int32> Dictionary 1 16 2048 ? ? 11,921.263 ns 178.5901 ns 167.0533 ns 11,967.958 ns 11,558.556 ns 12,132.757 ns 7.4494 3.7013 - 46784 B
System.Collections CtorGivenSize<String> Dictionary 1 16 2048 ? ? 23,713.601 ns 199.5404 ns 186.6502 ns 23,718.992 ns 23,359.783 ns 24,099.413 ns 10.3083 5.1077 - 65448 B
System.Collections IndexerSet<Int32> Dictionary 1 16 2048 ? ? 51,615.680 ns 457.0949 ns 427.5669 ns 51,389.743 ns 51,137.684 ns 52,246.152 ns - - - -
System.Collections IndexerSet<String> Dictionary 1 16 2048 ? ? 121,926.922 ns 1,032.9373 ns 862.5494 ns 121,878.959 ns 120,175.949 ns 123,460.236 ns - - - -
System.Collections IterateForEach<Int32> Dictionary 1 16 2048 ? ? 18,384.410 ns 128.4349 ns 120.1381 ns 18,416.393 ns 18,183.731 ns 18,545.727 ns - - - -
System.Collections IterateForEach<String> Dictionary 1 16 2048 ? ? 18,341.761 ns 195.1433 ns 172.9895 ns 18,307.120 ns 18,026.805 ns 18,753.731 ns - - - -
System.Collections.Tests Perf_Dictionary ContainsValue 1 16 ? 3000 ? 7,975,044.826 ns 105,824.8983 ns 98,988.6755 ns 7,981,264.829 ns 7,840,340.882 ns 8,226,755.110 ns - - - -
System.Collections DictionaryMappingFunction Entropy 1 16 ? 5000 ? 1,031,586.157 ns 20,160.3601 ns 21,571.3594 ns 1,033,572.395 ns 989,414.793 ns 1,075,720.686 ns 121.0938 121.0938 121.0938 673056 B
System.Collections DictionaryMappingFunction Entropy 1 16 ? 50000 ? 8,016,221.116 ns 105,045.7129 ns 93,120.2884 ns 8,006,054.459 ns 7,824,377.139 ns 8,183,526.976 ns 1062.5000 1031.2500 1031.2500 6037793 B
System.Collections DictionaryMappingFunction Entropy 1 16 ? 500000 ? 112,127,287.093 ns 1,208,900.5710 ns 1,130,806.3442 ns 111,575,103.530 ns 110,162,839.340 ns 114,698,069.870 ns 2500.0000 2500.0000 2500.0000 53888568 B
System.Collections DictionaryMappingFunction Entropy 1 16 ? 5000000 ? 1,030,821,064.930 ns 8,349,924.7036 ns 7,810,524.7484 ns 1,028,937,483.850 ns 1,020,771,502.470 ns 1,045,766,965.410 ns 4000.0000 4000.0000 4000.0000 471722104 B

Comparer

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Collections.CtorDefaultSize.Dictionary | 1.13 | 21.03 | 23.70 | |
| System.Collections.Tests.Perf_Dictionary.ContainsValue(Items: 3000) | 1.06 | 7503163.72 | 7981264.83 | |
| System.Collections.AddDefaultSize.Dictionary(Count: 2048) | 1.04 | 258518.34 | 268283.29 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Collections.DictionaryMappingFunction.Entropy(Items: 50000) | 1.08 | 8648810.88 | 8006054.46 | |
| System.Collections.AddGivenSize.Dictionary(Size: 2048) | 1.05 | 63791.39 | 60721.55 | |
| System.Collections.ContainsKeyFalse<Int32, Int32>.Dictionary(Size: 2048) | 1.05 | 46941.64 | 44902.33 | |
| System.Collections.Concurrent.Count.Dictionary(Size: 2048) | 1.04 | 21284.66 | 20467.29 | |
| System.Collections.DictionaryMappingFunction.Entropy(Items: 500) | 1.04 | 85026.70 | 82026.18 | |
| System.Collections.Remove.Dictionary(Size: 2048) | 1.03 | 132631.17 | 128484.00 | |
| System.Collections.DictionaryMappingFunction.Entropy(Items: 5000000) | 1.03 | 1058696955.31 | 1028937483.85 | |
| System.Collections.TryAddGiventSize.Dictionary(Count: 2048) | 1.03 | 146609.44 | 142806.87 | |

The more interesting result is the custom entropy test: dotnet/performance@master...MarcoRossignoli:newbenchdic. The ctor difference looks like an outlier, since no code changed there.
My thoughts: the mapping function retains more entropy and the code is not very different. The gain is small even with frequent bucket collisions (my test is not "perfect": every inserted item produces a chain of at most 2 items).
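
A minimal sketch of why keeping all 32 bits of the hash code can avoid some collisions. It assumes the pre-change mapping clears the sign bit with `& 0x7FFFFFFF` before taking the remainder; that is an assumption for illustration, not code copied from this PR's diff. The names `BucketMappingSketch`, `SignedBucket`, `UnsignedBucket` and the bucket count 17 are hypothetical.

```csharp
using System;

public static class BucketMappingSketch
{
    // Assumed "before": drop the sign bit so the remainder is never negative.
    // Two hash codes that differ only in the sign bit become identical here.
    public static int SignedBucket(int hashCode, int bucketCount)
        => (hashCode & 0x7FFFFFFF) % bucketCount;

    // Assumed "after": reinterpret the hash code as uint and keep all 32 bits.
    public static int UnsignedBucket(int hashCode, int bucketCount)
        => (int)((uint)hashCode % (uint)bucketCount);

    public static void Main()
    {
        const int bucketCount = 17;   // hypothetical prime bucket count
        int a = 5;
        int b = int.MinValue + 5;     // same low 31 bits as a, sign bit set

        // Masked mapping: both keys land in the same bucket.
        Console.WriteLine(SignedBucket(a, bucketCount) == SignedBucket(b, bucketCount));     // True
        // Unsigned mapping: the sign bit still contributes, so the buckets differ.
        Console.WriteLine(UnsignedBucket(a, bucketCount) == UnsignedBucket(b, bucketCount)); // False
    }
}
```

With a prime bucket count, 2^31 is never a multiple of the count, so under the unsigned mapping a pair of keys that differ only in the sign bit cannot share a bucket.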

I also tested different mapping functions, and "and" performs better than "div", as expected (I don't know whether this test has already been done in the past). Maybe in another PR we could measure this inside the actual dictionary, if it makes sense.

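    using System;
    using BenchmarkDotNet.Attributes;

    public class Bench
    {
        int[] _buckets;

        [Params(10, 100, 1_000, 10_000, 50_000, 100_000, 1_000_000)]
        public int Size;

        [GlobalSetup]
        public void Setup()
        {
            _buckets = new int[Size];
        }

        // Baseline: map the hash code to a bucket with the remainder operator.
        [Benchmark(Baseline = true)]
        public ref int RemainderOp()
        {
            uint hashCode = (uint)Guid.NewGuid().GetHashCode();
            return ref _buckets[hashCode % _buckets.Length];
        }

        // Candidate: map with a bitwise "and". The index is always in range, but it is
        // equivalent to the remainder only when the length is a power of two.
        [Benchmark]
        public ref int AndOp()
        {
            uint hashCode = (uint)Guid.NewGuid().GetHashCode();
            return ref _buckets[hashCode & (uint)(_buckets.Length - 1)];
        }
    }
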
BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7-3740QM CPU 2.70GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
Frequency=2628192 Hz, Resolution=380.4897 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT

| Method | Size | Mean | Error | StdDev | Ratio | RatioSD |
| --- | --- | --- | --- | --- | --- | --- |
| RemainderOp | 10 | 108.73 ns | 0.2949 ns | 0.2759 ns | 1.00 | 0.00 |
| AndOp | 10 | 96.15 ns | 0.2909 ns | 0.2721 ns | 0.88 | 0.00 |
| RemainderOp | 100 | 108.29 ns | 0.1940 ns | 0.1720 ns | 1.00 | 0.00 |
| AndOp | 100 | 95.78 ns | 0.3034 ns | 0.2689 ns | 0.88 | 0.00 |
| RemainderOp | 1000 | 108.89 ns | 1.0818 ns | 1.0120 ns | 1.00 | 0.00 |
| AndOp | 1000 | 96.96 ns | 1.4925 ns | 1.3231 ns | 0.89 | 0.02 |
| RemainderOp | 10000 | 109.44 ns | 0.2673 ns | 0.2369 ns | 1.00 | 0.00 |
| AndOp | 10000 | 97.62 ns | 1.9298 ns | 1.8051 ns | 0.89 | 0.02 |
| RemainderOp | 50000 | 110.83 ns | 0.2706 ns | 0.2532 ns | 1.00 | 0.00 |
| AndOp | 50000 | 97.67 ns | 1.9822 ns | 1.8542 ns | 0.88 | 0.02 |
| RemainderOp | 100000 | 113.69 ns | 0.7799 ns | 0.6914 ns | 1.00 | 0.00 |
| AndOp | 100000 | 96.78 ns | 0.3589 ns | 0.3357 ns | 0.85 | 0.01 |
| RemainderOp | 1000000 | 158.42 ns | 3.1855 ns | 6.2879 ns | 1.00 | 0.00 |
| AndOp | 1000000 | 100.77 ns | 0.5143 ns | 0.4295 ns | 0.66 | 0.02 |
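
One caveat when reading these numbers: the "and" mapping is a drop-in replacement for the remainder only when the bucket count is a power of two. The sketch below is an illustration, not part of the benchmark; `MaskVsRemainderSketch` and `CountReachable` are hypothetical names. It counts how many distinct buckets each mapping can reach for a power-of-two and a non-power-of-two length.

```csharp
using System;
using System.Collections.Generic;

public static class MaskVsRemainderSketch
{
    // Counts the distinct bucket indexes produced for hash codes 0..1023.
    public static int CountReachable(uint length, Func<uint, uint, uint> map)
    {
        var seen = new HashSet<uint>();
        for (uint hash = 0; hash < 1024; hash++)
        {
            seen.Add(map(hash, length));
        }
        return seen.Count;
    }

    public static void Main()
    {
        Func<uint, uint, uint> remainder = (hash, length) => hash % length;
        Func<uint, uint, uint> and = (hash, length) => hash & (length - 1);

        // Power-of-two length: both mappings reach every bucket.
        Console.WriteLine(CountReachable(16, remainder)); // 16
        Console.WriteLine(CountReachable(16, and));       // 16

        // Length 10: the mask is 9 (binary 1001), so only buckets 0, 1, 8 and 9 are reachable.
        Console.WriteLine(CountReachable(10, remainder)); // 10
        Console.WriteLine(CountReachable(10, and));       // 4
    }
}
```

So the table above compares the cost of the two operations, while using "and" in a real dictionary would presumably also require power-of-two bucket counts.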

I also worked on the last point of the list:

  • Eliminate the _freeCount field and use _freeList == -1 as a sentinel instead

The results are not great. We remove _freeCount, but then _count becomes the "real" item count and no longer matches the number of used entries. This means changing every piece of code that iterates through the entries so that it uses a local count = _count (decremented after every "valid item" found, next > -1). Another issue is that DictionarySlim doesn't support "versioning", so I also had to change every enumerator to support the current dictionary behaviour (removal during enumeration). We also need to change the code for features such as Trim() and EnsureCapacity(), which the slim one doesn't have.
I can show the results if you want, but to me the change is not worth the complexity.
Some preview code (not optimized, just to show the complexity): https://github.com/MarcoRossignoli/marcorossignoli.github.io/blob/dicbackporting/src/DicBackportingBenchmark/Dic/Dic/Dictionary.cs#L107 https://github.com/MarcoRossignoli/marcorossignoli.github.io/blob/dicbackporting/src/DicBackportingBenchmark/Dic/Dic/Dictionary.cs#L1610
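
A rough sketch of the idea, to make the trade-off concrete. This is a hypothetical slot pool, not the Dictionary or DictionarySlim code; `SlotPoolSketch`, the `_live` flags and the method names are invented for illustration. The free list uses -1 as its "empty" sentinel and there is no separate free count, so `_count` is the number of live items and scans can no longer stop at index `_count`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical slot pool; not the Dictionary code from this PR.
public sealed class SlotPoolSketch
{
    private readonly int[] _next;   // meaningful only for free slots: next free index, or -1
    private readonly bool[] _live;  // true while a slot holds an item
    private int _freeList = -1;     // -1 is the "free list is empty" sentinel
    private int _count;             // number of live items (no separate free count)

    public SlotPoolSketch(int capacity)
    {
        _next = new int[capacity];
        _live = new bool[capacity];
    }

    public int Count => _count;

    public int Add()
    {
        int index;
        if (_freeList != -1)
        {
            index = _freeList;              // reuse a hole
            _freeList = _next[index];
        }
        else
        {
            // An empty free list means there are no holes, so slots [0, _count) are all live.
            if (_count == _live.Length) throw new InvalidOperationException("Pool is full.");
            index = _count;
        }
        _live[index] = true;
        _count++;
        return index;
    }

    public void Remove(int index)
    {
        if (!_live[index]) throw new InvalidOperationException("Slot is not live.");
        _live[index] = false;
        _next[index] = _freeList;           // push the slot onto the free list
        _freeList = index;
        _count--;
    }

    // A scan must count down live items instead of stopping at index _count,
    // because holes push live slots past that index.
    public List<int> LiveSlots()
    {
        var result = new List<int>();
        int remaining = _count;
        for (int i = 0; remaining > 0; i++)
        {
            if (_live[i]) { result.Add(i); remaining--; }
        }
        return result;
    }
}
```

For example, after three `Add()` calls and `Remove(0)`, `Count` is 2 but the live slots are 1 and 2, so a loop bounded by `Count` would miss slot 2. Every scan, resize and enumerator has to account for this.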

Thanks for the chance to explore the dictionary so deeply, it was very interesting and fun!
I hope this was helpful.

/cc @danmosemsft

@MarcoRossignoli
Member Author

/azp run

@MarcoRossignoli
Member Author

@dotnet-bot test this please

@MarcoRossignoli
Member Author

@safern should the new command also work here?

@safern
Member

safern commented Feb 27, 2019

> @safern should the new command also work here?

Yeah, it should work. Weird that it didn't work for you. Let me give it a try.

@safern
Member

safern commented Feb 27, 2019

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@safern
Member

safern commented Feb 27, 2019

Damn, it seems like coreclr now has outerloop builds already.

@MarcoRossignoli
Member Author

MarcoRossignoli commented Mar 1, 2019

/cc @safern this PR is in your area, I don't know who should review it.

@danmoseley
Member

@MarcoRossignoli thanks for looking at this. I am trying to absorb the information above.

When gathering perf data, it is ideal to include the before and after in the same BenchmarkDotNet run so that it can show before and after in the same table, and calculate the base/candidate ratio (I see at least one of your tables has this) and also to be super explicit about what commit is base and what commit is candidate. That way it's easier for the rest of us to keep up 😃

Of the various changes suggested in the issue, did any produce a clear perf improvement, and no significant regressions? Can we limit this PR to just that/those, and show the before/after perf data for just that/those?

@MarcoRossignoli
Member Author

> When gathering perf data, it is ideal to include the before and after in the same BenchmarkDotNet run so that it can show before and after in the same table, and calculate the base/candidate ratio (I see at least one of your tables has this) and also to be super explicit about what commit is base and what commit is candidate. That way it's easier for the rest of us to keep up 😃

I used the new comparer tool to compare before/after because, AFAIK, it's not possible to merge the results of two different benchmark runs (@adamsitnik could you confirm?). Dictionary lives in coreclr, so to test it I need to build my local coreclr+corefx and then run the performance repo tests against my local build. I cannot run and compare the old and new versions together, so my strategy was to run the old code (coreclr+corefx with no dictionary update), then the new code (updated dictionary on coreclr+corefx), and compare the two with the comparer tool. BTW, I'll now try to get the comparison into a single report by cloning 4 repos, current coreclr+corefx vs coreclr (dictionary updated)+corefx, using two CoreRun.exe. I think it's the only way to compare correctly. What do you think, @adamsitnik? Is there a better way, or a BenchmarkDotNet feature to merge two "cold" results?

> Of the various changes suggested in the issue, did any produce a clear perf improvement, and no significant regressions? Can we limit this PR to just that/those, and show the before/after perf data for just that/those?

The issue asked to test the 2 remaining points:

  1. Use uint hash code for better entropy -> with a uint hash code we get slightly better performance in case of bucket collisions. I added new tests to the performance repo that simulate int vs uint hash collisions: dotnet/performance@master...MarcoRossignoli:newbenchdic
| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Collections.CtorDefaultSize.Dictionary | 1.13 | 21.03 | 23.70 | |
| System.Collections.Tests.Perf_Dictionary.ContainsValue(Items: 3000) | 1.06 | 7503163.72 | 7981264.83 | |
| System.Collections.AddDefaultSize.Dictionary(Count: 2048) | 1.04 | 258518.34 | 268283.29 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Collections.DictionaryMappingFunction.Entropy(Items: 50000) | 1.08 | 8648810.88 | 8006054.46 | |
| System.Collections.AddGivenSize.Dictionary(Size: 2048) | 1.05 | 63791.39 | 60721.55 | |
| System.Collections.ContainsKeyFalse<Int32, Int32>.Dictionary(Size: 2048) | 1.05 | 46941.64 | 44902.33 | |
| System.Collections.Concurrent.Count.Dictionary(Size: 2048) | 1.04 | 21284.66 | 20467.29 | |
| System.Collections.DictionaryMappingFunction.Entropy(Items: 500) | 1.04 | 85026.70 | 82026.18 | |
| System.Collections.Remove.Dictionary(Size: 2048) | 1.03 | 132631.17 | 128484.00 | |
| System.Collections.DictionaryMappingFunction.Entropy(Items: 5000000) | 1.03 | 1058696955.31 | 1028937483.85 | |
| System.Collections.TryAddGiventSize.Dictionary(Count: 2048) | 1.03 | 146609.44 | 142806.87 | |

Check the new DictionaryMappingFunction tests. Clearly the difference cannot be "so great": we are talking about a few lines of code (with few changes in the emitted code). Improved entropy leads to fewer chain-loop iterations in case of collision, which is more valuable in deep chains; my collision test has at most 2 elements per collision. Other perf remains basically unchanged (I ran the tests more than once and the set of slower tests changed every time; the only results that stay stable, better or equal, are the new mapping function tests).
Collision test core code:

for (int i = 1; i < Items; i++)
{
    dict.Add(i, i);
    dict.Add(int.MinValue + i, int.MinValue + i);
}

  2. Eliminate the _freeCount field and use _freeList == -1 as a sentinel instead -> As explained above, I think this change is not worth it: if we remove _freeCount we need to use _count as the real count of items not in the free list, which means changing every piece of code that "enumerates" _entries (we need a local counter to decrement). After some preliminary tests I found that the code performs worse, as expected, due to the added complexity of scans, resize (trim/ensure capacity) and enumeration. This algorithm works better on DictionarySlim because it has fewer features; Dictionary also has trim, ensure capacity, more enumerator types with different behaviours, more constructors, and features like removing items during enumeration.

Finally, I think uint as hash code could be an improvement in case of collisions, with no measurable regression. Eliminating _freeCount removes one variable but regresses all other features due to the added complexity of _entries enumeration (after this PR I'll show results for this second one if needed).

This PR shows results for only the one reasonable improvement, "Use uint hash code for better entropy".

/cc @danmosemsft

@MarcoRossignoli
Member Author

rebased for better comparison with coreclr+corefx upstream

@danmoseley
Member

Thanks @MarcoRossignoli that makes things clearer. For your perf results immediately above, is it possible to include the error? For example I am not sure how CtorDefaultSize could possibly change, but perhaps that 13% is within the error since it is a very short time. Likewise I wonder how the improvements compare to the noise level.

@MarcoRossignoli
Member Author

@danmosemsft I merged the results (a run of updated corefx+coreclr vs corefxupstream+coreclrupstream): slightly better numbers on the custom collision tests (DictionaryMappingFunction), the others are stable.

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727538 Hz, Resolution=366.6310 ns, Timer=TSC
.NET Core SDK=3.0.100-preview3-010431
  [Host]     : .NET Core 3.0.0-preview3-27503-5 (CoreCLR 4.6.27422.72, CoreFX 4.7.19.12807), 64bit RyuJIT
  Job-TYTTNG : .NET Core fb2d4c2d-5f0e-4d7c-bb86-1e82a8054ebf (CoreCLR 4.6.27603.0, CoreFX 4.7.19.15701), 64bit RyuJIT
  Job-HUJGQG : .NET Core fc4c2829-86c1-4f64-abed-1da2cd29ddb8 (CoreCLR 4.6.27602.0, CoreFX 4.7.19.15501), 64bit RyuJIT
  Job-TLMUNW : .NET Core fb2d4c2d-5f0e-4d7c-bb86-1e82a8054ebf (CoreCLR 4.6.27603.0, CoreFX 4.7.19.15701), 64bit RyuJIT
  Job-CFGYCY : .NET Core fc4c2829-86c1-4f64-abed-1da2cd29ddb8 (CoreCLR 4.6.27602.0, CoreFX 4.7.19.15501), 64bit RyuJIT

Runtime=Core  IterationTime=250.0000 ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  
Namespace Type Method Toolchain InvocationCount UnrollFactor Size Items Count Mean Error StdDev Median Min Max Ratio RatioSD Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
System.Collections CtorDefaultSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 15.469 ns 0.1771 ns 0.1570 ns 15.507 ns 15.175 ns 15.777 ns 1.02 0.02 0.0172 - - 72 B
System.Collections CtorDefaultSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 15.236 ns 0.1964 ns 0.1837 ns 15.178 ns 15.027 ns 15.628 ns 1.00 0.00 0.0172 - - 72 B
System.Collections CtorDefaultSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 113.593 ns 1.3262 ns 1.2406 ns 113.965 ns 111.339 ns 115.190 ns 0.99 0.01 0.0168 - - 72 B
System.Collections CtorDefaultSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 114.447 ns 1.0736 ns 1.0042 ns 114.740 ns 112.303 ns 115.707 ns 1.00 0.00 0.0168 - - 72 B
System.Collections.Concurrent IsEmpty<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 0 ? ? 166.250 ns 3.1452 ns 2.7881 ns 166.064 ns 160.717 ns 171.458 ns 0.97 0.02 - - - -
System.Collections.Concurrent IsEmpty<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 0 ? ? 171.162 ns 3.5458 ns 3.6413 ns 171.746 ns 165.077 ns 177.490 ns 1.00 0.00 - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 0 ? ? 167.295 ns 3.0576 ns 2.8601 ns 166.713 ns 163.521 ns 174.919 ns 0.92 0.02 - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 0 ? ? 181.145 ns 3.5903 ns 3.9906 ns 179.651 ns 175.399 ns 187.824 ns 1.00 0.00 - - - -
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500 ? 78,633.831 ns 783.7646 ns 654.4789 ns 78,699.529 ns 77,625.160 ns 80,096.175 ns 0.96 0.02 17.3267 0.3094 - 73160 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500 ? 81,562.640 ns 1,197.4547 ns 1,061.5124 ns 81,467.717 ns 79,652.430 ns 84,004.393 ns 1.00 0.00 17.1632 0.3238 - 73160 B
System.Collections TryAddDefaultSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 144,090.603 ns 2,839.2968 ns 2,516.9627 ns 144,459.501 ns 138,964.451 ns 148,206.117 ns 1.00 0.01 24.0826 12.0413 - 154192 B
System.Collections TryAddDefaultSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 143,787.592 ns 1,173.7740 ns 1,040.5201 ns 143,776.224 ns 142,292.201 ns 146,195.572 ns 1.00 0.00 24.4318 11.9318 - 154192 B
System.Collections TryAddDefaultSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 256,188.662 ns 4,963.0609 ns 4,642.4503 ns 255,995.780 ns 249,044.065 ns 266,255.438 ns 1.01 0.03 45.0820 43.0328 26.6393 215711 B
System.Collections TryAddDefaultSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 254,422.585 ns 4,494.4940 ns 4,204.1525 ns 254,652.380 ns 248,441.065 ns 264,037.533 ns 1.00 0.00 46.1066 43.0328 26.6393 215717 B
System.Collections TryAddGiventSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 59,885.572 ns 783.4602 ns 732.8491 ns 60,027.418 ns 58,389.965 ns 60,872.332 ns 1.01 0.02 7.4807 3.6197 - 46784 B
System.Collections TryAddGiventSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 59,551.804 ns 1,012.4980 ns 897.5531 ns 59,562.665 ns 57,983.861 ns 60,732.035 ns 1.00 0.00 7.5000 3.7500 - 46784 B
System.Collections TryAddGiventSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 147,372.140 ns 1,861.0225 ns 1,649.7480 ns 147,859.102 ns 142,796.178 ns 149,017.872 ns 0.98 0.03 10.4167 5.2083 - 65448 B
System.Collections TryAddGiventSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 151,060.351 ns 3,046.5887 ns 3,508.4561 ns 149,798.612 ns 145,929.819 ns 158,106.809 ns 1.00 0.00 10.5140 5.2570 - 65448 B
System.Collections AddDefaultSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 140,402.959 ns 1,672.7776 ns 1,564.7172 ns 140,453.479 ns 137,549.632 ns 143,700.840 ns 1.01 0.02 24.3363 12.1681 - 154192 B
System.Collections AddDefaultSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 138,913.527 ns 1,379.6098 ns 1,290.4879 ns 138,680.936 ns 136,637.495 ns 140,771.016 ns 1.00 0.00 24.3363 12.1681 - 154192 B
System.Collections AddDefaultSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 250,561.131 ns 3,507.1231 ns 3,280.5651 ns 249,870.881 ns 245,252.190 ns 256,316.641 ns 1.00 0.01 46.8750 40.0391 27.3438 215711 B
System.Collections AddDefaultSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 250,038.418 ns 4,026.6276 ns 3,766.5099 ns 251,307.635 ns 239,967.898 ns 254,558.575 ns 1.00 0.00 45.6349 41.6667 26.7857 215712 B
System.Collections Remove<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 63,597.686 ns 2,709.0932 ns 2,898.6994 ns 62,663.307 ns 59,872.328 ns 71,698.744 ns 0.98 0.06 - - - -
System.Collections Remove<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 64,856.803 ns 2,415.7189 ns 2,584.7922 ns 64,403.851 ns 60,998.948 ns 71,222.087 ns 1.00 0.00 - - - -
System.Collections Remove<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 132,072.532 ns 2,567.5553 ns 2,521.6826 ns 131,477.765 ns 128,414.196 ns 137,590.237 ns 0.97 0.03 - - - -
System.Collections Remove<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 135,410.691 ns 2,587.7128 ns 2,876.2359 ns 134,324.691 ns 132,140.304 ns 142,955.918 ns 1.00 0.00 - - - -
System.Collections Clear<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 8,081.075 ns 160.1735 ns 164.4864 ns 8,159.685 ns 7,811.019 ns 8,290.205 ns 1.00 0.04 - - - -
System.Collections Clear<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 8,086.285 ns 211.0632 ns 243.0607 ns 8,135.029 ns 7,678.390 ns 8,628.331 ns 1.00 0.00 - - - -
System.Collections Clear<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 10,452.264 ns 160.1511 ns 133.7334 ns 10,458.186 ns 10,296.502 ns 10,795.487 ns 1.00 0.02 - - - -
System.Collections Clear<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 10,409.426 ns 111.2601 ns 92.9073 ns 10,376.006 ns 10,296.813 ns 10,529.624 ns 1.00 0.00 - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 46,646.230 ns 404.8760 ns 378.7212 ns 46,687.896 ns 46,076.375 ns 47,268.797 ns 0.97 0.01 - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 47,955.569 ns 660.6692 ns 617.9904 ns 48,025.901 ns 46,082.616 ns 48,726.942 ns 1.00 0.00 - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 111,352.590 ns 753.3358 ns 704.6708 ns 111,380.216 ns 110,107.174 ns 112,521.420 ns 0.97 0.01 - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 114,673.213 ns 892.2772 ns 834.6367 ns 114,927.569 ns 112,950.828 ns 115,617.805 ns 1.00 0.00 - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 42,499.914 ns 351.7626 ns 329.0390 ns 42,560.015 ns 41,622.030 ns 42,870.647 ns 0.99 0.01 - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 42,751.878 ns 284.2717 ns 265.9079 ns 42,813.099 ns 42,335.134 ns 43,118.210 ns 1.00 0.00 - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 111,577.095 ns 605.8881 ns 566.7481 ns 111,722.722 ns 110,609.571 ns 112,355.651 ns 1.00 0.01 - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 111,909.881 ns 727.9745 ns 680.9478 ns 111,877.097 ns 110,929.857 ns 113,018.038 ns 1.00 0.00 - - - -
System.Collections.Concurrent Count<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 21,263.792 ns 302.1740 ns 267.8693 ns 21,285.782 ns 20,914.298 ns 21,755.068 ns 1.00 0.02 - - - -
System.Collections.Concurrent Count<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 21,204.330 ns 386.7734 ns 361.7881 ns 21,231.000 ns 20,679.759 ns 22,011.259 ns 1.00 0.00 - - - -
System.Collections.Concurrent Count<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 21,049.033 ns 415.5447 ns 408.1205 ns 20,964.342 ns 20,499.649 ns 21,790.967 ns 0.99 0.02 - - - -
System.Collections.Concurrent Count<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 21,263.977 ns 341.5256 ns 319.4632 ns 21,159.107 ns 20,843.081 ns 21,745.392 ns 1.00 0.00 - - - -
System.Collections.Concurrent IsEmpty<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 3.444 ns 0.1666 ns 0.1558 ns 3.395 ns 3.208 ns 3.672 ns 1.00 0.07 - - - -
System.Collections.Concurrent IsEmpty<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 3.444 ns 0.1292 ns 0.1145 ns 3.453 ns 3.267 ns 3.742 ns 1.00 0.00 - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 3.182 ns 0.1693 ns 0.1500 ns 3.203 ns 2.919 ns 3.355 ns 1.04 0.10 - - - -
System.Collections.Concurrent IsEmpty<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 3.070 ns 0.1839 ns 0.1806 ns 3.024 ns 2.854 ns 3.546 ns 1.00 0.00 - - - -
System.Collections AddGivenSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 57,315.742 ns 457.9710 ns 428.3864 ns 57,426.870 ns 56,523.291 ns 57,811.999 ns 0.98 0.01 7.5000 3.6364 - 46784 B
System.Collections AddGivenSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 58,201.118 ns 718.2223 ns 671.8255 ns 58,133.455 ns 57,091.141 ns 59,327.188 ns 1.00 0.00 7.3260 3.6630 - 46784 B
System.Collections AddGivenSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 141,463.109 ns 809.1964 ns 717.3315 ns 141,752.047 ns 140,070.313 ns 142,426.577 ns 0.99 0.01 10.1351 5.0676 - 65448 B
System.Collections AddGivenSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 143,266.970 ns 1,233.7083 ns 1,154.0115 ns 142,947.502 ns 141,953.511 ns 145,557.271 ns 1.00 0.00 10.1351 5.0676 - 65448 B
System.Collections CtorFromCollection<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 59,257.161 ns 613.5756 ns 573.9390 ns 59,327.244 ns 58,292.144 ns 60,116.432 ns 0.99 0.02 7.4234 3.5920 - 46784 B
System.Collections CtorFromCollection<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 59,568.339 ns 656.3320 ns 613.9333 ns 59,503.858 ns 58,497.224 ns 60,990.854 ns 1.00 0.00 7.3529 3.6765 - 46784 B
System.Collections CtorFromCollection<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 158,584.499 ns 5,347.6117 ns 6,158.3176 ns 158,948.767 ns 144,112.146 ns 168,017.633 ns 0.97 0.05 10.4167 5.2083 - 65448 B
System.Collections CtorFromCollection<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 162,858.966 ns 4,700.2026 ns 5,412.7603 ns 162,854.079 ns 155,038.151 ns 173,701.493 ns 1.00 0.00 10.2041 5.1020 - 65448 B
System.Collections CtorGivenSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 9,442.885 ns 240.5037 ns 276.9644 ns 9,363.011 ns 9,077.151 ns 9,980.991 ns 1.03 0.05 7.4423 3.7211 - 46784 B
System.Collections CtorGivenSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 9,197.233 ns 311.2537 ns 333.0380 ns 9,068.378 ns 8,794.972 ns 9,831.122 ns 1.00 0.00 7.4216 3.6920 - 46784 B
System.Collections CtorGivenSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 23,334.653 ns 1,119.5814 ns 1,244.4117 ns 23,411.797 ns 20,390.587 ns 25,292.892 ns 1.13 0.04 10.3852 5.1926 - 65448 B
System.Collections CtorGivenSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 21,087.929 ns 296.9417 ns 277.7594 ns 20,958.756 ns 20,714.729 ns 21,544.015 ns 1.00 0.00 10.3442 5.1287 - 65448 B
System.Collections IndexerSet<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 53,583.117 ns 563.9286 ns 499.9080 ns 53,554.212 ns 52,609.088 ns 54,354.057 ns 1.01 0.02 - - - -
System.Collections IndexerSet<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 52,993.158 ns 1,012.6252 ns 947.2102 ns 52,941.966 ns 51,673.924 ns 54,931.731 ns 1.00 0.00 - - - -
System.Collections IndexerSet<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 128,099.674 ns 1,795.1925 ns 1,679.2242 ns 128,314.607 ns 124,520.201 ns 131,226.243 ns 0.98 0.02 - - - -
System.Collections IndexerSet<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 130,219.073 ns 2,456.4067 ns 2,412.5199 ns 130,414.114 ns 124,653.648 ns 134,067.569 ns 1.00 0.00 - - - -
System.Collections IterateForEach<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 19,010.259 ns 182.5996 ns 161.8698 ns 19,037.505 ns 18,734.562 ns 19,316.696 ns 1.00 0.01 - - - -
System.Collections IterateForEach<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 18,982.399 ns 233.8739 ns 218.7658 ns 18,957.186 ns 18,632.944 ns 19,329.424 ns 1.00 0.00 - - - -
System.Collections IterateForEach<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 18,993.542 ns 185.1659 ns 164.1448 ns 18,900.329 ns 18,757.894 ns 19,343.771 ns 1.00 0.01 - - - -
System.Collections IterateForEach<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 18,993.011 ns 254.5303 ns 238.0878 ns 18,957.703 ns 18,648.757 ns 19,520.744 ns 1.00 0.00 - - - -
System.Collections.Tests Perf_Dictionary ContainsValue \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 3000 ? 7,793,751.286 ns 150,023.0117 ns 166,750.1800 ns 7,766,714.662 ns 7,570,738.932 ns 8,179,896.348 ns 0.99 0.03 - - - -
System.Collections.Tests Perf_Dictionary ContainsValue \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 3000 ? 7,885,522.585 ns 148,078.6469 ns 138,512.8582 ns 7,888,299.815 ns 7,650,803.123 ns 8,184,285.608 ns 1.00 0.00 - - - -
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000 ? 1,001,646.765 ns 11,249.5576 ns 10,522.8432 ns 1,002,272.765 ns 980,410.755 ns 1,023,679.760 ns 0.96 0.02 120.5357 120.5357 120.5357 673056 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000 ? 1,038,366.147 ns 17,856.0630 ns 16,702.5724 ns 1,037,096.993 ns 1,016,098.202 ns 1,078,549.211 ns 1.00 0.00 120.8333 120.8333 120.8333 673056 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 50000 ? 7,975,572.708 ns 170,000.3759 ns 188,954.9674 ns 8,035,703.811 ns 7,583,189.492 ns 8,206,851.747 ns 0.93 0.03 1062.5000 1031.2500 1031.2500 6037793 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 50000 ? 8,554,088.354 ns 116,312.9926 ns 108,799.2455 ns 8,541,882.597 ns 8,287,543.794 ns 8,719,343.461 ns 1.00 0.00 1062.5000 1031.2500 1031.2500 6037793 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500000 ? 113,719,955.505 ns 1,239,408.7308 ns 1,098,703.5570 ns 113,885,397.748 ns 112,338,489.875 ns 115,917,908.385 ns 1.00 0.02 2500.0000 2500.0000 2500.0000 53888568 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500000 ? 114,081,427.490 ns 2,700,901.8392 ns 2,394,279.1302 ns 113,599,975.510 ns 110,977,738.900 ns 119,944,433.405 ns 1.00 0.00 2500.0000 2500.0000 2500.0000 53888568 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000000 ? 1,067,144,753.498 ns 9,190,176.0946 ns 8,596,496.4209 ns 1,065,372,874.730 ns 1,053,603,652.820 ns 1,082,765,116.380 ns 0.99 0.01 4000.0000 4000.0000 4000.0000 471722104 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000000 ? 1,077,325,852.105 ns 9,727,047.2551 ns 9,098,686.0375 ns 1,074,419,494.800 ns 1,060,670,832.080 ns 1,097,533,379.920 ns 1.00 0.00 4000.0000 4000.0000 4000.0000 471722104 B

The error on CtorGivenSize<String> was high, so I reran it on its own and the result is fine:

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727538 Hz, Resolution=366.6310 ns, Timer=TSC
.NET Core SDK=3.0.100-preview3-010431
  [Host]     : .NET Core 3.0.0-preview3-27503-5 (CoreCLR 4.6.27422.72, CoreFX 4.7.19.12807), 64bit RyuJIT
  Job-YLVJAB : .NET Core 7b2a2d6f-d4e3-42b1-9a16-d56d311cd3a3 (CoreCLR 4.6.27603.0, CoreFX 4.7.19.15701), 64bit RyuJIT
  Job-SMKTJS : .NET Core cdce2a1f-95b7-448d-8f4b-eff5d026825c (CoreCLR 4.6.27602.0, CoreFX 4.7.19.15501), 64bit RyuJIT

Runtime=Core  IterationTime=250.0000 ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  
| Type | Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CtorGivenSize<Int32> | Dictionary | \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe | 2048 | 8.065 us | 0.0834 us | 0.0780 us | 8.022 us | 7.983 us | 8.190 us | 1.00 | 7.4405 | 3.7044 | - | 45.69 KB |
| CtorGivenSize<Int32> | Dictionary | \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe | 2048 | 8.090 us | 0.0861 us | 0.0806 us | 8.109 us | 7.953 us | 8.192 us | 1.00 | 7.4169 | 3.7084 | - | 45.69 KB |
| CtorGivenSize<String> | Dictionary | \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe | 2048 | 19.595 us | 0.2736 us | 0.2559 us | 19.626 us | 18.918 us | 19.970 us | 1.00 | 10.3395 | 5.1698 | - | 63.91 KB |
| CtorGivenSize<String> | Dictionary | \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe | 2048 | 19.625 us | 0.1635 us | 0.1529 us | 19.634 us | 19.351 us | 19.843 us | 1.00 | 10.3383 | 5.1692 | - | 63.91 KB |

FYI @jkotas, during perf tests I saw a large perf regression (~20%) on a simple Contains test.
I dug into it and found that the issue is a difference in struct layout a7a7e18

I ran some tests with a JIT dump, and there is a difference in the emitted code that leads to more "dereferencing":
uint after

    public class D2<K, V>
    {
        private S[] _entries;

        public struct S
        {
            public int next;
            public uint hashcode;
            public K key;
            public V val;
        }

        public void Test()
        {
            for (int i = 0; i < _entries.Length; i++)
            {
                if (_entries[i].next > -1) Console.WriteLine(_entries[i].next);
            }
        }
    }
...
; Tier-1 compilation
...
G_M54470_IG03:
       488B4E08             mov      rcx, gword ptr [rsi+8]
       3B7908               cmp      edi, dword ptr [rcx+8]
       7326                 jae      SHORT G_M54470_IG06
       4863C7               movsxd   rax, edi
       48C1E004             shl      rax, 4
       8B4C0110             mov      ecx, dword ptr [rcx+rax+16]
       85C9                 test     ecx, ecx
       7C05                 jl       SHORT G_M54470_IG04
       E866FEFFFF           call     Console:WriteLine(int)

uint before

    public class D<K, V>
    {
        private S[] _entries;

        public struct S
        {
            public uint hashcode;
            public int next;
            public K key;
            public V val;
        }

        public void Test()
        {
            for (int i = 0; i < _entries.Length; i++)
            {
                if (_entries[i].next > -1) Console.WriteLine(_entries[i].next);
            }
        }
    }
...
; Tier-1 compilation
...
G_M340_IG03:
       488B4E08             mov      rcx, gword ptr [rsi+8]		
       3B7908               cmp      edi, dword ptr [rcx+8]
       732C                 jae      SHORT G_M340_IG06
       4863C7               movsxd   rax, edi
       48C1E004             shl      rax, 4
       488D4C0110           lea      rcx, bword ptr [rcx+rax+16]
       83790400             cmp      dword ptr [rcx+4], 0
       7C08                 jl       SHORT G_M340_IG04
       8B4904               mov      ecx, dword ptr [rcx+4]
       E800FFFFFF           call     Console:WriteLine(int)

I'm not fluent in codegen yet, so maybe this is expected, but I'd rather ask to clear up my doubt.

@jkotas
Member

jkotas commented Mar 7, 2019

found that the issue is a difference in struct layout

In this particular case, the JIT could have optimized this into same code in both cases.

However, accessing fields at offset zero (vs. non-zero offset) tends to generate tiny bit better code on x86/x64 when everything else is equal. The improvement will be non-measurable in most cases, but there are rare cases where the improvement gets amplified due to processor micro-architecture and you can get measurable improvement like in this case. Improvements like these tend to come and go with unrelated changes. You need to take them with a grain of salt.
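The field offsets behind the two layouts discussed above can be checked with a small sketch. This is an illustration only, not code from the PR: `NextFirst`, `HashFirst`, and `LayoutDemo` are hypothetical names, and `int` stands in for `K` and `V` because `Marshal.OffsetOf` does not accept generic structs. It also assumes that, for blittable structs with sequential layout, the marshalled offsets match the managed ones.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical non-generic stand-in for the layout with 'next' first.
[StructLayout(LayoutKind.Sequential)]
public struct NextFirst
{
    public int next;
    public uint hashcode;
    public int key;
    public int val;
}

// Hypothetical non-generic stand-in for the layout with 'hashcode' first.
[StructLayout(LayoutKind.Sequential)]
public struct HashFirst
{
    public uint hashcode;
    public int next;
    public int key;
    public int val;
}

public static class LayoutDemo
{
    // Byte offset of a named field within T, as reported by the marshaller.
    public static int OffsetOf<T>(string field) => (int)Marshal.OffsetOf<T>(field);

    public static void Main()
    {
        Console.WriteLine($"NextFirst.next offset: {OffsetOf<NextFirst>("next")}");
        Console.WriteLine($"HashFirst.next offset: {OffsetOf<HashFirst>("next")}");
        Console.WriteLine($"element size: {Marshal.SizeOf<NextFirst>()}");
    }
}
```

With these stand-ins, `next` sits at offset 0 in the first layout and at offset 4 in the second, which lines up with the `[rcx+4]` operand in the second disassembly. The 16-byte element size is consistent with the `shl rax, 4` in both listings.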

@MarcoRossignoli
Member Author

MarcoRossignoli commented Mar 7, 2019

Actually, before this change we compared hashcode, which was at offset 0, so maybe this "improvement" was already in place before, or it is simply unrelated to the past choice, as you said.

@MarcoRossignoli
Member Author

@safern @ViktorHofer is there something I can do about the failing CI here?

@adamsitnik
Member

think it's the only way to compare correctly, what do you think @adamsitnik

My personal workflow is the following:

  1. clean coreclr/fx build in release
  2. create a copy of the folder with CoreRun (I always call it "before")
  3. apply some code changes
  4. rebuild coreclr/fx

Now depending on how many benchmarks I want to run:

  • if only a few, I provide the paths to the CoreRuns via the --corerun argument to BenchmarkDotNet. The first one is marked as the baseline in the results. I provide the threshold via the --statisticalTest $value argument. BDN adds a new column to the output that says which was faster or slower.
  • if more than a few, I run the benchmarks once with the "before" CoreRun and tell BDN to store the results in a dedicated folder. I do the same for the "after" runs and use ResultsComparer to compare the results from the two folders.

If some of the nano-benchmarks seem to be unstable, I run them affinitized to one CPU. Example: --affinity 8 is translated into the mask 1000, which means that the benchmarks are executed on the 4th CPU.
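The mapping from an affinity value to a bit mask and a CPU index can be sketched as follows. `AffinityDemo` and its helpers are hypothetical names used only for illustration; they are not part of BenchmarkDotNet.

```csharp
using System;
using System.Collections.Generic;

public static class AffinityDemo
{
    // Binary form of the affinity value, e.g. 8 -> "1000".
    public static string ToBinary(long mask) => Convert.ToString(mask, 2);

    // Zero-based indexes of the logical CPUs selected by the mask.
    public static List<int> SelectedCpus(long mask)
    {
        var cpus = new List<int>();
        for (int bit = 0; bit < 64; bit++)
        {
            if ((mask & (1L << bit)) != 0) cpus.Add(bit);
        }
        return cpus;
    }

    public static void Main()
    {
        long mask = 8;
        Console.WriteLine(ToBinary(mask));                        // 1000
        Console.WriteLine(string.Join(",", SelectedCpus(mask)));  // 3 (zero-based, i.e. the 4th CPU)
    }
}
```

A value with several bits set, such as 12 (binary 1100), would select more than one CPU.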

@MarcoRossignoli
Member Author

@adamsitnik thanks for the info!

@danmoseley
Member

The CI logs are gone for some reason.
@dotnet-bot test this please

@danmoseley
Member

Actually, before this change we compared hashcode, which was at offset 0, so maybe this "improvement" was already in place before, or it is simply unrelated to the past choice, as you said.

Was there a reason to change? You could try using the original order (does not sound important, so entirely up to you). I suppose hashcode is accessed slightly more than next as next is not consulted if there is no chain and the first entry is a hit.
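The access pattern described above, where hashcode is compared on every probed entry while next is only read when the probe has to move along the chain, can be sketched with a deliberately tiny map. `TinyMap` is a hypothetical illustration (fixed capacity, int keys, no resize, no removal, no duplicate check), not the Dictionary or DictionarySlim implementation.

```csharp
using System;

// Hypothetical fixed-capacity int-to-int map, for illustration only.
public sealed class TinyMap
{
    private struct Entry
    {
        public int next;      // index of the next entry in the same bucket, or -1 at the end of the chain
        public uint hashcode; // full 32-bit hash code, kept as uint
        public int key;
        public int val;
    }

    private readonly int[] _buckets;   // 1-based index of the first entry per bucket, 0 means empty
    private readonly Entry[] _entries;
    private int _count;

    public TinyMap(int capacity)
    {
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
    }

    // Adds a pair at the head of its bucket chain. Duplicate keys are not checked.
    public void Add(int key, int val)
    {
        if (_count == _entries.Length) throw new InvalidOperationException("TinyMap is full.");
        uint hashcode = (uint)key.GetHashCode();
        ref int bucket = ref _buckets[hashcode % (uint)_buckets.Length];
        _entries[_count] = new Entry { next = bucket - 1, hashcode = hashcode, key = key, val = val };
        bucket = ++_count;
    }

    public bool TryGetValue(int key, out int val)
    {
        uint hashcode = (uint)key.GetHashCode();
        int i = _buckets[hashcode % (uint)_buckets.Length] - 1;
        while (i >= 0)
        {
            ref Entry e = ref _entries[i];
            // hashcode is read for every probed entry.
            if (e.hashcode == hashcode && e.key == key) { val = e.val; return true; }
            // next is read only when the probe has to continue along the chain.
            i = e.next;
        }
        val = 0;
        return false;
    }
}

public static class TinyMapDemo
{
    public static void Main()
    {
        var map = new TinyMap(4);
        map.Add(1, 10);
        map.Add(5, 50); // with 4 buckets, keys 1 and 5 land in the same bucket
        Console.WriteLine(map.TryGetValue(5, out int a) ? a : -1); // first entry is a hit, next is never read
        Console.WriteLine(map.TryGetValue(1, out int b) ? b : -1); // one step along the chain
        Console.WriteLine(map.TryGetValue(9, out _));              // walks the whole chain and misses
    }
}
```

In this sketch a lookup that hits the first entry of its bucket never touches `next`, which is the case the comment above refers to.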

@MarcoRossignoli
Member Author

Was there a reason to change? You could try using the original order (does not sound important, so entirely up to you). I suppose hashcode is accessed slightly more than next as next is not consulted if there is no chain and the first entry is a hit.

There is a perf decrease of roughly 20% on the Contains test; you can read the explanation in this comment #22832 (comment), after the grid of results. If it's not a concern to you, I'll restore the old layout.

@MarcoRossignoli
Member Author

@danmosemsft let me try to explain better: during performance tests I found that the offset of the fields in the Entry struct changes the emitted code.
Before this change, the hashcode field at index 0 was used in enumerations with a > 0 predicate.
Now we use the next field. I found a perf difference on one specific test, Contains; the others are very close to the results with next at index 1.
With next at index 0, perf is OK (check the results above).

I suppose hashcode is accessed slightly more than next as next is not consulted if there is no chain and the first entry is a hit.

Now we could revert the order and move on; my one doubt is that I cannot see a perf difference with hashcode at index 1 (the current perf results).

I could try to dump the Add/Remove methods and check the emitted code to understand.

What are your thoughts? /cc @jkotas @stephentoub

If you agree with the revert, I'll do it.

@jkotas
Member

jkotas commented Mar 12, 2019

What are your thoughts?

I do not think it matters a whole lot, for the reasons given in #22832 (comment)

Ideally, we would fix the JIT to not generate the extra instruction that seems to be impacting the Contains micro-benchmark measurably.

@danmoseley
Member

OK @MarcoRossignoli what you did makes sense to me. Sounds like you might consider opening a CoreCLR issue for the JIT. They will want a small repro if possible. Certainly no need to wait on that.

@MarcoRossignoli
Member Author

OK, I'll revert the order and open an issue on CoreCLR with the sample above, thanks.

@danmoseley
Member

@MarcoRossignoli I assumed you would keep whatever order is fastest in your measurements (since aside from that we do not care about ordering). The bug is just to help other people.

@MarcoRossignoli
Member Author

I apologize @danmosemsft, I misunderstood the intention. I confirm that I would keep the code as is; it's ready for review, and the outcome of the perf tests is above!
I'll file an issue on CoreCLR as an extra.

@MarcoRossignoli
Member Author

MarcoRossignoli commented Mar 23, 2019

Running the performance tests on x86 as well

@jkotas I don't have an x86 machine; is there a way to run x86 on x64 without distorting the outcome? Or do you mean compile for x86 and compare before and after on x64?

@jkotas
Member

jkotas commented Mar 23, 2019

It is fine to just build and run the x86 build on an x64 machine.

@adamsitnik
Member

I don't have an x86 machine; is there a way to run x86 on x64 without distorting the outcome?

@MarcoRossignoli you can use the Python script from the dotnet/performance repo and tell it to download the x86 CLI and run the benchmarks using it

py .\scripts\benchmarks_ci.py --architecture x86

If you are using CoreRun to run the benchmarks, you need to build the repo for x86 Release as well

build -c Release -arch x86

@MarcoRossignoli
Member Author

MarcoRossignoli commented Mar 25, 2019

If you are using CoreRun to run the benchmarks, you need to build the repo for x86 Release as well

Thanks @adamsitnik, I'm using CoreRun at the moment (I need to compare corefx+localclr-updated vs upstreamcorefx+upstreamclr to avoid strange drift) and it works (results here in a few days, I think). One more thing I did was install the x86 preview to avoid errors when building the projects generated by BDN.

@MarcoRossignoli
Member Author

Running the performance tests on x86 as well

@jkotas @danmosemsft the x86 tests are "in line"

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727540 Hz, Resolution=366.6307 ns, Timer=TSC
.NET Core SDK=3.0.100-preview3-010431
  [Host]     : .NET Core 3.0.0-preview3-27503-5 (CoreCLR 4.6.27422.72, CoreFX 4.7.19.12807), 32bit RyuJIT
  Job-HNRQQM : .NET Core ae988845-d656-4730-890f-0a72e2fa9bd5 (CoreCLR 4.6.27623.0, CoreFX 4.700.19.17601), 32bit RyuJIT
  Job-CHVHEB : .NET Core 57e7ed6b-68a2-4caf-a251-80b88523826f (CoreCLR 4.6.27623.0, CoreFX 4.700.19.17601), 32bit RyuJIT
  Job-VBGKTO : .NET Core ae988845-d656-4730-890f-0a72e2fa9bd5 (CoreCLR 4.6.27623.0, CoreFX 4.700.19.17601), 32bit RyuJIT
  Job-NSZOKV : .NET Core 57e7ed6b-68a2-4caf-a251-80b88523826f (CoreCLR 4.6.27623.0, CoreFX 4.700.19.17601), 32bit RyuJIT

Runtime=Core  IterationTime=250.0000 ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  
Namespace Type Method Toolchain InvocationCount UnrollFactor Size Items Count Mean Error StdDev Median Min Max Ratio RatioSD Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
System.Collections CtorDefaultSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 14.487 ns 0.0964 ns 0.0855 ns 14.502 ns 14.330 ns 14.621 ns 1.00 0.00 0.0105 - - 44 B
System.Collections CtorDefaultSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 14.552 ns 0.0824 ns 0.0688 ns 14.551 ns 14.423 ns 14.675 ns 1.00 0.01 0.0105 - - 44 B
System.Collections CtorDefaultSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 103.886 ns 1.9727 ns 1.8453 ns 102.824 ns 102.241 ns 106.604 ns 1.00 0.00 0.0103 - - 44 B
System.Collections CtorDefaultSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? ? 105.435 ns 0.7468 ns 0.6236 ns 105.192 ns 104.774 ns 107.025 ns 1.02 0.02 0.0104 - - 44 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500 ? 67,503.623 ns 407.7345 ns 340.4768 ns 67,467.063 ns 67,194.164 ns 68,458.408 ns 1.00 0.00 17.2414 2.6940 - 72880 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500 ? 71,505.954 ns 563.2590 ns 499.3145 ns 71,338.686 ns 71,028.143 ns 72,878.275 ns 1.06 0.01 17.3295 2.5568 - 72880 B
System.Collections TryAddDefaultSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 132,043.417 ns 1,555.2806 ns 1,378.7158 ns 131,528.061 ns 131,027.954 ns 135,660.295 ns 1.00 0.00 24.4792 11.9792 - 153884 B
System.Collections TryAddDefaultSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 134,564.244 ns 1,713.6845 ns 1,519.1367 ns 133,957.529 ns 133,286.646 ns 138,659.000 ns 1.02 0.02 24.0385 11.7521 - 153884 B
System.Collections TryAddDefaultSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 219,242.067 ns 2,653.6312 ns 2,352.3749 ns 218,201.828 ns 217,538.191 ns 225,631.565 ns 1.00 0.00 24.0385 11.5385 - 153884 B
System.Collections TryAddDefaultSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 220,143.542 ns 3,680.8412 ns 3,073.6691 ns 219,010.159 ns 217,894.354 ns 228,540.649 ns 1.00 0.02 24.3056 12.1528 - 153884 B
System.Collections TryAddGiventSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 56,868.841 ns 393.8162 ns 349.1078 ns 56,779.899 ns 56,512.896 ns 57,527.607 ns 1.00 0.00 7.4728 3.6232 - 46728 B
System.Collections TryAddGiventSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 57,845.773 ns 970.7984 ns 908.0854 ns 57,356.754 ns 57,160.643 ns 59,565.737 ns 1.02 0.02 7.2993 3.6496 - 46728 B
System.Collections TryAddGiventSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 120,206.665 ns 652.8605 ns 509.7105 ns 120,233.078 ns 119,527.804 ns 121,027.211 ns 1.00 0.00 7.1565 3.3397 - 46728 B
System.Collections TryAddGiventSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 119,083.220 ns 623.0038 ns 552.2766 ns 118,879.264 ns 118,567.121 ns 120,243.897 ns 0.99 0.01 7.1565 3.3397 - 46728 B
System.Collections AddDefaultSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 133,248.953 ns 1,643.4215 ns 1,537.2575 ns 133,129.157 ns 131,521.681 ns 136,736.541 ns 1.00 0.00 24.1597 12.0798 - 153884 B
System.Collections AddDefaultSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 134,053.101 ns 774.8096 ns 686.8485 ns 133,926.878 ns 133,287.798 ns 135,805.860 ns 1.01 0.01 24.3644 12.1822 - 153884 B
System.Collections AddDefaultSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 221,236.218 ns 1,101.0529 ns 1,029.9256 ns 220,977.963 ns 219,810.295 ns 223,559.224 ns 1.00 0.00 24.6479 12.3239 - 153884 B
System.Collections AddDefaultSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 2048 216,768.679 ns 516.7075 ns 458.0477 ns 216,662.537 ns 216,145.236 ns 217,520.101 ns 0.98 0.00 23.9726 11.9863 - 153884 B
System.Collections Remove<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 55,484.560 ns 199.0323 ns 176.4370 ns 55,467.014 ns 55,292.315 ns 55,827.596 ns 1.00 0.00 - - - -
System.Collections Remove<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 56,498.872 ns 319.7885 ns 267.0379 ns 56,506.938 ns 56,101.444 ns 56,913.531 ns 1.02 0.01 - - - -
System.Collections Remove<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 118,963.503 ns 155.3803 ns 137.7406 ns 118,959.025 ns 118,694.868 ns 119,166.722 ns 1.00 0.00 - - - -
System.Collections Remove<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 119,691.119 ns 490.9720 ns 435.2339 ns 119,619.914 ns 119,099.848 ns 120,410.186 ns 1.01 0.00 - - - -
System.Collections Clear<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 7,884.776 ns 136.5332 ns 121.0331 ns 7,873.725 ns 7,729.089 ns 8,185.178 ns 1.00 0.00 - - - -
System.Collections Clear<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 7,818.619 ns 58.7611 ns 49.0682 ns 7,816.278 ns 7,760.550 ns 7,932.133 ns 0.99 0.02 - - - -
System.Collections Clear<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 7,501.807 ns 82.9905 ns 73.5689 ns 7,463.520 ns 7,407.609 ns 7,652.518 ns 1.00 0.00 - - - -
System.Collections Clear<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 2048 ? ? 7,519.754 ns 82.8145 ns 73.4129 ns 7,506.764 ns 7,425.006 ns 7,716.844 ns 1.00 0.01 - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 41,637.821 ns 187.0940 ns 146.0707 ns 41,651.310 ns 41,415.549 ns 41,890.196 ns 1.00 0.00 - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 42,606.069 ns 45.6726 ns 40.4876 ns 42,618.151 ns 42,533.954 ns 42,663.948 ns 1.02 0.00 - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 101,538.132 ns 619.8837 ns 517.6309 ns 101,347.902 ns 100,949.487 ns 102,529.547 ns 1.00 0.00 - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 101,237.031 ns 95.3221 ns 84.5006 ns 101,237.170 ns 101,077.668 ns 101,422.582 ns 1.00 0.01 - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 37,653.431 ns 42.6902 ns 37.8438 ns 37,658.219 ns 37,583.817 ns 37,727.487 ns 1.00 0.00 - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 38,710.742 ns 45.7540 ns 40.5597 ns 38,706.522 ns 38,646.238 ns 38,782.762 ns 1.03 0.00 - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 101,946.493 ns 208.6697 ns 184.9802 ns 101,891.503 ns 101,661.168 ns 102,289.083 ns 1.00 0.00 - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 101,891.423 ns 103.2487 ns 96.5789 ns 101,911.744 ns 101,715.615 ns 102,004.457 ns 1.00 0.00 - - - -
System.Collections TryGetValueFalse<Int32, Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 44,802.958 ns 85.4029 ns 79.8859 ns 44,818.523 ns 44,657.794 ns 44,915.893 ns 1.00 0.00 - - - -
System.Collections TryGetValueFalse<Int32, Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 45,372.288 ns 44.3472 ns 39.3126 ns 45,379.437 ns 45,306.842 ns 45,440.808 ns 1.01 0.00 - - - -
System.Collections TryGetValueFalse<String, String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 104,427.196 ns 288.0475 ns 269.4398 ns 104,439.692 ns 104,020.969 ns 104,775.923 ns 1.00 0.00 - - - -
System.Collections TryGetValueFalse<String, String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 103,773.595 ns 149.1221 ns 132.1929 ns 103,748.643 ns 103,641.279 ns 104,076.956 ns 0.99 0.00 - - - -
System.Collections TryGetValueTrue<Int32, Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 43,585.715 ns 82.2605 ns 76.9466 ns 43,599.566 ns 43,468.417 ns 43,705.849 ns 1.00 0.00 - - - -
System.Collections TryGetValueTrue<Int32, Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 42,304.611 ns 47.9194 ns 44.8238 ns 42,297.213 ns 42,236.211 ns 42,385.774 ns 0.97 0.00 - - - -
System.Collections TryGetValueTrue<String, String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 108,418.198 ns 464.0340 ns 434.0576 ns 108,272.820 ns 107,978.568 ns 109,469.743 ns 1.00 0.00 - - - -
System.Collections TryGetValueTrue<String, String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 107,908.481 ns 94.7269 ns 88.6076 ns 107,904.404 ns 107,749.376 ns 108,091.986 ns 1.00 0.00 - - - -
System.Collections AddGivenSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 57,145.425 ns 650.1131 ns 608.1162 ns 56,915.344 ns 56,409.903 ns 58,539.951 ns 1.00 0.00 7.4458 3.6101 - 46728 B
System.Collections AddGivenSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 57,325.593 ns 201.0405 ns 188.0534 ns 57,265.153 ns 57,097.311 ns 57,799.625 ns 1.00 0.01 7.3801 3.6900 - 46728 B
System.Collections AddGivenSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 119,786.385 ns 493.1186 ns 411.7763 ns 119,603.123 ns 119,378.002 ns 120,733.626 ns 1.00 0.00 7.1565 3.3397 - 46728 B
System.Collections AddGivenSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 119,060.083 ns 312.8286 ns 244.2360 ns 118,979.651 ns 118,712.055 ns 119,491.840 ns 0.99 0.01 7.1023 3.3144 - 46728 B
System.Collections CtorFromCollection<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 55,668.340 ns 196.4070 ns 183.7192 ns 55,711.146 ns 55,345.645 ns 55,921.733 ns 1.00 0.00 7.2623 3.5211 - 46728 B
System.Collections CtorFromCollection<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 57,146.244 ns 126.6866 ns 98.9085 ns 57,180.839 ns 56,917.406 ns 57,266.141 ns 1.03 0.00 7.2993 3.6496 - 46728 B
System.Collections CtorFromCollection<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 121,360.661 ns 183.2722 ns 162.4660 ns 121,344.649 ns 121,087.439 ns 121,650.352 ns 1.00 0.00 7.2674 3.3915 - 46728 B
System.Collections CtorFromCollection<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 119,822.587 ns 281.3445 ns 249.4046 ns 119,821.762 ns 119,449.971 ns 120,474.473 ns 0.99 0.00 7.1565 3.3397 - 46728 B
System.Collections CtorGivenSize<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 7,856.582 ns 79.8030 ns 70.7432 ns 7,845.075 ns 7,769.992 ns 8,050.270 ns 1.00 0.00 7.4038 3.7019 - 46728 B
System.Collections CtorGivenSize<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 7,799.138 ns 36.9055 ns 30.8178 ns 7,796.465 ns 7,734.241 ns 7,850.139 ns 0.99 0.01 7.3911 3.6801 - 46728 B
System.Collections CtorGivenSize<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 11,990.329 ns 219.8147 ns 183.5552 ns 11,978.414 ns 11,673.644 ns 12,286.341 ns 1.00 0.00 7.3984 3.6992 - 46728 B
System.Collections CtorGivenSize<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 11,938.693 ns 55.3671 ns 51.7904 ns 11,933.295 ns 11,848.627 ns 12,017.636 ns 1.00 0.02 7.4192 3.7096 - 46728 B
System.Collections IndexerSet<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 48,799.089 ns 55.0910 ns 51.5322 ns 48,806.004 ns 48,701.140 ns 48,873.248 ns 1.00 0.00 - - - -
System.Collections IndexerSet<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 47,823.848 ns 114.0891 ns 101.1371 ns 47,809.900 ns 47,661.061 ns 48,039.359 ns 0.98 0.00 - - - -
System.Collections IndexerSet<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 109,546.556 ns 221.4397 ns 196.3005 ns 109,527.007 ns 109,271.904 ns 109,971.514 ns 1.00 0.00 - - - -
System.Collections IndexerSet<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 111,299.135 ns 385.8300 ns 360.9056 ns 111,136.004 ns 110,907.347 ns 112,030.966 ns 1.02 0.00 - - - -
System.Collections IterateForEach<Int32> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 11,374.825 ns 16.3140 ns 12.7369 ns 11,372.462 ns 11,357.236 ns 11,397.361 ns 1.00 0.00 - - - -
System.Collections IterateForEach<Int32> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 11,533.283 ns 180.3146 ns 168.6664 ns 11,455.664 ns 11,373.511 ns 11,879.522 ns 1.01 0.01 - - - -
System.Collections IterateForEach<String> Dictionary \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 16,204.071 ns 23.9129 ns 19.9684 ns 16,209.247 ns 16,171.041 ns 16,232.162 ns 1.00 0.00 - - - -
System.Collections IterateForEach<String> Dictionary \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 2048 ? ? 16,631.174 ns 73.8790 ns 65.4918 ns 16,616.995 ns 16,503.511 ns 16,750.972 ns 1.03 0.00 - - - -
System.Collections.Tests Perf_Dictionary ContainsValue \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 3000 ? 6,613,647.749 ns 38,014.3712 ns 31,743.7216 ns 6,599,895.571 ns 6,585,161.599 ns 6,678,209.424 ns 1.00 0.00 - - - -
System.Collections.Tests Perf_Dictionary ContainsValue \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 3000 ? 6,666,225.055 ns 119,782.9049 ns 112,045.0036 ns 6,603,035.229 ns 6,572,864.575 ns 6,840,214.763 ns 1.01 0.02 - - - -
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000 ? 870,594.301 ns 8,704.8301 ns 7,716.6051 ns 869,052.397 ns 862,295.188 ns 884,716.949 ns 1.00 0.00 121.5278 121.5278 121.5278 672700 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000 ? 926,995.654 ns 4,092.5579 ns 3,627.9460 ns 927,558.983 ns 920,986.588 ns 933,315.894 ns 1.06 0.01 121.3235 121.3235 121.3235 672700 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 50000 ? 9,596,311.407 ns 36,701.6817 ns 32,535.0849 ns 9,603,298.668 ns 9,533,444.056 ns 9,659,576.487 ns 1.00 0.00 1031.2500 1000.0000 1000.0000 6037375 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 50000 ? 10,205,315.504 ns 36,768.0382 ns 30,702.9771 ns 10,205,349.876 ns 10,144,729.775 ns 10,255,704.316 ns 1.06 0.00 1031.2500 1000.0000 1000.0000 6037375 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500000 ? 100,175,066.178 ns 452,478.1775 ns 401,110.1187 ns 100,080,292.132 ns 99,478,651.090 ns 101,112,357.655 ns 1.00 0.00 2500.0000 2500.0000 2500.0000 53888068 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 500000 ? 101,771,939.446 ns 635,148.1080 ns 563,042.2541 ns 101,564,963.300 ns 101,142,604.690 ns 103,011,138.240 ns 1.02 0.00 2500.0000 2500.0000 2500.0000 53888068 B
System.Collections DictionaryMappingFunction Entropy \corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000000 ? 966,377,028.385 ns 10,653,155.6259 ns 9,964,968.3821 ns 969,185,419.830 ns 953,315,441.750 ns 988,659,378.050 ns 1.00 0.00 5000.0000 5000.0000 5000.0000 471721532 B
System.Collections DictionaryMappingFunction Entropy \corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x86\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 5000000 ? 980,115,659.775 ns 8,253,205.0269 ns 7,720,053.1028 ns 980,963,798.880 ns 967,291,405.440 ns 990,165,497.110 ns 1.01 0.01 5000.0000 5000.0000 5000.0000 471721532 B

Now on to examining the assembly diffs on both x64 and x86.

@MarcoRossignoli
Member Author

@adamsitnik maybe you can save me some "search" time: is there a way to reorder or specify columns using the console command? I'd like to have the Ratio column "first" (next to Toolchain) to avoid scrolling.

@adamsitnik
Member

> maybe you can save me some "search" time, is there a way to re-order/specify columns using console command

There is no way to do it from the console command; you would have to modify the code.

@MarcoRossignoli
Member Author

Thanks for the quick response!

@MarcoRossignoli
Member Author

MarcoRossignoli commented Mar 26, 2019

I don't see any issues on x64; the principal differences are as expected:

  • idiv to div
  • cmp changes from 0 to -1
  • offsets and the free-list part (neg edx...)
  • load of next in TrimExcess()
  • new range check and load of next in TryInsert

@jkotas, please double-check.
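
For readers following the list above, here is a minimal C# sketch of what the first three bullets appear to correspond to at the source level. It is not the PR's code: the class and method names are hypothetical, the `0x7FFFFFFF` mask is an assumption (the `and` constant in the listing appears to be a diff placeholder), and the `-3` sentinel is inferred from the `neg edx` / `add edx, -3` pair.

```csharp
using System;

// Hypothetical helper; none of these names come from the PR.
public static class EntropySketch
{
    // Assumed sentinel, inferred from `neg edx` / `add edx, -3` in the listing.
    private const int StartOfFreeList = -3;

    // "Before" shape: mask off the sign bit so a signed remainder
    // (idiv) can never be negative. One bit of the hash code is lost.
    public static int BucketSigned(int hashCode, int bucketCount)
        => (hashCode & 0x7FFFFFFF) % bucketCount;

    // "After" shape: keep all 32 bits and take an unsigned remainder (div).
    public static int BucketUnsigned(int hashCode, int bucketCount)
        => (int)((uint)hashCode % (uint)bucketCount);

    // With every hash-code bit in use, "free" can no longer be flagged by a
    // negative hash code, so the free-list link is stored in `next` as
    // -3 - index. The same expression decodes it again.
    public static int EncodeFreeLink(int nextFreeIndex)
        => StartOfFreeList - nextFreeIndex;

    public static int DecodeFreeLink(int next)
        => StartOfFreeList - next;

    // In-use entries have next >= -1 (-1 ends a bucket chain); encoded free
    // links are always <= -2. This would match the `cmp ..., -1` / `jl` pair.
    public static bool IsInUse(int next) => next >= -1;
}
```

Under these assumptions a negative hash code lands in a different bucket in the two variants, which is where the extra bit of entropy comes from, and `IsInUse` replaces the old "hash code is non-negative" test.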
diff --git a/x64coreclrupstream.txt b/x64coreclr.txt
index bbac7b2..4aa19ef 100644
--- a/x64coreclrupstream.txt
+++ b/x64coreclr.txt
@@ -6,56 +6,57 @@
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T03] ( 34, 21.50)     ref  ->  rsi         this class-hnd
+;  V00 this         [V00,T03] ( 35, 22   )     ref  ->  rsi         this class-hnd
 ;  V01 arg1         [V01,T08] ( 11,  9   )     ref  ->  rdi         ld-addr-op class-hnd
 ;  V02 arg2         [V02,T16] (  5,  3.50)     ref  ->  rbp         class-hnd
 ;  V03 arg3         [V03,T13] (  6,  4   )   ubyte  ->  rbx        
-;  V04 loc0         [V04,T04] ( 11, 23   )     ref  ->  registers   class-hnd
+;  V04 loc0         [V04,T04] ( 13, 24   )     ref  ->  registers   class-hnd
 ;  V05 loc1         [V05,T12] (  6,  6   )     ref  ->  r15         class-hnd
 ;  V06 loc2         [V06,T09] (  6, 11   )     int  ->  r12        
 ;  V07 loc3         [V07,T01] (  8, 25.50)     int  ->  r13        
 ;  V08 loc4         [V08,T20] (  5,  3.50)   byref  ->  [rsp+0x50]  
 ;  V09 loc5         [V09,T00] (  9, 29   )     int  ->  [rsp+0x5C]  
-;  V10 loc6         [V10,T30] (  3,  1.50)    bool  ->  registers  
-;  V11 loc7         [V11,T23] (  6,  3   )     int  ->  registers  
-;  V12 loc8         [V12,T22] (  6,  3   )   byref  ->  [rsp+0x48]  
+;  V10 loc6         [V10,T31] (  3,  1.50)    bool  ->  registers  
+;  V11 loc7         [V11,T22] (  6,  3   )     int  ->  registers  
+;  V12 loc8         [V12,T26] (  5,  2.50)   byref  ->  [rsp+0x48]  
 ;* V13 loc9         [V13    ] (  0,  0   )     ref  ->  zero-ref    ld-addr-op class-hnd
 ;  V14 loc10        [V14,T18] (  3,  4.50)     ref  ->  [rsp+0x40]   class-hnd
-;  V15 loc11        [V15,T31] (  3,  1.50)     int  ->  r14        
+;  V15 loc11        [V15,T32] (  3,  1.50)     int  ->  r14        
 ;  V16 OutArgs      [V16    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
-;  V17 tmp1         [V17,T28] (  3,  2   )     int  ->  rax        
+;  V17 tmp1         [V17,T29] (  3,  2   )     int  ->  r12        
 ;  V18 tmp2         [V18,T19] (  5,  3.74)     ref  ->  r15         class-hnd "spilling QMark2"
 ;  V19 tmp3         [V19,T10] (  3, 10   )    long  ->  rcx         "impRuntimeLookup slot"
 ;  V20 tmp4         [V20,T11] (  2,  8   )     ref  ->  [rsp+0x38]   class-hnd "impAppendStmt"
 ;* V21 tmp5         [V21    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
 ;  V22 tmp6         [V22,T05] (  5, 18   )    long  ->  r11         "impRuntimeLookup typehandle"
 ;* V23 tmp7         [V23    ] (  0,  0   )    long  ->  zero-ref    "VirtualCall with runtime lookup"
-;  V24 tmp8         [V24,T34] (  3,  0   )    long  ->  rcx         "impRuntimeLookup slot"
+;  V24 tmp8         [V24,T35] (  3,  0   )    long  ->  rcx         "impRuntimeLookup slot"
 ;* V25 tmp9         [V25    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
-;  V26 tmp10        [V26,T32] (  4,  0   )    long  ->  rax         "impRuntimeLookup typehandle"
-;  V27 tmp11        [V27,T26] (  3,  2.50)    long  ->  rcx         "impRuntimeLookup slot"
+;  V26 tmp10        [V26,T33] (  4,  0   )    long  ->  rax         "impRuntimeLookup typehandle"
+;  V27 tmp11        [V27,T27] (  3,  2.50)    long  ->  rcx         "impRuntimeLookup slot"
 ;  V28 tmp12        [V28,T21] (  4,  3.50)    long  ->   r9         "impRuntimeLookup typehandle"
-;  V29 tmp13        [V29,T35] (  3,  0   )    long  ->  rcx         "impRuntimeLookup slot"
+;  V29 tmp13        [V29,T36] (  3,  0   )    long  ->  rcx         "impRuntimeLookup slot"
 ;* V30 tmp14        [V30    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
-;  V31 tmp15        [V31,T33] (  4,  0   )    long  ->  rax         "impRuntimeLookup typehandle"
+;  V31 tmp15        [V31,T34] (  4,  0   )    long  ->  rax         "impRuntimeLookup typehandle"
 ;* V32 tmp16        [V32    ] (  0,  0   )    long  ->  zero-ref    "impRuntimeLookup slot"
 ;* V33 tmp17        [V33    ] (  0,  0   )    long  ->  zero-ref    "impRuntimeLookup typehandle"
 ;* V34 tmp18        [V34    ] (  0,  0   )    long  ->  zero-ref    "impRuntimeLookup slot"
 ;* V35 tmp19        [V35    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
 ;* V36 tmp20        [V36    ] (  0,  0   )    long  ->  zero-ref    "impRuntimeLookup typehandle"
-;  V37 tmp21        [V37,T27] (  3,  2.50)    long  ->  rcx         "impRuntimeLookup slot"
+;  V37 tmp21        [V37,T28] (  3,  2.50)    long  ->  rcx         "impRuntimeLookup slot"
 ;* V38 tmp22        [V38    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
 ;  V39 tmp23        [V39,T17] (  5,  4.50)    long  ->  r11         "impRuntimeLookup typehandle"
 ;* V40 tmp24        [V40    ] (  0,  0   )    long  ->  zero-ref    "VirtualCall with runtime lookup"
 ;  V41 tmp25        [V41,T14] (  3,  6   )     ref  ->  rcx         "arr expr"
 ;  V42 tmp26        [V42,T15] (  3,  6   )     int  ->  rdx         "arr expr"
 ;* V43 tmp27        [V43    ] (  0,  0   )     ref  ->  zero-ref    "argument with side effect"
-;  V44 tmp28        [V44,T29] (  2,  2   )     int  ->  rdx         "argument with side effect"
-;  V45 tmp29        [V45,T24] (  3,  3   )     ref  ->  rcx         "arr expr"
-;  V46 tmp30        [V46,T25] (  3,  3   )     int  ->  rdx         "arr expr"
-;  V47 cse0         [V47,T06] (  4, 12.50)   byref  ->  [rsp+0x30]   "ValNumCSE"
-;  V48 cse1         [V48,T07] (  4, 12.50)   byref  ->  [rsp+0x28]   "ValNumCSE"
-;  V49 cse2         [V49,T02] (  7, 24.50)     int  ->  [rsp+0x58]   "ValNumCSE"
+;  V44 tmp28        [V44,T30] (  2,  2   )     int  ->  rdx         "argument with side effect"
+;  V45 tmp29        [V45,T23] (  3,  3   )     ref  ->  rcx         "arr expr"
+;  V46 tmp30        [V46,T24] (  3,  3   )     int  ->  rdx         "arr expr"
+;  V47 tmp31        [V47,T25] (  3,  3   )     int  ->  rdx         "arr expr"
+;  V48 cse0         [V48,T06] (  4, 12.50)   byref  ->  [rsp+0x30]   "ValNumCSE"
+;  V49 cse1         [V49,T07] (  4, 12.50)   byref  ->  [rsp+0x28]   "ValNumCSE"
+;  V50 cse2         [V50,T02] (  7, 24.50)     int  ->  [rsp+0x58]   "ValNumCSE"
 ;
 ; Lcl frame size = 104
 
@@ -106,6 +107,7 @@ G_M9942_IG05:
        mov      rdx, rdi
        cmp      dword ptr [rcx], ecx
        call     qword ptr [r11]
+       mov      r12d, eax
        jmp      SHORT G_M9942_IG07
 
 G_M9942_IG06:
@@ -113,16 +115,15 @@ G_M9942_IG06:
        mov      rax, qword ptr [rdi]
        mov      rax, qword ptr [rax+64]
        call     qword ptr [rax+24]Object:GetHashCode():int:this
+       mov      r12d, eax
 
 G_M9942_IG07:
-       mov      r12d, eax
-       and      r12d, 0xD1FFAB1E
        xor      r13d, r13d
        mov      rcx, gword ptr [rsi+8]
        mov      r8, gword ptr [rsi+8]
        mov      eax, r12d
-       cdq      
-       idiv     edx:eax, dword ptr [r8+8]
+       xor      rdx, rdx
+       div      edx:eax, dword ptr [r8+8]
        cmp      edx, dword ptr [rcx+8]
        jae      G_M9942_IG38
        movsxd   rdx, edx
@@ -175,7 +176,7 @@ G_M9942_IG12:
        lea      rdx, [rdx+2*rdx]
        lea      r11, bword ptr [r14+8*rdx+16]
        mov      bword ptr [rsp+30H], r11
-       cmp      dword ptr [r11+16], r12d
+       cmp      dword ptr [r11+20], r12d
        jne      SHORT G_M9942_IG13
        movsxd   rdx, r10d
        lea      rdx, [rdx+2*rdx]
@@ -192,7 +193,7 @@ G_M9942_IG12:
 
 G_M9942_IG13:
        mov      r11, bword ptr [rsp+30H]
-       mov      r10d, dword ptr [r11+20]
+       mov      r10d, dword ptr [r11+16]
        mov      r8d, r10d
        mov      r9d, dword ptr [rsp+58H]
        cmp      r9d, r13d
@@ -228,7 +229,7 @@ G_M9942_IG17:
        lea      rcx, [rcx+2*rcx]
        lea      r10, bword ptr [r14+8*rcx+16]
        mov      bword ptr [rsp+28H], r10
-       cmp      dword ptr [r10+16], r12d
+       cmp      dword ptr [r10+20], r12d
        jne      G_M9942_IG21
        movsxd   rcx, r8d
        lea      rcx, [rcx+2*rcx]
@@ -273,7 +274,7 @@ G_M9942_IG20:
 
 G_M9942_IG21:
        mov      r10, bword ptr [rsp+28H]
-       mov      r8d, dword ptr [r10+20]
+       mov      r8d, dword ptr [r10+16]
        mov      ecx, r8d
        mov      r9d, dword ptr [rsp+58H]
        cmp      r9d, r13d
@@ -307,8 +308,8 @@ G_M9942_IG24:
        mov      rcx, gword ptr [rsi+8]
        mov      r8, gword ptr [rsi+8]
        mov      eax, r12d
-       cdq      
-       idiv     edx:eax, dword ptr [r8+8]
+       xor      rdx, rdx
+       div      edx:eax, dword ptr [r8+8]
        cmp      edx, dword ptr [rcx+8]
        jae      G_M9942_IG38
        movsxd   rdx, edx
@@ -332,15 +333,22 @@ G_M9942_IG26:
        lea      r8, bword ptr [r14+8*rcx+16]
        test     edx, edx
        je       SHORT G_M9942_IG27
-       mov      edx, dword ptr [r8+20]
+       mov      edx, dword ptr [rsi+52]
+       cmp      edx, dword ptr [r14+8]
+       jae      G_M9942_IG38
+       movsxd   rdx, edx
+       lea      rdx, [rdx+2*rdx]
+       mov      edx, dword ptr [r14+8*rdx+32]
+       neg      edx
+       add      edx, -3
        mov      dword ptr [rsi+52], edx
 
 G_M9942_IG27:
-       mov      dword ptr [r8+16], r12d
+       mov      dword ptr [r8+20], r12d
        mov      rax, bword ptr [rsp+50H]
        mov      edx, dword ptr [rax]
        dec      edx
-       mov      dword ptr [r8+20], edx
+       mov      dword ptr [r8+16], edx
        mov      bword ptr [rsp+48H], r8
        mov      rcx, r8
        mov      rdx, rdi
@@ -436,7 +444,7 @@ G_M9942_IG38:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 1069, prolog size 33 for method Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
+; Total bytes of code 1093, prolog size 33 for method Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Resize(int,bool):this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -446,28 +454,27 @@ G_M9942_IG38:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T05] (  8,  8   )     ref  ->  rsi         this class-hnd
-;  V01 arg1         [V01,T12] (  5,  6   )     int  ->  rdi        
-;  V02 arg2         [V02,T16] (  3,  3   )    bool  ->  rbx        
-;  V03 loc0         [V03,T11] (  5,  8   )     ref  ->  rbp         class-hnd
+;  V00 this         [V00,T06] (  8,  8   )     ref  ->  rsi         this class-hnd
+;  V01 arg1         [V01,T11] (  5,  6   )     int  ->  rdi        
+;  V02 arg2         [V02,T15] (  3,  3   )    bool  ->  rbx        
+;  V03 loc0         [V03,T10] (  5,  8   )     ref  ->  rbp         class-hnd
 ;  V04 loc1         [V04,T02] (  8, 14.50)     ref  ->  r14         class-hnd
-;  V05 loc2         [V05,T04] (  6, 11.50)     int  ->  r15        
+;  V05 loc2         [V05,T05] (  6, 11.50)     int  ->  r15        
 ;* V06 loc3         [V06    ] (  0,  0   )     ref  ->  zero-ref    ld-addr-op class-hnd
 ;  V07 loc4         [V07,T01] (  6, 20.50)     int  ->  rbx        
 ;  V08 loc5         [V08,T00] (  7, 23   )     int  ->  r12        
-;  V09 loc6         [V09,T13] (  4,  8   )     int  ->  rdx        
+;  V09 loc6         [V09,T12] (  4,  8   )     int  ->  rdx        
 ;  V10 OutArgs      [V10    ] (  1,  1   )  lclBlk (48) [rsp+0x00]   "OutgoingArgSpace"
-;  V11 tmp1         [V11,T17] (  3,  4.50)    long  ->  rcx         "impRuntimeLookup slot"
-;  V12 tmp2         [V12,T15] (  4,  6.50)    long  ->  rax         "impRuntimeLookup typehandle"
-;  V13 tmp3         [V13,T14] (  2,  8   )   byref  ->  r12         "non-inline candidate call"
-;  V14 tmp4         [V14,T18] (  2,  4   )     ref  ->  rcx         class-hnd "Inlining Arg"
-;  V15 cse0         [V15,T08] (  3, 10   )     int  ->  rax         "ValNumCSE"
-;  V16 cse1         [V16,T06] (  3, 10   )   byref  ->  rcx         "ValNumCSE"
-;  V17 cse2         [V17,T07] (  3, 10   )   byref  ->  r12         "ValNumCSE"
-;  V18 cse3         [V18,T09] (  3, 10   )    long  ->  rcx         "ValNumCSE"
-;  V19 cse4         [V19,T10] (  4,  9.50)     int  ->  r13         "ValNumCSE"
-;  V20 cse5         [V20,T19] (  2,  4   )     int  ->  rax         "ValNumCSE"
-;  V21 rat0         [V21,T03] (  3, 12   )     ref  ->  rcx         "virtual vtable call"
+;  V11 tmp1         [V11,T16] (  3,  4.50)    long  ->  rcx         "impRuntimeLookup slot"
+;  V12 tmp2         [V12,T14] (  4,  6.50)    long  ->  rax         "impRuntimeLookup typehandle"
+;  V13 tmp3         [V13,T13] (  2,  8   )   byref  ->  r12         "non-inline candidate call"
+;  V14 tmp4         [V14,T17] (  2,  4   )     ref  ->  rcx         class-hnd "Inlining Arg"
+;  V15 cse0         [V15,T03] (  4, 12   )   byref  ->  rcx         "ValNumCSE"
+;  V16 cse1         [V16,T07] (  3, 10   )   byref  ->  r12         "ValNumCSE"
+;  V17 cse2         [V17,T08] (  3, 10   )    long  ->  rcx         "ValNumCSE"
+;  V18 cse3         [V18,T09] (  4,  9.50)     int  ->  r13         "ValNumCSE"
+;  V19 cse4         [V19,T18] (  2,  4   )     int  ->  rax         "ValNumCSE"
+;  V20 rat0         [V20,T04] (  3, 12   )     ref  ->  rcx         "virtual vtable call"
 ;
 ; Lcl frame size = 56
 
@@ -528,14 +535,13 @@ G_M29783_IG04:
        movsxd   rcx, ebx
        lea      rcx, [rcx+2*rcx]
        lea      r12, bword ptr [r14+8*rcx+16]
-       cmp      dword ptr [r12+16], 0
+       cmp      dword ptr [r12+16], -1
        jl       SHORT G_M29783_IG05
        mov      rcx, gword ptr [r14+8*rcx+16]
        mov      rax, qword ptr [rcx]
        mov      rax, qword ptr [rax+64]
        call     qword ptr [rax+24]Object:GetHashCode():int:this
-       and      eax, 0xD1FFAB1E
-       mov      dword ptr [r12+16], eax
+       mov      dword ptr [r12+20], eax
 
 G_M29783_IG05:
        inc      ebx
@@ -554,18 +560,18 @@ G_M29783_IG07:
        movsxd   rax, r12d
        lea      rax, [rax+2*rax]
        lea      rcx, bword ptr [r14+8*rax+16]
-       mov      eax, dword ptr [rcx+16]
-       test     eax, eax
+       cmp      dword ptr [rcx+16], -1
        jl       SHORT G_M29783_IG08
-       cdq      
-       idiv     edx:eax, edi
+       mov      eax, dword ptr [rcx+20]
+       xor      rdx, rdx
+       div      edx:eax, edi
        mov      eax, dword ptr [rbp+8]
        cmp      edx, eax
        jae      SHORT G_M29783_IG11
        movsxd   rax, edx
        mov      eax, dword ptr [rbp+4*rax+16]
        dec      eax
-       mov      dword ptr [rcx+20], eax
+       mov      dword ptr [rcx+16], eax
        lea      eax, [r12+1]
        movsxd   rdx, edx
        mov      dword ptr [rbp+4*rdx+16], eax
@@ -600,7 +606,7 @@ G_M29783_IG11:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 338, prolog size 29 for method Dictionary`2:Resize(int,bool):this
+; Total bytes of code 336, prolog size 29 for method Dictionary`2:Resize(int,bool):this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:FindEntry(ref):int:this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -682,11 +688,10 @@ G_M16827_IG03:
        mov      rax, qword ptr [rax+64]
        call     qword ptr [rax+24]Object:GetHashCode():int:this
        mov      r12d, eax
-       and      r12d, 0xD1FFAB1E
        mov      r13d, dword ptr [rbp+8]
        mov      eax, r12d
-       cdq      
-       idiv     edx:eax, r13d
+       xor      rdx, rdx
+       div      edx:eax, r13d
        cmp      edx, r13d
        jae      G_M16827_IG22
        movsxd   rcx, edx
@@ -715,7 +720,7 @@ G_M16827_IG05:
        lea      rdx, [rdx+2*rdx]
        lea      rax, bword ptr [r14+8*rdx+16]
        mov      bword ptr [rsp+30H], rax
-       cmp      dword ptr [rax+16], r12d
+       cmp      dword ptr [rax+20], r12d
        jne      SHORT G_M16827_IG06
        mov      rdx, gword ptr [r14+8*rdx+16]
        mov      rcx, r13
@@ -728,7 +733,7 @@ G_M16827_IG05:
 
 G_M16827_IG06:
        mov      rax, bword ptr [rsp+30H]
-       mov      ebx, dword ptr [rax+20]
+       mov      ebx, dword ptr [rax+16]
        cmp      ebp, r15d
        jle      G_M16827_IG20
 
@@ -755,13 +760,12 @@ G_M16827_IG09:
        cmp      dword ptr [rcx], ecx
        call     qword ptr [r11]
        mov      r8d, eax
-       and      r8d, 0xD1FFAB1E
        mov      eax, dword ptr [rbp+8]
        mov      dword ptr [rsp+44H], eax
        mov      ecx, dword ptr [rsp+44H]
        mov      eax, r8d
-       cdq      
-       idiv     edx:eax, ecx
+       xor      rdx, rdx
+       div      edx:eax, ecx
        cmp      edx, ecx
        jae      G_M16827_IG22
        movsxd   rcx, edx
@@ -780,7 +784,7 @@ G_M16827_IG10:
        lea      r9, bword ptr [r14+8*rcx+16]
        mov      bword ptr [rsp+28H], r9
        mov      dword ptr [rsp+4CH], r8d
-       cmp      dword ptr [r9+16], r8d
+       cmp      dword ptr [r9+20], r8d
        jne      SHORT G_M16827_IG12
        mov      r10, gword ptr [r14+8*rcx+16]
        mov      gword ptr [rsp+38H], r10
@@ -804,7 +808,7 @@ G_M16827_IG11:
 
 G_M16827_IG12:
        mov      r9, bword ptr [rsp+28H]
-       mov      ebp, dword ptr [r9+20]
+       mov      ebp, dword ptr [r9+16]
        mov      eax, dword ptr [rsp+48H]
        cmp      eax, r15d
        jle      SHORT G_M16827_IG21
@@ -858,7 +862,7 @@ G_M16827_IG22:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 565, prolog size 27 for method Dictionary`2:FindEntry(ref):int:this
+; Total bytes of code 553, prolog size 27 for method Dictionary`2:FindEntry(ref):int:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Remove(int):bool:this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -877,7 +881,7 @@ G_M16827_IG22:
 ;  V06 loc4         [V06,T19] (  3,  1.50)     int  ->  rdx        
 ;  V07 loc5         [V07,T09] (  5,  6   )     int  ->  r12        
 ;  V08 loc6         [V08,T00] (  8, 21.50)     int  ->  [rsp+0x34]  
-;  V09 loc7         [V09,T01] (  9, 18   )   byref  ->  [rsp+0x28]  
+;  V09 loc7         [V09,T01] (  8, 17.50)   byref  ->  [rsp+0x28]  
 ;  V10 OutArgs      [V10    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
 ;* V11 tmp1         [V11    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;  V12 tmp2         [V12,T16] (  2,  2   )     ref  ->  rcx         class-hnd "dup spill"
@@ -931,11 +935,10 @@ G_M18089_IG03:
        mov      r15d, eax
 
 G_M18089_IG04:
-       and      r15d, 0xD1FFAB1E
        mov      ecx, dword ptr [rbx+8]
        mov      eax, r15d
-       cdq      
-       idiv     edx:eax, ecx
+       xor      rdx, rdx
+       div      edx:eax, ecx
        mov      r12d, -1
        cmp      edx, ecx
        jae      G_M18089_IG17
@@ -954,7 +957,7 @@ G_M18089_IG05:
        movsxd   rdx, eax
        shl      rdx, 4
        lea      r10, bword ptr [rbp+rdx+16]
-       cmp      dword ptr [r10], r15d
+       cmp      dword ptr [r10+4], r15d
        jne      SHORT G_M18089_IG08
        mov      rcx, gword ptr [rdi+24]
        test     rcx, rcx
@@ -982,7 +985,7 @@ G_M18089_IG07:
 G_M18089_IG08:
        mov      eax, dword ptr [rsp+34H]
        mov      r12d, eax
-       mov      eax, dword ptr [r10+4]
+       mov      eax, dword ptr [r10]
        mov      r9d, dword ptr [rsp+30H]
        cmp      r9d, r14d
        jle      SHORT G_M18089_IG16
@@ -1010,7 +1013,7 @@ G_M18089_IG11:
 G_M18089_IG12:
        test     r12d, r12d
        jge      SHORT G_M18089_IG13
-       mov      r9d, dword ptr [r10+4]
+       mov      r9d, dword ptr [r10]
        inc      r9d
        mov      dword ptr [rbx+4*r13+16], r9d
        jmp      SHORT G_M18089_IG14
@@ -1021,13 +1024,14 @@ G_M18089_IG13:
        jae      SHORT G_M18089_IG17
        movsxd   rdx, r12d
        shl      rdx, 4
-       mov      ecx, dword ptr [r10+4]
-       mov      dword ptr [rbp+rdx+20], ecx
+       mov      ecx, dword ptr [r10]
+       mov      dword ptr [rbp+rdx+16], ecx
 
 G_M18089_IG14:
-       mov      dword ptr [r10], -1
        mov      edx, dword ptr [rdi+52]
-       mov      dword ptr [r10+4], edx
+       neg      edx
+       add      edx, -3
+       mov      dword ptr [r10], edx
        mov      eax, dword ptr [rsp+34H]
        mov      dword ptr [rdi+52], eax
        inc      dword ptr [rdi+56]
@@ -1053,7 +1057,7 @@ G_M18089_IG17:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 386, prolog size 21 for method Dictionary`2:Remove(int):bool:this
+; Total bytes of code 375, prolog size 21 for method Dictionary`2:Remove(int):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Remove(int,byref):bool:this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -1073,7 +1077,7 @@ G_M18089_IG17:
 ;  V07 loc4         [V07,T20] (  3,  1.50)     int  ->  rdx        
 ;  V08 loc5         [V08,T09] (  5,  6   )     int  ->  r13        
 ;  V09 loc6         [V09,T00] (  8, 21.50)     int  ->  [rsp+0x44]  
-;  V10 loc7         [V10,T01] ( 10, 18.50)   byref  ->  [rsp+0x28]  
+;  V10 loc7         [V10,T01] (  9, 18   )   byref  ->  [rsp+0x28]  
 ;  V11 OutArgs      [V11    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
 ;* V12 tmp1         [V12    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;  V13 tmp2         [V13,T17] (  2,  2   )     ref  ->  rcx         class-hnd "dup spill"
@@ -1128,11 +1132,10 @@ G_M18089_IG03:
        mov      r12d, eax
 
 G_M18089_IG04:
-       and      r12d, 0xD1FFAB1E
        mov      ecx, dword ptr [rbp+8]
        mov      eax, r12d
-       cdq      
-       idiv     edx:eax, ecx
+       xor      rdx, rdx
+       div      edx:eax, ecx
        mov      r13d, -1
        cmp      edx, ecx
        jae      G_M18089_IG17
@@ -1152,7 +1155,7 @@ G_M18089_IG05:
        movsxd   rdx, r9d
        shl      rdx, 4
        lea      r11, bword ptr [r14+rdx+16]
-       cmp      dword ptr [r11], r12d
+       cmp      dword ptr [r11+4], r12d
        jne      SHORT G_M18089_IG08
        mov      rcx, gword ptr [rdi+24]
        test     rcx, rcx
@@ -1180,7 +1183,7 @@ G_M18089_IG07:
 G_M18089_IG08:
        mov      r9d, dword ptr [rsp+44H]
        mov      r13d, r9d
-       mov      r9d, dword ptr [r11+4]
+       mov      r9d, dword ptr [r11]
        mov      r10d, dword ptr [rsp+34H]
        cmp      r10d, r15d
        jle      G_M18089_IG16
@@ -1209,7 +1212,7 @@ G_M18089_IG11:
 G_M18089_IG12:
        test     r13d, r13d
        jge      SHORT G_M18089_IG13
-       mov      r10d, dword ptr [r11+4]
+       mov      r10d, dword ptr [r11]
        inc      r10d
        mov      rax, qword ptr [rsp+38H]
        mov      dword ptr [rbp+4*rax+16], r10d
@@ -1221,15 +1224,16 @@ G_M18089_IG13:
        jae      SHORT G_M18089_IG17
        movsxd   rax, r13d
        shl      rax, 4
-       mov      edx, dword ptr [r11+4]
-       mov      dword ptr [r14+rax+20], edx
+       mov      edx, dword ptr [r11]
+       mov      dword ptr [r14+rax+16], edx
 
 G_M18089_IG14:
        mov      eax, dword ptr [r11+12]
        mov      dword ptr [rbx], eax
-       mov      dword ptr [r11], -1
        mov      eax, dword ptr [rdi+52]
-       mov      dword ptr [r11+4], eax
+       neg      eax
+       add      eax, -3
+       mov      dword ptr [r11], eax
        mov      r9d, dword ptr [rsp+44H]
        mov      dword ptr [rdi+52], r9d
        inc      dword ptr [rdi+56]
@@ -1255,7 +1259,7 @@ G_M18089_IG17:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 419, prolog size 24 for method Dictionary`2:Remove(int,byref):bool:this
+; Total bytes of code 408, prolog size 24 for method Dictionary`2:Remove(int,byref):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:TryInsert(int,int,ubyte):bool:this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -1265,25 +1269,25 @@ G_M18089_IG17:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T04] ( 22, 14   )     ref  ->  rsi         this class-hnd
-;  V01 arg1         [V01,T09] (  9,  7.50)     int  ->  rdi         ld-addr-op
+;  V00 this         [V00,T04] ( 23, 14.50)     ref  ->  rsi         this class-hnd
+;  V01 arg1         [V01,T07] (  9,  7.50)     int  ->  rdi         ld-addr-op
 ;  V02 arg2         [V02,T15] (  5,  3.50)     int  ->  rbx        
 ;  V03 arg3         [V03,T16] (  4,  3   )   ubyte  ->  rbp        
-;  V04 loc0         [V04,T00] ( 12, 27   )     ref  ->  r14         class-hnd
+;  V04 loc0         [V04,T02] ( 11, 23.50)     ref  ->  r14         class-hnd
 ;  V05 loc1         [V05,T14] (  5,  5.50)     ref  ->  r15         class-hnd
-;  V06 loc2         [V06,T10] (  6, 11   )     int  ->  r12        
-;  V07 loc3         [V07,T01] (  7, 25   )     int  ->  r13        
+;  V06 loc2         [V06,T08] (  6, 11   )     int  ->  r12        
+;  V07 loc3         [V07,T00] (  7, 25   )     int  ->  r13        
 ;  V08 loc4         [V08,T19] (  5,  3.50)   byref  ->  [rsp+0x30]  
-;  V09 loc5         [V09,T02] (  7, 25   )     int  ->  r10        
-;  V10 loc6         [V10,T29] (  3,  1.50)    bool  ->  rbp        
-;  V11 loc7         [V11,T21] (  6,  3   )     int  ->  r13        
-;  V12 loc8         [V12,T20] (  6,  3   )   byref  ->  rax        
+;  V09 loc5         [V09,T01] (  7, 25   )     int  ->  r10        
+;  V10 loc6         [V10,T30] (  3,  1.50)    bool  ->  rbp        
+;  V11 loc7         [V11,T20] (  6,  3   )     int  ->  r13        
+;  V12 loc8         [V12,T26] (  5,  2.50)   byref  ->  rcx        
 ;* V13 loc9         [V13    ] (  0,  0   )     int  ->  zero-ref    ld-addr-op
 ;* V14 loc10        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
-;  V15 loc11        [V15,T30] (  3,  1.50)     int  ->  r13        
+;  V15 loc11        [V15,T31] (  3,  1.50)     int  ->  r13        
 ;  V16 OutArgs      [V16    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
 ;* V17 tmp1         [V17    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
-;  V18 tmp2         [V18,T26] (  3,  2   )     int  ->  rax        
+;  V18 tmp2         [V18,T27] (  3,  2   )     int  ->  r12        
 ;* V19 tmp3         [V19    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;* V20 tmp4         [V20    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;  V21 tmp5         [V21,T17] (  2,  4   )    bool  ->  rdx         "Inline return value spill temp"
@@ -1293,19 +1297,21 @@ G_M18089_IG17:
 ;* V25 tmp9         [V25    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;  V26 tmp10        [V26,T12] (  3,  6   )     ref  ->   r8         "arr expr"
 ;  V27 tmp11        [V27,T13] (  3,  6   )     int  ->  rdx         "arr expr"
-;  V28 tmp12        [V28,T27] (  2,  2   )     int  ->  rdx         "argument with side effect"
-;  V29 tmp13        [V29,T23] (  3,  3   )     ref  ->   r8         "arr expr"
-;  V30 tmp14        [V30,T25] (  3,  3   )     int  ->  rdx         "arr expr"
-;  V31 cse0         [V31,T07] (  5, 12.50)   byref  ->  rax         "ValNumCSE"
-;  V32 cse1         [V32,T08] (  5, 12.50)   byref  ->  [rsp+0x28]   "ValNumCSE"
-;  V33 cse2         [V33,T05] (  4, 14   )    long  ->  r10         "ValNumCSE"
-;  V34 cse3         [V34,T06] (  4, 14   )    long  ->  [rsp+0x40]   "ValNumCSE"
-;  V35 cse4         [V35,T24] (  3,  3   )     ref  ->  rcx         "ValNumCSE"
-;  V36 cse5         [V36,T31] (  3,  1.50)     int  ->  rcx         "ValNumCSE"
-;  V37 cse6         [V37,T32] (  3,  1.50)     int  ->  rcx         "ValNumCSE"
-;  V38 cse7         [V38,T28] (  3,  1.50)     ref  ->  rcx         "ValNumCSE"
-;  V39 cse8         [V39,T03] (  7, 21   )     int  ->  [rsp+0x3C]   "ValNumCSE"
-;  V40 cse9         [V40,T22] (  6,  3   )     int  ->  r13         "ValNumCSE"
+;  V28 tmp12        [V28,T28] (  2,  2   )     int  ->  rdx         "argument with side effect"
+;  V29 tmp13        [V29,T22] (  3,  3   )     ref  ->   r8         "arr expr"
+;  V30 tmp14        [V30,T24] (  3,  3   )     int  ->  rdx         "arr expr"
+;  V31 tmp15        [V31,T25] (  3,  3   )     int  ->  rdx         "arr expr"
+;  V32 cse0         [V32,T09] (  4, 10.50)   byref  ->  rax         "ValNumCSE"
+;  V33 cse1         [V33,T10] (  4, 10.50)   byref  ->  [rsp+0x28]   "ValNumCSE"
+;  V34 cse2         [V34,T05] (  3, 12   )    long  ->  r10         "ValNumCSE"
+;  V35 cse3         [V35,T06] (  3, 12   )    long  ->  [rsp+0x40]   "ValNumCSE"
+;  V36 cse4         [V36,T23] (  3,  3   )     ref  ->  rcx         "ValNumCSE"
+;  V37 cse5         [V37,T32] (  3,  1.50)     int  ->  rcx         "ValNumCSE"
+;  V38 cse6         [V38,T33] (  3,  1.50)     int  ->  rcx         "ValNumCSE"
+;  V39 cse7         [V39,T29] (  3,  1.50)     ref  ->  rcx         "ValNumCSE"
+;  V40 cse8         [V40,T03] (  7, 21   )     int  ->  [rsp+0x3C]   "ValNumCSE"
+;  V41 cse9         [V41,T21] (  6,  3   )     int  ->  r13         "ValNumCSE"
+;  V42 cse10        [V42,T34] (  3,  1.50)     int  ->  rax         "ValNumCSE"
 ;
 ; Lcl frame size = 72
 
@@ -1341,20 +1347,19 @@ G_M59125_IG03:
        mov      r11, 0xD1FFAB1E
        cmp      dword ptr [rcx], ecx
        call     [IEqualityComparer`1:GetHashCode(int):int:this]
+       mov      r12d, eax
        jmp      SHORT G_M59125_IG05
 
 G_M59125_IG04:
-       mov      eax, edi
+       mov      r12d, edi
 
 G_M59125_IG05:
-       mov      r12d, eax
-       and      r12d, 0xD1FFAB1E
        xor      r13d, r13d
        mov      rcx, gword ptr [rsi+8]
        mov      r8, rcx
        mov      eax, r12d
-       cdq      
-       idiv     edx:eax, dword ptr [rcx+8]
+       xor      rdx, rdx
+       div      edx:eax, dword ptr [rcx+8]
        cmp      edx, dword ptr [r8+8]
        jae      G_M59125_IG29
        movsxd   rax, edx
@@ -1370,9 +1375,9 @@ G_M59125_IG06:
        jbe      G_M59125_IG18
        movsxd   r10, r10d
        shl      r10, 4
-       cmp      dword ptr [r14+r10+16], r12d
-       jne      SHORT G_M59125_IG07
        lea      rax, bword ptr [r14+r10+16]
+       cmp      dword ptr [rax+4], r12d
+       jne      SHORT G_M59125_IG07
        mov      edx, dword ptr [rax+8]
        cmp      edx, edi
        sete     dl
@@ -1381,8 +1386,7 @@ G_M59125_IG06:
        jne      SHORT G_M59125_IG09
 
 G_M59125_IG07:
-       lea      rax, bword ptr [r14+r10+16]
-       mov      r10d, dword ptr [rax+4]
+       mov      r10d, dword ptr [r14+r10+16]
        cmp      r15d, r13d
        jle      G_M59125_IG26
 
@@ -1425,11 +1429,11 @@ G_M59125_IG13:
        jbe      SHORT G_M59125_IG14
        movsxd   r10, r10d
        shl      r10, 4
-       cmp      dword ptr [r14+r10+16], r12d
-       jne      SHORT G_M59125_IG16
-       mov      bword ptr [rsp+30H], r9
        mov      qword ptr [rsp+40H], r10
        lea      r11, bword ptr [r14+r10+16]
+       cmp      dword ptr [r11+4], r12d
+       jne      SHORT G_M59125_IG16
+       mov      bword ptr [rsp+30H], r9
        mov      bword ptr [rsp+28H], r11
        mov      edx, dword ptr [r11+8]
        mov      rcx, r15
@@ -1439,13 +1443,12 @@ G_M59125_IG13:
        call     [IEqualityComparer`1:Equals(int,int):bool:this]
        test     eax, eax
        mov      r9, bword ptr [rsp+30H]
-       mov      r10, qword ptr [rsp+40H]
        je       SHORT G_M59125_IG16
        movzx    r13, bpl
        cmp      r13d, 1
        jne      SHORT G_M59125_IG15
-       mov      rbp, bword ptr [rsp+28H]
-       mov      dword ptr [rbp+12], ebx
+       mov      r13, bword ptr [rsp+28H]
+       mov      dword ptr [r13+12], ebx
        inc      dword ptr [rsi+60]
        jmp      G_M59125_IG23
 
@@ -1459,8 +1462,8 @@ G_M59125_IG15:
        jmp      G_M59125_IG11
 
 G_M59125_IG16:
-       lea      rcx, bword ptr [r14+r10+16]
-       mov      r10d, dword ptr [rcx+4]
+       mov      r10, qword ptr [rsp+40H]
+       mov      r10d, dword ptr [r14+r10+16]
        mov      eax, dword ptr [rsp+3CH]
        cmp      eax, r13d
        jle      G_M59125_IG28
@@ -1493,8 +1496,8 @@ G_M59125_IG19:
        mov      rcx, gword ptr [rsi+8]
        mov      r8, rcx
        mov      eax, r12d
-       cdq      
-       idiv     edx:eax, dword ptr [rcx+8]
+       xor      rdx, rdx
+       div      edx:eax, dword ptr [rcx+8]
        cmp      edx, dword ptr [r8+8]
        jae      G_M59125_IG29
        movsxd   rax, edx
@@ -1508,23 +1511,31 @@ G_M59125_IG20:
        mov      r14, gword ptr [rsi+16]
 
 G_M59125_IG21:
-       cmp      r13d, dword ptr [r14+8]
+       mov      eax, dword ptr [r14+8]
+       cmp      r13d, eax
        jae      SHORT G_M59125_IG29
-       movsxd   rax, r13d
-       shl      rax, 4
-       lea      rax, bword ptr [r14+rax+16]
+       movsxd   rcx, r13d
+       shl      rcx, 4
+       lea      rcx, bword ptr [r14+rcx+16]
        test     ebp, ebp
        je       SHORT G_M59125_IG22
-       mov      ecx, dword ptr [rax+4]
-       mov      dword ptr [rsi+52], ecx
+       mov      edx, dword ptr [rsi+52]
+       cmp      edx, eax
+       jae      SHORT G_M59125_IG29
+       movsxd   rax, edx
+       shl      rax, 4
+       mov      eax, dword ptr [r14+rax+16]
+       neg      eax
+       add      eax, -3
+       mov      dword ptr [rsi+52], eax
 
 G_M59125_IG22:
-       mov      dword ptr [rax], r12d
-       mov      ecx, dword ptr [r9]
-       dec      ecx
-       mov      dword ptr [rax+4], ecx
-       mov      dword ptr [rax+8], edi
-       mov      dword ptr [rax+12], ebx
+       mov      dword ptr [rcx+4], r12d
+       mov      eax, dword ptr [r9]
+       dec      eax
+       mov      dword ptr [rcx], eax
+       mov      dword ptr [rcx+8], edi
+       mov      dword ptr [rcx+12], ebx
        inc      r13d
        mov      dword ptr [r9], r13d
        inc      dword ptr [rsi+60]
@@ -1566,7 +1577,7 @@ G_M59125_IG29:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 642, prolog size 27 for method Dictionary`2:TryInsert(int,int,ubyte):bool:this
+; Total bytes of code 653, prolog size 27 for method Dictionary`2:TryInsert(int,int,ubyte):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Resize(int,bool):this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -1576,27 +1587,26 @@ G_M59125_IG29:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T04] (  6,  6   )     ref  ->  rsi         this class-hnd
-;  V01 arg1         [V01,T06] (  4,  5   )     int  ->  rdi        
+;  V00 this         [V00,T03] (  6,  6   )     ref  ->  rsi         this class-hnd
+;  V01 arg1         [V01,T05] (  4,  5   )     int  ->  rdi        
 ;* V02 arg2         [V02    ] (  0,  0   )    bool  ->  zero-ref   
-;  V03 loc0         [V03,T05] (  5,  8   )     ref  ->  rbp         class-hnd
-;  V04 loc1         [V04,T01] (  6, 10   )     ref  ->  rbx         class-hnd
-;  V05 loc2         [V05,T07] (  4,  7   )     int  ->  r14        
+;  V03 loc0         [V03,T04] (  5,  8   )     ref  ->  rbp         class-hnd
+;  V04 loc1         [V04,T01] (  7, 12   )     ref  ->  rbx         class-hnd
+;  V05 loc2         [V05,T06] (  4,  7   )     int  ->  r14        
 ;* V06 loc3         [V06    ] (  0,  0   )     int  ->  zero-ref    ld-addr-op
 ;* V07 loc4         [V07    ] (  0,  0   )     int  ->  zero-ref   
 ;  V08 loc5         [V08,T00] (  7, 23   )     int  ->  r15        
-;  V09 loc6         [V09,T08] (  3,  6   )     int  ->  rdx        
+;  V09 loc6         [V09,T07] (  3,  6   )     int  ->  rdx        
 ;  V10 OutArgs      [V10    ] (  1,  1   )  lclBlk (48) [rsp+0x00]   "OutgoingArgSpace"
 ;* V11 tmp1         [V11    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;* V12 tmp2         [V12    ] (  0,  0   )   byref  ->  zero-ref    "impAppendStmt"
-;  V13 tmp3         [V13,T11] (  2,  4   )     ref  ->  rcx         class-hnd "Inlining Arg"
+;  V13 tmp3         [V13,T10] (  2,  4   )     ref  ->  rcx         class-hnd "Inlining Arg"
 ;* V14 tmp4         [V14    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
-;  V15 cse0         [V15,T02] (  3, 10   )     int  ->  rax         "ValNumCSE"
-;  V16 cse1         [V16,T03] (  3, 10   )    long  ->   r8         "ValNumCSE"
-;  V17 cse2         [V17,T10] (  2,  5   )     int  ->  rcx         "ValNumCSE"
-;  V18 cse3         [V18,T12] (  2,  4   )     int  ->  rax         "ValNumCSE"
-;  V19 cse4         [V19,T09] (  3,  6   )    long  ->  rax         "ValNumCSE"
-;  V20 cse5         [V20,T13] (  3,  3   )    long  ->  rbx         "ValNumCSE"
+;  V15 cse0         [V15,T02] (  4, 12   )    long  ->   r8         "ValNumCSE"
+;  V16 cse1         [V16,T09] (  2,  5   )     int  ->  rcx         "ValNumCSE"
+;  V17 cse2         [V17,T11] (  2,  4   )     int  ->  rax         "ValNumCSE"
+;  V18 cse3         [V18,T08] (  3,  6   )    long  ->  rax         "ValNumCSE"
+;  V19 cse4         [V19,T12] (  3,  3   )    long  ->  rbx         "ValNumCSE"
 ;
 ; Lcl frame size = 56
 
@@ -1640,18 +1650,18 @@ G_M14072_IG03:
        jae      SHORT G_M14072_IG07
        movsxd   r8, r15d
        shl      r8, 4
-       mov      eax, dword ptr [rbx+r8+16]
-       test     eax, eax
+       cmp      dword ptr [rbx+r8+16], -1
        jl       SHORT G_M14072_IG04
-       cdq      
-       idiv     edx:eax, edi
+       mov      eax, dword ptr [rbx+r8+20]
+       xor      rdx, rdx
+       div      edx:eax, edi
        mov      eax, dword ptr [rbp+8]
        cmp      edx, eax
        jae      SHORT G_M14072_IG07
        movsxd   rax, edx
        mov      edx, dword ptr [rbp+4*rax+16]
        dec      edx
-       mov      dword ptr [rbx+r8+20], edx
+       mov      dword ptr [rbx+r8+16], edx
        lea      edx, [r15+1]
        mov      dword ptr [rbp+4*rax+16], edx
 
@@ -1683,7 +1693,7 @@ G_M14072_IG07:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 212, prolog size 17 for method Dictionary`2:Resize(int,bool):this
+; Total bytes of code 217, prolog size 17 for method Dictionary`2:Resize(int,bool):this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:TrimExcess(int):this
 ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
@@ -1693,35 +1703,37 @@ G_M14072_IG07:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T01] ( 13,  9   )     ref  ->  rsi         this class-hnd
+;  V00 this         [V00,T02] ( 13,  9   )     ref  ->  rsi         this class-hnd
 ;  V01 arg1         [V01,T08] (  4,  4   )     int  ->  rdx        
-;  V02 loc0         [V02,T13] (  4,  4.50)     int  ->  rdi        
-;  V03 loc1         [V03,T04] (  6,  9   )     ref  ->  rbx         class-hnd
+;  V02 loc0         [V02,T14] (  4,  4.50)     int  ->  rdi        
+;  V03 loc1         [V03,T03] (  6, 11   )     ref  ->  rbx         class-hnd
 ;* V04 loc2         [V04    ] (  0,  0   )     int  ->  zero-ref   
-;  V05 loc3         [V05,T12] (  3,  5   )     int  ->  rbp        
-;  V06 loc4         [V06,T14] (  3,  4.50)     ref  ->  rcx         class-hnd
+;  V05 loc3         [V05,T13] (  3,  5   )     int  ->  rbp        
+;  V06 loc4         [V06,T15] (  3,  4.50)     ref  ->  rcx         class-hnd
 ;  V07 loc5         [V07,T07] (  4,  6.50)     ref  ->   r8         class-hnd
 ;  V08 loc6         [V08,T05] (  6,  9   )     int  ->   r9        
 ;  V09 loc7         [V09,T00] (  6, 20.50)     int  ->  r10        
-;  V10 loc8         [V10,T02] (  3, 10   )     int  ->  r11        
-;  V11 loc9         [V11,T09] (  3,  6   )   byref  ->  r14        
+;  V10 loc8         [V10,T11] (  2,  6   )     int  ->  r14        
+;  V11 loc9         [V11,T09] (  3,  6   )   byref  ->  r15        
 ;  V12 loc10        [V12,T06] (  4,  8   )     int  ->  rdx        
 ;  V13 OutArgs      [V13    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
-;  V14 tmp1         [V14,T16] (  3,  2   )     int  ->  rbp        
-;  V15 cse0         [V15,T03] (  3, 10   )    long  ->  rdx         "ValNumCSE"
-;  V16 cse1         [V16,T11] (  4,  5.50)     int  ->  [rsp+0x2C]   "ValNumCSE"
-;  V17 cse2         [V17,T15] (  2,  4   )     int  ->  rax         "ValNumCSE"
-;  V18 cse3         [V18,T10] (  3,  6   )     int  ->   r9         "ValNumCSE"
+;  V14 tmp1         [V14,T17] (  3,  2   )     int  ->  rbp        
+;  V15 cse0         [V15,T04] (  3, 10   )   byref  ->  r11         "ValNumCSE"
+;  V16 cse1         [V16,T01] (  3, 12   )    long  ->  rdx         "ValNumCSE"
+;  V17 cse2         [V17,T12] (  4,  5.50)     int  ->  [rsp+0x24]   "ValNumCSE"
+;  V18 cse3         [V18,T16] (  2,  4   )     int  ->  rax         "ValNumCSE"
+;  V19 cse4         [V19,T10] (  3,  6   )     int  ->   r9         "ValNumCSE"
 ;
-; Lcl frame size = 48
+; Lcl frame size = 40
 
 G_M47871_IG01:
+       push     r15
        push     r14
        push     rdi
        push     rsi
        push     rbp
        push     rbx
-       sub      rsp, 48
+       sub      rsp, 40
        mov      rsi, rcx
 
 G_M47871_IG02:
@@ -1748,12 +1760,13 @@ G_M47871_IG05:
        jl       SHORT G_M47871_IG07
 
 G_M47871_IG06:
-       add      rsp, 48
+       add      rsp, 40
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
        pop      r14
+       pop      r15
        ret      
 
 G_M47871_IG07:
@@ -1771,32 +1784,32 @@ G_M47871_IG07:
        mov      eax, dword ptr [rbx+8]
 
 G_M47871_IG08:
-       mov      dword ptr [rsp+2CH], eax
+       mov      dword ptr [rsp+24H], eax
        cmp      r10d, eax
        jae      G_M47871_IG13
        movsxd   rdx, r10d
        shl      rdx, 4
-       mov      r11d, dword ptr [rbx+rdx+16]
-       test     r11d, r11d
+       lea      r11, bword ptr [rbx+rdx+16]
+       mov      r14d, dword ptr [r11+4]
+       cmp      dword ptr [rbx+rdx+16], -1
        jl       SHORT G_M47871_IG09
        cmp      r9d, dword ptr [rcx+8]
        jae      SHORT G_M47871_IG13
-       movsxd   r14, r9d
-       shl      r14, 4
-       lea      r14, bword ptr [rcx+r14+16]
-       lea      rdx, bword ptr [rbx+rdx+16]
-       movdqu   xmm0, qword ptr [rdx]
-       movdqu   qword ptr [r14], xmm0
-       mov      eax, r11d
-       cdq      
-       idiv     edx:eax, edi
+       movsxd   rdx, r9d
+       shl      rdx, 4
+       lea      r15, bword ptr [rcx+rdx+16]
+       movdqu   xmm0, qword ptr [r11]
+       movdqu   qword ptr [r15], xmm0
+       mov      eax, r14d
+       xor      rdx, rdx
+       div      edx:eax, edi
        mov      eax, dword ptr [r8+8]
        cmp      edx, eax
        jae      SHORT G_M47871_IG13
        movsxd   rax, edx
        mov      eax, dword ptr [r8+4*rax+16]
        dec      eax
-       mov      dword ptr [r14+4], eax
+       mov      dword ptr [r15], eax
        movsxd   rax, edx
        inc      r9d
        mov      dword ptr [r8+4*rax+16], r9d
@@ -1804,7 +1817,7 @@ G_M47871_IG08:
 G_M47871_IG09:
        inc      r10d
        cmp      r10d, ebp
-       mov      eax, dword ptr [rsp+2CH]
+       mov      eax, dword ptr [rsp+24H]
        jl       SHORT G_M47871_IG08
 
 G_M47871_IG10:
@@ -1813,12 +1826,13 @@ G_M47871_IG10:
        mov      dword ptr [rsi+56], r10d
 
 G_M47871_IG11:
-       add      rsp, 48
+       add      rsp, 40
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
        pop      r14
+       pop      r15
        ret      
 
 G_M47871_IG12:
@@ -1830,5 +1844,5 @@ G_M47871_IG13:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 256, prolog size 13 for method Dictionary`2:TrimExcess(int):this
+; Total bytes of code 264, prolog size 15 for method Dictionary`2:TrimExcess(int):this
 ; ============================================================

@jkotas
Member

jkotas commented Mar 26, 2019

A lot of tests are failing with:

shouldn't underflow because max hashtable length is MaxPrimeArrayLength = 0x7FEFFFFD

   at System.Collections.Generic.Dictionary`2.Remove(TKey key)
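For readers following the assert text, here is a minimal sketch of the free-list encoding that the `Remove` listings in this PR appear to use (the `neg` / `add -3` pair before the store into the entry, and the `cmp ..., -1` / `jl` checks that skip free entries). It is written in Python for brevity; the function and constant names are hypothetical, and the value `-3` is an assumption read off the disassembly, not a quote of the actual source.

```python
# Illustrative sketch only: names are hypothetical, not taken from the Dictionary source.
INT32_MIN = -(2**31)
MAX_PRIME_ARRAY_LENGTH = 0x7FEFFFFD  # value quoted in the assert message
START_OF_FREE_LIST = -3              # assumed from the `neg` / `add -3` pair in the listings


def encode_free_next(free_list: int) -> int:
    """Encode the previous free-list head for storage in a removed entry's `next` field.

    `free_list` is -1 for an empty free list, otherwise an entry index.
    """
    encoded = START_OF_FREE_LIST - free_list
    # Mirrors the intent of the assert: the result must still fit in a signed 32-bit int.
    assert encoded >= INT32_MIN, "free-list encoding would underflow int32"
    return encoded


def decode_free_next(encoded: int) -> int:
    """Recover the previous free-list head from an encoded `next` value."""
    return START_OF_FREE_LIST - encoded


def is_free_entry(next_field: int) -> bool:
    """Live entries have next >= -1; anything below -1 marks a free entry."""
    return next_field < -1
```

Under these assumptions, the largest possible index (`MAX_PRIME_ARRAY_LENGTH - 1`) encodes to a value that is still above the `int32` minimum, which seems to be what the assert message is stating.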

@jkotas
Member

jkotas commented Mar 26, 2019

The diffs look pretty good so far. Could you please also look at TryInsert? Of the basic find/add/resize/remove operations on the dictionary, that is the one missing from the diffs.

@MarcoRossignoli
Member Author

MarcoRossignoli commented Mar 26, 2019

A lot of tests are failing with:...

Thanks, I'll fix it. I was confused by the "no more runnable" failed tests 😞

@danmoseley danmoseley changed the title DictionarySlim backport improvements, retaining more entropy and remove _freeCount DictionarySlim backport improvements, retaining more entropy Mar 27, 2019
@MarcoRossignoli
Member Author

/azp run

@MarcoRossignoli
Member Author

X86 diff: I see similar differences, with some different registers used (I think due to the constant removal).
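As a rough illustration of what the removed `and` and the `cdq`/`idiv` to `xor edx, edx`/`div` changes in the listings correspond to, here is a small Python sketch of the two bucket computations. The function names are hypothetical, and the mask `0x7FFFFFFF` is an assumption: the listings print `0xD1FFAB1E`, which looks like a placeholder rather than the real constant.

```python
# Illustrative sketch only: models 32-bit hash codes with Python integers.

def stored_hash_before(hash_code: int) -> int:
    """Old scheme (assumed): clear the sign bit, keeping 31 bits of the hash."""
    return hash_code & 0x7FFFFFFF


def stored_hash_after(hash_code: int) -> int:
    """New scheme: reinterpret the int32 hash as uint32, keeping all 32 bits."""
    return hash_code & 0xFFFFFFFF


def bucket_before(hash_code: int, bucket_count: int) -> int:
    """Signed remainder of a non-negative value (the cdq / idiv sequence)."""
    return stored_hash_before(hash_code) % bucket_count


def bucket_after(hash_code: int, bucket_count: int) -> int:
    """Unsigned remainder (the xor edx, edx / div sequence)."""
    return stored_hash_after(hash_code) % bucket_count


# Two int32 hash codes that differ only in the sign bit.
a = 5
b = 5 - 2**31
```

With these definitions, `a` and `b` have the same stored hash and bucket in the old scheme, while the new scheme keeps them distinct, which is the "more entropy" the PR title refers to.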

diff --git a/x86coreclrupstream.txt b/x86coreclr.txt
index 17de20b..5c1cdec 100644
--- a/x86coreclrupstream.txt
+++ b/x86coreclr.txt
@@ -6,43 +6,43 @@
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T03] ( 34, 21.50)     ref  ->  esi         this class-hnd
+;  V00 this         [V00,T03] ( 35, 22   )     ref  ->  esi         this class-hnd
 ;  V01 arg1         [V01,T08] ( 11,  9   )     ref  ->  [ebp-0x2C]   ld-addr-op class-hnd
-;  V02 arg2         [V02,T31] (  3,  1.50)     ref  ->  [ebp+0x0C]   class-hnd
-;  V03 arg3         [V03,T27] (  4,  2   )   ubyte  ->  [ebp+0x08]  
-;  V04 loc0         [V04,T04] ( 11, 23   )     ref  ->  [ebp-0x30]   class-hnd
+;  V02 arg2         [V02,T32] (  3,  1.50)     ref  ->  [ebp+0x0C]   class-hnd
+;  V03 arg3         [V03,T28] (  4,  2   )   ubyte  ->  [ebp+0x08]  
+;  V04 loc0         [V04,T04] ( 13, 24   )     ref  ->  [ebp-0x30]   class-hnd
 ;  V05 loc1         [V05,T13] (  6,  6   )     ref  ->  [ebp-0x34]   class-hnd
 ;  V06 loc2         [V06,T09] (  6, 11   )     int  ->  [ebp-0x10]  
 ;  V07 loc3         [V07,T01] (  8, 25.50)     int  ->  [ebp-0x14]  
 ;  V08 loc4         [V08,T18] (  5,  3.50)   byref  ->  [ebp-0x38]  
 ;  V09 loc5         [V09,T00] (  9, 29   )     int  ->  [ebp-0x18]  
-;  V10 loc6         [V10,T32] (  3,  1.50)    bool  ->  [ebp-0x1C]  
-;  V11 loc7         [V11,T22] (  6,  3   )     int  ->  ecx        
-;  V12 loc8         [V12,T21] (  6,  3   )   byref  ->  [ebp-0x3C]  
+;  V10 loc6         [V10,T33] (  3,  1.50)    bool  ->  [ebp-0x1C]  
+;  V11 loc7         [V11,T21] (  6,  3   )     int  ->  ecx        
+;  V12 loc8         [V12,T25] (  5,  2.50)   byref  ->  [ebp-0x3C]  
 ;* V13 loc9         [V13    ] (  0,  0   )     ref  ->  zero-ref    ld-addr-op class-hnd
 ;  V14 loc10        [V14,T16] (  3,  4.50)     ref  ->  [ebp-0x40]   class-hnd
-;  V15 loc11        [V15,T33] (  3,  1.50)     int  ->  [ebp-0x20]  
-;  V16 tmp0         [V16,T28] (  3,  2   )     int  ->  eax        
+;  V15 loc11        [V15,T34] (  3,  1.50)     int  ->  [ebp-0x20]  
+;  V16 tmp0         [V16,T29] (  3,  2   )     int  ->  ebx        
 ;  V17 tmp1         [V17,T17] (  5,  3.74)     ref  ->  edi         class-hnd "spilling QMark2"
 ;  V18 tmp2         [V18,T10] (  3, 10   )     int  ->  eax         "impRuntimeLookup slot"
 ;  V19 tmp3         [V19,T11] (  2,  8   )     ref  ->  [ebp-0x44]   class-hnd "impAppendStmt"
 ;* V20 tmp4         [V20    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
 ;  V21 tmp5         [V21,T05] (  4, 14   )     int  ->  ebx         "impRuntimeLookup typehandle"
 ;* V22 tmp6         [V22    ] (  0,  0   )     int  ->  zero-ref    "VirtualCall with runtime lookup"
-;  V23 tmp7         [V23,T36] (  3,  0   )     int  ->  ecx         "impRuntimeLookup slot"
+;  V23 tmp7         [V23,T37] (  3,  0   )     int  ->  ecx         "impRuntimeLookup slot"
 ;* V24 tmp8         [V24    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
-;  V25 tmp9         [V25,T34] (  4,  0   )     int  ->  edx         "impRuntimeLookup typehandle"
-;  V26 tmp10        [V26,T25] (  3,  2.50)     int  ->  [ebp-0x24]   "impRuntimeLookup slot"
+;  V25 tmp9         [V25,T35] (  4,  0   )     int  ->  edx         "impRuntimeLookup typehandle"
+;  V26 tmp10        [V26,T26] (  3,  2.50)     int  ->  [ebp-0x24]   "impRuntimeLookup slot"
 ;  V27 tmp11        [V27,T19] (  4,  3.50)     int  ->  edi         "impRuntimeLookup typehandle"
-;  V28 tmp12        [V28,T37] (  3,  0   )     int  ->  ecx         "impRuntimeLookup slot"
+;  V28 tmp12        [V28,T38] (  3,  0   )     int  ->  ecx         "impRuntimeLookup slot"
 ;* V29 tmp13        [V29    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
-;  V30 tmp14        [V30,T35] (  4,  0   )     int  ->  edx         "impRuntimeLookup typehandle"
+;  V30 tmp14        [V30,T36] (  4,  0   )     int  ->  edx         "impRuntimeLookup typehandle"
 ;* V31 tmp15        [V31    ] (  0,  0   )     int  ->  zero-ref    "impRuntimeLookup slot"
 ;* V32 tmp16        [V32    ] (  0,  0   )     int  ->  zero-ref    "impRuntimeLookup typehandle"
 ;* V33 tmp17        [V33    ] (  0,  0   )     int  ->  zero-ref    "impRuntimeLookup slot"
 ;* V34 tmp18        [V34    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
 ;* V35 tmp19        [V35    ] (  0,  0   )     int  ->  zero-ref    "impRuntimeLookup typehandle"
-;  V36 tmp20        [V36,T26] (  3,  2.50)     int  ->  ebx         "impRuntimeLookup slot"
+;  V36 tmp20        [V36,T27] (  3,  2.50)     int  ->  ebx         "impRuntimeLookup slot"
 ;* V37 tmp21        [V37    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
 ;  V38 tmp22        [V38,T20] (  4,  3.50)     int  ->  eax         "impRuntimeLookup typehandle"
 ;* V39 tmp23        [V39    ] (  0,  0   )     int  ->  zero-ref    "VirtualCall with runtime lookup"
@@ -51,13 +51,14 @@
 ;* V42 tmp26        [V42    ] (  0,  0   )     ref  ->  zero-ref    "argument with side effect"
 ;* V43 tmp27        [V43    ] (  0,  0   )     ref  ->  zero-ref    "argument with side effect"
 ;  V44 tmp28        [V44,T12] (  2,  8   )     ref  ->  [ebp-0x4C]   "argument with side effect"
-;  V45 tmp29        [V45,T29] (  2,  2   )     int  ->  edx         "argument with side effect"
-;  V46 tmp30        [V46,T23] (  3,  3   )     ref  ->  ecx         "arr expr"
-;  V47 tmp31        [V47,T24] (  3,  3   )     int  ->  edx         "arr expr"
-;  V48 tmp32        [V48,T30] (  2,  2   )     int  ->  edx         "argument with side effect"
-;  V49 cse0         [V49,T06] (  4, 12.50)   byref  ->  edi         "ValNumCSE"
-;  V50 cse1         [V50,T07] (  4, 12.50)   byref  ->  [ebp-0x50]   "ValNumCSE"
-;  V51 cse2         [V51,T02] (  7, 24.50)     int  ->  [ebp-0x28]   "ValNumCSE"
+;  V45 tmp29        [V45,T30] (  2,  2   )     int  ->  edx         "argument with side effect"
+;  V46 tmp30        [V46,T22] (  3,  3   )     ref  ->  ecx         "arr expr"
+;  V47 tmp31        [V47,T23] (  3,  3   )     int  ->  edx         "arr expr"
+;  V48 tmp32        [V48,T24] (  3,  3   )     int  ->  edi         "arr expr"
+;  V49 tmp33        [V49,T31] (  2,  2   )     int  ->  edx         "argument with side effect"
+;  V50 cse0         [V50,T06] (  4, 12.50)   byref  ->  edi         "ValNumCSE"
+;  V51 cse1         [V51,T07] (  4, 12.50)   byref  ->  [ebp-0x50]   "ValNumCSE"
+;  V52 cse2         [V52,T02] (  7, 24.50)     int  ->  [ebp-0x28]   "ValNumCSE"
 ;  TEMP_02                                     ref  ->  [ebp-0x54]
 ;  TEMP_01                                     int  ->  [ebp-0x58]
 ;
@@ -108,6 +109,7 @@ G_M9942_IG05:
        mov      edx, edi
        nop      
        call     dword ptr [eax]
+       mov      ebx, eax
        mov      gword ptr [ebp-2CH], edi
        jmp      SHORT G_M9942_IG07
 
@@ -117,18 +119,17 @@ G_M9942_IG06:
        mov      ebx, dword ptr [edi]
        mov      ebx, dword ptr [ebx+40]
        call     dword ptr [ebx+12]Object:GetHashCode():int:this
+       mov      ebx, eax
 
 G_M9942_IG07:
-       mov      ebx, eax
-       and      ebx, 0xD1FFAB1E
        xor      ecx, ecx
        mov      dword ptr [ebp-14H], ecx
        mov      edx, gword ptr [esi+4]
        mov      gword ptr [ebp-48H], edx
        mov      edi, gword ptr [esi+4]
        mov      eax, ebx
-       cdq      
-       idiv     edx:eax, dword ptr [edi+4]
+       xor      edx, edx
+       div      edx:eax, dword ptr [edi+4]
        mov      edi, gword ptr [ebp-48H]
        cmp      edx, dword ptr [edi+4]
        jae      G_M9942_IG39
@@ -181,7 +182,7 @@ G_M9942_IG12:
        shl      edi, 4
        lea      edi, bword ptr [eax+edi+8]
        mov      dword ptr [ebp-10H], ebx
-       cmp      dword ptr [edi+8], ebx
+       cmp      dword ptr [edi+12], ebx
        jne      SHORT G_M9942_IG13
        shl      edx, 4
        mov      gword ptr [ebp-30H], eax
@@ -204,7 +205,7 @@ G_M9942_IG12:
        jne      SHORT G_M9942_IG15
 
 G_M9942_IG13:
-       mov      edx, dword ptr [edi+12]
+       mov      edx, dword ptr [edi+8]
        mov      edi, edx
        mov      ecx, dword ptr [ebp-28H]
        mov      edx, dword ptr [ebp-14H]
@@ -246,7 +247,7 @@ G_M9942_IG17:
        lea      ecx, bword ptr [edi+ecx+8]
        mov      bword ptr [ebp-50H], ecx
        mov      dword ptr [ebp-10H], ebx
-       cmp      dword ptr [ecx+8], ebx
+       cmp      dword ptr [ecx+12], ebx
        jne      SHORT G_M9942_IG19
        shl      eax, 4
        mov      eax, gword ptr [edi+eax+8]
@@ -298,7 +299,7 @@ G_M9942_IG21:
 
 G_M9942_IG22:
        mov      ecx, bword ptr [ebp-50H]
-       mov      eax, dword ptr [ecx+12]
+       mov      eax, dword ptr [ecx+8]
        mov      ecx, eax
        mov      edx, dword ptr [ebp-28H]
        mov      eax, dword ptr [ebp-14H]
@@ -337,8 +338,8 @@ G_M9942_IG25:
        mov      ecx, gword ptr [esi+4]
        mov      edi, gword ptr [esi+4]
        mov      eax, ebx
-       cdq      
-       idiv     edx:eax, dword ptr [edi+4]
+       xor      edx, edx
+       div      edx:eax, dword ptr [edi+4]
        cmp      edx, dword ptr [ecx+4]
        jae      G_M9942_IG39
        lea      edi, bword ptr [ecx+4*edx+8]
@@ -362,15 +363,21 @@ G_M9942_IG27:
        mov      edi, dword ptr [ebp-1CH]
        test     edi, edi
        je       SHORT G_M9942_IG28
-       mov      edi, dword ptr [edx+12]
+       mov      edi, dword ptr [esi+28]
+       cmp      edi, dword ptr [eax+4]
+       jae      G_M9942_IG39
+       shl      edi, 4
+       mov      edi, dword ptr [eax+edi+16]
+       neg      edi
+       add      edi, -3
        mov      dword ptr [esi+28], edi
 
 G_M9942_IG28:
-       mov      dword ptr [edx+8], ebx
+       mov      dword ptr [edx+12], ebx
        mov      edi, bword ptr [ebp-38H]
        mov      ebx, dword ptr [edi]
        dec      ebx
-       mov      dword ptr [edx+12], ebx
+       mov      dword ptr [edx+8], ebx
        mov      bword ptr [ebp-3CH], edx
        mov      ebx, gword ptr [ebp-2CH]
        call     CORINFO_HELP_CHECKED_ASSIGN_REF_EBX
@@ -460,7 +467,7 @@ G_M9942_IG39:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 923, prolog size 18 for method Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
+; Total bytes of code 942, prolog size 18 for method Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Resize(int,bool):this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -470,25 +477,24 @@ G_M9942_IG39:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T05] (  8,  8   )     ref  ->  [ebp-0x1C]   this class-hnd
-;  V01 arg1         [V01,T10] (  5,  6   )     int  ->  [ebp-0x10]  
-;  V02 arg2         [V02,T17] (  1,  1   )    bool  ->  [ebp+0x08]  
-;  V03 loc0         [V03,T09] (  5,  8   )     ref  ->  ebx         class-hnd
+;  V00 this         [V00,T06] (  8,  8   )     ref  ->  [ebp-0x1C]   this class-hnd
+;  V01 arg1         [V01,T09] (  5,  6   )     int  ->  [ebp-0x10]  
+;  V02 arg2         [V02,T16] (  1,  1   )    bool  ->  [ebp+0x08]  
+;  V03 loc0         [V03,T08] (  5,  8   )     ref  ->  ebx         class-hnd
 ;  V04 loc1         [V04,T02] (  8, 21   )     ref  ->  [ebp-0x20]   class-hnd
-;  V05 loc2         [V05,T04] (  6, 11.50)     int  ->  [ebp-0x14]  
+;  V05 loc2         [V05,T05] (  6, 11.50)     int  ->  [ebp-0x14]  
 ;* V06 loc3         [V06    ] (  0,  0   )     ref  ->  zero-ref    ld-addr-op class-hnd
 ;  V07 loc4         [V07,T01] (  7, 22.50)     int  ->  [ebp-0x18]  
 ;  V08 loc5         [V08,T00] (  7, 23   )     int  ->  edi        
-;  V09 loc6         [V09,T11] (  4,  8   )     int  ->  edx        
-;  V10 tmp0         [V10,T14] (  3,  4.50)     int  ->  ecx         "impRuntimeLookup slot"
-;  V11 tmp1         [V11,T13] (  4,  6.50)     int  ->  eax         "impRuntimeLookup typehandle"
-;  V12 tmp2         [V12,T12] (  2,  8   )   byref  ->  edi         "non-inline candidate call"
-;  V13 tmp3         [V13,T15] (  2,  4   )     ref  ->  ecx         class-hnd "Inlining Arg"
-;  V14 cse0         [V14,T08] (  3, 10   )     int  ->  esi         "ValNumCSE"
-;  V15 cse1         [V15,T06] (  3, 10   )   byref  ->  [ebp-0x24]   "ValNumCSE"
-;  V16 cse2         [V16,T07] (  3, 10   )   byref  ->  edi         "ValNumCSE"
-;  V17 cse3         [V17,T16] (  2,  4   )     int  ->  eax         "ValNumCSE"
-;  V18 rat0         [V18,T03] (  3, 12   )     ref  ->  esi         "virtual vtable call"
+;  V09 loc6         [V09,T10] (  4,  8   )     int  ->  edx        
+;  V10 tmp0         [V10,T13] (  3,  4.50)     int  ->  ecx         "impRuntimeLookup slot"
+;  V11 tmp1         [V11,T12] (  4,  6.50)     int  ->  eax         "impRuntimeLookup typehandle"
+;  V12 tmp2         [V12,T11] (  2,  8   )   byref  ->  edi         "non-inline candidate call"
+;  V13 tmp3         [V13,T14] (  2,  4   )     ref  ->  ecx         class-hnd "Inlining Arg"
+;  V14 cse0         [V14,T03] (  4, 12   )   byref  ->  [ebp-0x24]   "ValNumCSE"
+;  V15 cse1         [V15,T07] (  3, 10   )   byref  ->  edi         "ValNumCSE"
+;  V16 cse2         [V16,T15] (  2,  4   )     int  ->  eax         "ValNumCSE"
+;  V17 rat0         [V17,T04] (  3, 12   )     ref  ->  esi         "virtual vtable call"
 ;
 ; Lcl frame size = 24
 
@@ -547,7 +553,7 @@ G_M29783_IG04:
        mov      edi, eax
        shl      edi, 4
        lea      edi, bword ptr [ecx+edi+8]
-       cmp      dword ptr [edi+8], 0
+       cmp      dword ptr [edi+8], -1
        jl       SHORT G_M29783_IG05
        mov      dword ptr [ebp-18H], eax
        mov      esi, eax
@@ -558,8 +564,7 @@ G_M29783_IG04:
        mov      esi, dword ptr [esi]
        mov      esi, dword ptr [esi+40]
        call     dword ptr [esi+12]Object:GetHashCode():int:this
-       and      eax, 0xD1FFAB1E
-       mov      dword ptr [edi+8], eax
+       mov      dword ptr [edi+12], eax
        mov      eax, dword ptr [ebp-18H]
        mov      ecx, gword ptr [ebp-20H]
 
@@ -581,20 +586,19 @@ G_M29783_IG07:
        mov      eax, edi
        shl      eax, 4
        lea      eax, bword ptr [ecx+eax+8]
-       mov      bword ptr [ebp-24H], eax
-       mov      esi, dword ptr [eax+8]
-       test     esi, esi
+       cmp      dword ptr [eax+8], -1
        jl       SHORT G_M29783_IG08
-       mov      eax, esi
-       cdq      
-       idiv     edx:eax, dword ptr [ebp-10H]
+       mov      bword ptr [ebp-24H], eax
+       mov      eax, dword ptr [eax+12]
+       xor      edx, edx
+       div      edx:eax, dword ptr [ebp-10H]
        mov      eax, dword ptr [ebx+4]
        cmp      edx, eax
        jae      SHORT G_M29783_IG14
        mov      eax, dword ptr [ebx+4*edx+8]
        dec      eax
        mov      esi, bword ptr [ebp-24H]
-       mov      dword ptr [esi+12], eax
+       mov      dword ptr [esi+8], eax
        lea      eax, [edi+1]
        mov      dword ptr [ebx+4*edx+8], eax
 
@@ -639,7 +643,7 @@ G_M29783_IG14:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 332, prolog size 13 for method Dictionary`2:Resize(int,bool):this
+; Total bytes of code 328, prolog size 13 for method Dictionary`2:Resize(int,bool):this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:FindEntry(ref):int:this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -720,12 +724,11 @@ G_M16827_IG03:
        mov      ebx, dword ptr [ebx+40]
        call     dword ptr [ebx+12]Object:GetHashCode():int:this
        mov      ebx, eax
-       and      ebx, 0xD1FFAB1E
        mov      dword ptr [ebp-18H], ebx
        mov      ecx, gword ptr [ebp-38H]
        mov      eax, ebx
-       cdq      
-       idiv     edx:eax, dword ptr [ecx+4]
+       xor      edx, edx
+       div      edx:eax, dword ptr [ecx+4]
        cmp      edx, dword ptr [ecx+4]
        jae      G_M16827_IG19
        mov      ecx, dword ptr [ecx+4*edx+8]
@@ -759,7 +762,7 @@ G_M16827_IG05:
        shl      ebx, 4
        lea      ebx, bword ptr [ecx+ebx+8]
        mov      edi, dword ptr [ebp-18H]
-       cmp      dword ptr [ebx+8], edi
+       cmp      dword ptr [ebx+12], edi
        jne      SHORT G_M16827_IG06
        mov      dword ptr [ebp-10H], eax
        mov      edi, eax
@@ -779,7 +782,7 @@ G_M16827_IG05:
        jne      G_M16827_IG15
 
 G_M16827_IG06:
-       mov      eax, dword ptr [ebx+12]
+       mov      eax, dword ptr [ebx+8]
        mov      ebx, eax
        mov      edx, dword ptr [ebp-2CH]
        mov      eax, dword ptr [ebp-14H]
@@ -820,12 +823,11 @@ G_M16827_IG10:
        nop      
        call     dword ptr [eax]
        mov      ebx, eax
-       and      ebx, 0xD1FFAB1E
        mov      dword ptr [ebp-1CH], ebx
        mov      ecx, gword ptr [ebp-38H]
        mov      eax, ebx
-       cdq      
-       idiv     edx:eax, dword ptr [ecx+4]
+       xor      edx, edx
+       div      edx:eax, dword ptr [ecx+4]
        cmp      edx, dword ptr [ecx+4]
        jae      G_M16827_IG19
        mov      ecx, dword ptr [ecx+4*edx+8]
@@ -842,7 +844,7 @@ G_M16827_IG11:
        shl      ebx, 4
        lea      ebx, bword ptr [edx+ebx+8]
        mov      edi, dword ptr [ebp-1CH]
-       cmp      dword ptr [ebx+8], edi
+       cmp      dword ptr [ebx+12], edi
        jne      SHORT G_M16827_IG13
        mov      dword ptr [ebp-10H], eax
        mov      edi, eax
@@ -874,7 +876,7 @@ G_M16827_IG12:
        jne      SHORT G_M16827_IG15
 
 G_M16827_IG13:
-       mov      eax, dword ptr [ebx+12]
+       mov      eax, dword ptr [ebx+8]
        mov      ebx, eax
        mov      ecx, dword ptr [ebp-2CH]
        mov      eax, dword ptr [ebp-14H]
@@ -913,7 +915,7 @@ G_M16827_IG19:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 535, prolog size 13 for method Dictionary`2:FindEntry(ref):int:this
+; Total bytes of code 525, prolog size 13 for method Dictionary`2:FindEntry(ref):int:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Remove(int):bool:this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -932,7 +934,7 @@ G_M16827_IG19:
 ;  V06 loc4         [V06,T17] (  4,  2   )     int  ->  [ebp-0x1C]  
 ;  V07 loc5         [V07,T10] (  5,  6   )     int  ->  [ebp-0x20]  
 ;  V08 loc6         [V08,T00] (  8, 21.50)     int  ->  [ebp-0x24]  
-;  V09 loc7         [V09,T01] (  9, 18   )   byref  ->  [ebp-0x38]  
+;  V09 loc7         [V09,T01] (  8, 17.50)   byref  ->  [ebp-0x38]  
 ;* V10 tmp0         [V10    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;  V11 tmp1         [V11,T18] (  2,  2   )     ref  ->  ecx         class-hnd "dup spill"
 ;  V12 tmp2         [V12,T20] (  3,  1.50)     ref  ->  ecx        
@@ -984,13 +986,12 @@ G_M18089_IG03:
        mov      esi, dword ptr [ebp-10H]
 
 G_M18089_IG04:
-       and      ecx, 0xD1FFAB1E
        mov      gword ptr [ebp-30H], ebx
        mov      ebx, dword ptr [ebx+4]
        mov      dword ptr [ebp-18H], ecx
        mov      eax, ecx
-       cdq      
-       idiv     edx:eax, ebx
+       xor      edx, edx
+       div      edx:eax, ebx
        mov      eax, edx
        mov      dword ptr [ebp-20H], -1
        cmp      eax, ebx
@@ -1015,7 +1016,7 @@ G_M18089_IG05:
        mov      gword ptr [ebp-34H], ebx
        lea      edx, bword ptr [ebx+edx+8]
        mov      esi, dword ptr [ebp-18H]
-       cmp      dword ptr [edx], esi
+       cmp      dword ptr [edx+4], esi
        jne      SHORT G_M18089_IG08
        mov      esi, gword ptr [edi+12]
        mov      gword ptr [ebp-3CH], esi
@@ -1046,7 +1047,7 @@ G_M18089_IG07:
 
 G_M18089_IG08:
        mov      eax, dword ptr [ebp-24H]
-       mov      edx, dword ptr [edx+4]
+       mov      edx, dword ptr [edx]
        mov      ecx, dword ptr [ebp-2CH]
        mov      esi, dword ptr [ebp-14H]
        cmp      ecx, esi
@@ -1078,7 +1079,7 @@ G_M18089_IG13:
        mov      esi, dword ptr [ebp-20H]
        test     esi, esi
        jge      SHORT G_M18089_IG14
-       mov      ecx, dword ptr [edx+4]
+       mov      ecx, dword ptr [edx]
        inc      ecx
        mov      ebx, gword ptr [ebp-30H]
        mov      esi, dword ptr [ebp-1CH]
@@ -1090,14 +1091,15 @@ G_M18089_IG14:
        cmp      esi, ecx
        jae      SHORT G_M18089_IG18
        shl      esi, 4
-       mov      ecx, dword ptr [edx+4]
+       mov      ecx, dword ptr [edx]
        mov      ebx, gword ptr [ebp-34H]
-       mov      dword ptr [ebx+esi+12], ecx
+       mov      dword ptr [ebx+esi+8], ecx
 
 G_M18089_IG15:
-       mov      dword ptr [edx], -1
        mov      ecx, dword ptr [edi+28]
-       mov      dword ptr [edx+4], ecx
+       neg      ecx
+       add      ecx, -3
+       mov      dword ptr [edx], ecx
        mov      eax, dword ptr [ebp-24H]
        mov      dword ptr [edi+28], eax
        inc      dword ptr [edi+32]
@@ -1119,7 +1121,7 @@ G_M18089_IG18:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 354, prolog size 13 for method Dictionary`2:Remove(int):bool:this
+; Total bytes of code 345, prolog size 13 for method Dictionary`2:Remove(int):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Remove(int,byref):bool:this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -1139,7 +1141,7 @@ G_M18089_IG18:
 ;  V07 loc4         [V07,T17] (  4,  2   )     int  ->  [ebp-0x1C]  
 ;  V08 loc5         [V08,T10] (  5,  6   )     int  ->  [ebp-0x20]  
 ;  V09 loc6         [V09,T00] (  8, 21.50)     int  ->  [ebp-0x24]  
-;  V10 loc7         [V10,T01] ( 10, 18.50)   byref  ->  [ebp-0x38]  
+;  V10 loc7         [V10,T01] (  9, 18   )   byref  ->  [ebp-0x38]  
 ;* V11 tmp0         [V11    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;  V12 tmp1         [V12,T18] (  2,  2   )     ref  ->  ecx         class-hnd "dup spill"
 ;  V13 tmp2         [V13,T20] (  3,  1.50)     ref  ->  ecx        
@@ -1191,13 +1193,12 @@ G_M18089_IG03:
        mov      esi, dword ptr [ebp-10H]
 
 G_M18089_IG04:
-       and      ecx, 0xD1FFAB1E
        mov      gword ptr [ebp-30H], ebx
        mov      ebx, dword ptr [ebx+4]
        mov      dword ptr [ebp-18H], ecx
        mov      eax, ecx
-       cdq      
-       idiv     edx:eax, ebx
+       xor      edx, edx
+       div      edx:eax, ebx
        mov      eax, edx
        mov      dword ptr [ebp-20H], -1
        cmp      eax, ebx
@@ -1222,7 +1223,7 @@ G_M18089_IG05:
        mov      gword ptr [ebp-34H], ebx
        lea      edx, bword ptr [ebx+edx+8]
        mov      esi, dword ptr [ebp-18H]
-       cmp      dword ptr [edx], esi
+       cmp      dword ptr [edx+4], esi
        jne      SHORT G_M18089_IG08
        mov      esi, gword ptr [edi+12]
        mov      gword ptr [ebp-3CH], esi
@@ -1253,7 +1254,7 @@ G_M18089_IG07:
 
 G_M18089_IG08:
        mov      eax, dword ptr [ebp-24H]
-       mov      edx, dword ptr [edx+4]
+       mov      edx, dword ptr [edx]
        mov      ecx, dword ptr [ebp-2CH]
        mov      esi, dword ptr [ebp-14H]
        cmp      ecx, esi
@@ -1287,7 +1288,7 @@ G_M18089_IG13:
        mov      esi, dword ptr [ebp-20H]
        test     esi, esi
        jge      SHORT G_M18089_IG14
-       mov      ecx, dword ptr [edx+4]
+       mov      ecx, dword ptr [edx]
        inc      ecx
        mov      ebx, gword ptr [ebp-30H]
        mov      esi, dword ptr [ebp-1CH]
@@ -1299,17 +1300,18 @@ G_M18089_IG14:
        cmp      esi, ecx
        jae      SHORT G_M18089_IG18
        shl      esi, 4
-       mov      ecx, dword ptr [edx+4]
+       mov      ecx, dword ptr [edx]
        mov      ebx, gword ptr [ebp-34H]
-       mov      dword ptr [ebx+esi+12], ecx
+       mov      dword ptr [ebx+esi+8], ecx
 
 G_M18089_IG15:
        mov      ecx, dword ptr [edx+12]
        mov      esi, bword ptr [ebp+08H]
        mov      dword ptr [esi], ecx
-       mov      dword ptr [edx], -1
        mov      ecx, dword ptr [edi+28]
-       mov      dword ptr [edx+4], ecx
+       neg      ecx
+       add      ecx, -3
+       mov      dword ptr [edx], ecx
        mov      eax, dword ptr [ebp-24H]
        mov      dword ptr [edi+28], eax
        inc      dword ptr [edi+32]
@@ -1331,7 +1333,7 @@ G_M18089_IG18:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 371, prolog size 13 for method Dictionary`2:Remove(int,byref):bool:this
+; Total bytes of code 362, prolog size 13 for method Dictionary`2:Remove(int,byref):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:TryInsert(int,int,ubyte):bool:this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -1341,49 +1343,50 @@ G_M18089_IG18:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T04] ( 22, 14   )     ref  ->  esi         this class-hnd
-;  V01 arg1         [V01,T09] (  9,  7.50)     int  ->  [ebp-0x10]   ld-addr-op
-;  V02 arg2         [V02,T28] (  3,  1.50)     int  ->  [ebp+0x0C]  
-;  V03 arg3         [V03,T33] (  2,  1   )   ubyte  ->  [ebp+0x08]  
-;  V04 loc0         [V04,T00] ( 12, 30.50)     ref  ->  [ebp-0x34]   class-hnd
-;  V05 loc1         [V05,T15] (  5,  5.50)     ref  ->  [ebp-0x38]   class-hnd
-;  V06 loc2         [V06,T10] (  6, 11   )     int  ->  [ebp-0x14]  
+;  V00 this         [V00,T04] ( 23, 14.50)     ref  ->  esi         this class-hnd
+;  V01 arg1         [V01,T07] (  9,  7.50)     int  ->  [ebp-0x10]   ld-addr-op
+;  V02 arg2         [V02,T29] (  3,  1.50)     int  ->  [ebp+0x0C]  
+;  V03 arg3         [V03,T34] (  2,  1   )   ubyte  ->  [ebp+0x08]  
+;  V04 loc0         [V04,T00] ( 12, 27.50)     ref  ->  [ebp-0x2C]   class-hnd
+;  V05 loc1         [V05,T15] (  5,  5.50)     ref  ->  [ebp-0x30]   class-hnd
+;  V06 loc2         [V06,T08] (  6, 11   )     int  ->  [ebp-0x14]  
 ;  V07 loc3         [V07,T01] (  7, 25   )     int  ->  [ebp-0x18]  
-;  V08 loc4         [V08,T18] (  5,  3.50)   byref  ->  [ebp-0x3C]  
-;  V09 loc5         [V09,T02] (  7, 25   )     int  ->  eax        
-;  V10 loc6         [V10,T29] (  3,  1.50)    bool  ->  [ebp-0x1C]  
-;  V11 loc7         [V11,T20] (  6,  3   )     int  ->  registers  
-;  V12 loc8         [V12,T19] (  6,  3   )   byref  ->  eax        
+;  V08 loc4         [V08,T18] (  5,  3.50)   byref  ->  [ebp-0x34]  
+;  V09 loc5         [V09,T02] (  7, 25   )     int  ->  registers  
+;  V10 loc6         [V10,T30] (  3,  1.50)    bool  ->  ebx        
+;  V11 loc7         [V11,T19] (  6,  3   )     int  ->  registers  
+;  V12 loc8         [V12,T25] (  5,  2.50)   byref  ->  eax        
 ;* V13 loc9         [V13    ] (  0,  0   )     int  ->  zero-ref    ld-addr-op
 ;* V14 loc10        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
-;  V15 loc11        [V15,T30] (  3,  1.50)     int  ->  [ebp-0x20]  
+;  V15 loc11        [V15,T31] (  3,  1.50)     int  ->  [ebp-0x1C]  
 ;* V16 tmp0         [V16    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
-;  V17 tmp1         [V17,T25] (  3,  2   )     int  ->  registers  
+;  V17 tmp1         [V17,T26] (  3,  2   )     int  ->  ecx        
 ;* V18 tmp2         [V18    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;* V19 tmp3         [V19    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
-;  V20 tmp4         [V20,T16] (  2,  4   )    bool  ->  ecx         "Inline return value spill temp"
-;  V21 tmp5         [V21,T11] (  2,  8   )     int  ->  ecx         ld-addr-op "Inlining Arg"
+;  V20 tmp4         [V20,T16] (  2,  4   )    bool  ->  eax         "Inline return value spill temp"
+;  V21 tmp5         [V21,T11] (  2,  8   )     int  ->  eax         ld-addr-op "Inlining Arg"
 ;* V22 tmp6         [V22    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;* V23 tmp7         [V23,T17] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
 ;* V24 tmp8         [V24    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
-;  V25 tmp9         [V25,T13] (  3,  6   )     ref  ->  [ebp-0x40]   "arr expr"
+;  V25 tmp9         [V25,T13] (  3,  6   )     ref  ->  [ebp-0x38]   "arr expr"
 ;  V26 tmp10        [V26,T14] (  3,  6   )     int  ->  edx         "arr expr"
-;  V27 tmp11        [V27,T12] (  2,  8   )     int  ->  [ebp-0x24]   "argument with side effect"
-;  V28 tmp12        [V28,T26] (  2,  2   )     int  ->  edx         "argument with side effect"
-;  V29 tmp13        [V29,T22] (  3,  3   )     ref  ->  ebx         "arr expr"
-;  V30 tmp14        [V30,T24] (  3,  3   )     int  ->  edx         "arr expr"
-;  V31 cse0         [V31,T07] (  5, 12.50)   byref  ->  registers   "ValNumCSE"
-;  V32 cse1         [V32,T08] (  5, 12.50)   byref  ->  registers   "ValNumCSE"
-;  V33 cse2         [V33,T23] (  3,  3   )     ref  ->  [ebp-0x44]   "ValNumCSE"
-;  V34 cse3         [V34,T21] (  6,  3   )     int  ->  ecx         "ValNumCSE"
-;  V35 cse4         [V35,T31] (  3,  1.50)     int  ->  [ebp-0x28]   "ValNumCSE"
-;  V36 cse5         [V36,T32] (  3,  1.50)     int  ->  eax         "ValNumCSE"
-;  V37 cse6         [V37,T27] (  3,  1.50)     ref  ->  ecx         "ValNumCSE"
-;  V38 cse7         [V38,T03] (  7, 24.50)     int  ->  [ebp-0x2C]   "ValNumCSE"
-;  V39 cse8         [V39,T05] (  4, 14   )     int  ->  eax         "ValNumCSE"
-;  V40 cse9         [V40,T06] (  4, 14   )     int  ->  [ebp-0x30]   "ValNumCSE"
+;  V27 tmp11        [V27,T12] (  2,  8   )     int  ->  [ebp-0x20]   "argument with side effect"
+;  V28 tmp12        [V28,T27] (  2,  2   )     int  ->  edx         "argument with side effect"
+;  V29 tmp13        [V29,T21] (  3,  3   )     ref  ->  edi         "arr expr"
+;  V30 tmp14        [V30,T23] (  3,  3   )     int  ->  edx         "arr expr"
+;  V31 tmp15        [V31,T24] (  3,  3   )     int  ->  ebx         "arr expr"
+;  V32 cse0         [V32,T09] (  4, 10.50)   byref  ->  ecx         "ValNumCSE"
+;  V33 cse1         [V33,T10] (  4, 10.50)   byref  ->  [ebp-0x3C]   "ValNumCSE"
+;  V34 cse2         [V34,T22] (  3,  3   )     ref  ->  [ebp-0x40]   "ValNumCSE"
+;  V35 cse3         [V35,T20] (  6,  3   )     int  ->  edx         "ValNumCSE"
+;  V36 cse4         [V36,T32] (  3,  1.50)     int  ->  eax         "ValNumCSE"
+;  V37 cse5         [V37,T33] (  3,  1.50)     int  ->  eax         "ValNumCSE"
+;  V38 cse6         [V38,T28] (  3,  1.50)     ref  ->  ecx         "ValNumCSE"
+;  V39 cse7         [V39,T03] (  7, 24.50)     int  ->  registers   "ValNumCSE"
+;  V40 cse8         [V40,T05] (  3, 12   )     int  ->  [ebp-0x24]   "ValNumCSE"
+;  V41 cse9         [V41,T06] (  3, 12   )     int  ->  [ebp-0x28]   "ValNumCSE"
 ;
-; Lcl frame size = 56
+; Lcl frame size = 52
 
 G_M59125_IG01:
        push     ebp
@@ -1391,7 +1394,7 @@ G_M59125_IG01:
        push     edi
        push     esi
        push     ebx
-       sub      esp, 56
+       sub      esp, 52
        mov      esi, ecx
        mov      edi, edx
 
@@ -1404,96 +1407,89 @@ G_M59125_IG02:
 
 G_M59125_IG03:
        mov      eax, gword ptr [esi+8]
-       mov      gword ptr [ebp-34H], eax
+       mov      gword ptr [ebp-2CH], eax
        mov      edx, gword ptr [esi+12]
        test     edx, edx
        je       SHORT G_M59125_IG04
-       mov      gword ptr [ebp-38H], edx
+       mov      gword ptr [ebp-30H], edx
        mov      ecx, edx
        mov      edx, edi
        call     [IEqualityComparer`1:GetHashCode(int):int:this]
+       mov      ecx, eax
+       mov      dword ptr [ebp-10H], edi
        jmp      SHORT G_M59125_IG05
 
 G_M59125_IG04:
        mov      dword ptr [ebp-10H], edi
        mov      ecx, edi
-       mov      gword ptr [ebp-38H], edx
-       mov      eax, ecx
-       mov      edi, dword ptr [ebp-10H]
+       mov      gword ptr [ebp-30H], edx
 
 G_M59125_IG05:
-       mov      ecx, eax
-       and      ecx, 0xD1FFAB1E
-       xor      eax, eax
-       mov      dword ptr [ebp-18H], eax
-       mov      ebx, gword ptr [esi+4]
-       mov      gword ptr [ebp-44H], ebx
-       mov      gword ptr [ebp-40H], ebx
-       mov      ebx, gword ptr [ebp-44H]
+       xor      ebx, ebx
+       mov      edx, gword ptr [esi+4]
+       mov      gword ptr [ebp-40H], edx
+       mov      gword ptr [ebp-38H], edx
+       mov      dword ptr [ebp-14H], ecx
+       mov      edi, gword ptr [ebp-40H]
        mov      eax, ecx
-       cdq      
-       idiv     edx:eax, dword ptr [ebx+4]
-       mov      ebx, gword ptr [ebp-40H]
-       cmp      edx, dword ptr [ebx+4]
+       xor      edx, edx
+       div      edx:eax, dword ptr [edi+4]
+       mov      edi, gword ptr [ebp-38H]
+       cmp      edx, dword ptr [edi+4]
        jae      G_M59125_IG30
-       lea      ebx, bword ptr [ebx+4*edx+8]
-       mov      bword ptr [ebp-3CH], ebx
-       mov      eax, dword ptr [ebx]
+       lea      edi, bword ptr [edi+4*edx+8]
+       mov      bword ptr [ebp-34H], edi
+       mov      eax, dword ptr [edi]
        dec      eax
-       cmp      gword ptr [ebp-38H], 0
+       cmp      gword ptr [ebp-30H], 0
        jne      SHORT G_M59125_IG09
 
 G_M59125_IG06:
-       mov      edx, gword ptr [ebp-34H]
-       mov      ebx, dword ptr [edx+4]
-       cmp      ebx, eax
+       mov      edx, gword ptr [ebp-2CH]
+       mov      edi, dword ptr [edx+4]
+       cmp      edi, eax
        jbe      G_M59125_IG19
        shl      eax, 4
-       mov      dword ptr [ebp-14H], ecx
-       cmp      dword ptr [edx+eax+8], ecx
-       mov      dword ptr [ebp-10H], edi
+       mov      dword ptr [ebp-24H], eax
+       lea      ecx, bword ptr [edx+eax+8]
+       mov      eax, dword ptr [ebp-14H]
+       cmp      dword ptr [ecx+4], eax
        jne      SHORT G_M59125_IG07
-       lea      edi, bword ptr [edx+eax+8]
-       mov      ecx, dword ptr [edi+8]
-       cmp      ecx, dword ptr [ebp-10H]
-       sete     cl
-       movzx    ecx, cl
-       test     ecx, ecx
+       mov      eax, dword ptr [ecx+8]
+       cmp      eax, dword ptr [ebp-10H]
+       sete     al
+       movzx    eax, al
+       test     eax, eax
        jne      SHORT G_M59125_IG10
 
 G_M59125_IG07:
-       lea      eax, bword ptr [edx+eax+8]
-       mov      eax, dword ptr [eax+4]
-       mov      edi, dword ptr [ebp-18H]
-       cmp      ebx, edi
+       mov      ecx, dword ptr [ebp-24H]
+       mov      ecx, dword ptr [edx+ecx+8]
+       cmp      edi, ebx
        jle      G_M59125_IG27
 
 G_M59125_IG08:
-       inc      edi
-       mov      gword ptr [ebp-34H], edx
-       mov      dword ptr [ebp-18H], edi
-       mov      ecx, dword ptr [ebp-14H]
-       mov      edi, dword ptr [ebp-10H]
+       inc      ebx
+       mov      gword ptr [ebp-2CH], edx
+       mov      eax, ecx
        jmp      SHORT G_M59125_IG06
 
 G_M59125_IG09:
-       mov      dword ptr [ebp-14H], ecx
+       mov      dword ptr [ebp-18H], ebx
        jmp      SHORT G_M59125_IG14
 
 G_M59125_IG10:
-       mov      eax, edi
-       mov      edi, dword ptr [ebp-10H]
        mov      ebx, dword ptr [ebp+08H]
-       movzx    ecx, bl
-       cmp      ecx, 1
+       movzx    edx, bl
+       cmp      edx, 1
        jne      SHORT G_M59125_IG11
        mov      ebx, dword ptr [ebp+0CH]
-       mov      dword ptr [eax+12], ebx
+       mov      dword ptr [ecx+12], ebx
        inc      dword ptr [esi+36]
        jmp      G_M59125_IG24
 
 G_M59125_IG11:
-       cmp      ecx, 2
+       cmp      edx, 2
        je       G_M59125_IG26
 
 G_M59125_IG12:
@@ -1508,73 +1504,66 @@ G_M59125_IG13:
        ret      8
 
 G_M59125_IG14:
-       mov      ebx, gword ptr [ebp-34H]
-       mov      ecx, dword ptr [ebx+4]
-       mov      dword ptr [ebp-2CH], ecx
-       cmp      ecx, eax
+       mov      edi, gword ptr [ebp-2CH]
+       mov      ebx, dword ptr [edi+4]
+       cmp      ebx, eax
        jbe      SHORT G_M59125_IG15
        shl      eax, 4
+       mov      dword ptr [ebp-28H], eax
+       lea      ecx, bword ptr [edi+eax+8]
        mov      edx, dword ptr [ebp-14H]
-       cmp      dword ptr [ebx+eax+8], edx
-       mov      dword ptr [ebp-10H], edi
+       cmp      dword ptr [ecx+4], edx
        jne      SHORT G_M59125_IG17
-       mov      dword ptr [ebp-30H], eax
-       lea      edi, bword ptr [ebx+eax+8]
-       mov      edx, dword ptr [edi+8]
-       mov      dword ptr [ebp-24H], edx
+       mov      bword ptr [ebp-3CH], ecx
+       mov      edx, dword ptr [ecx+8]
+       mov      dword ptr [ebp-20H], edx
        mov      edx, dword ptr [ebp-10H]
        push     edx
-       mov      edx, dword ptr [ebp-24H]
-       mov      ecx, gword ptr [ebp-38H]
+       mov      edx, dword ptr [ebp-20H]
+       mov      ecx, gword ptr [ebp-30H]
        call     [IEqualityComparer`1:Equals(int,int):bool:this]
        test     eax, eax
-       mov      eax, dword ptr [ebp-30H]
        je       SHORT G_M59125_IG17
        mov      ebx, dword ptr [ebp+08H]
-       movzx    ecx, bl
-       cmp      ecx, 1
+       movzx    edx, bl
+       cmp      edx, 1
        jne      SHORT G_M59125_IG16
+       mov      edi, bword ptr [ebp-3CH]
        mov      ebx, dword ptr [ebp+0CH]
        mov      dword ptr [edi+12], ebx
        inc      dword ptr [esi+36]
        jmp      G_M59125_IG24
 
 G_M59125_IG15:
-       mov      edx, ebx
-       mov      ecx, dword ptr [ebp-14H]
-       mov      ebx, dword ptr [ebp-2CH]
+       mov      edx, edi
+       mov      edi, ebx
        jmp      SHORT G_M59125_IG19
 
 G_M59125_IG16:
-       cmp      ecx, 2
+       cmp      edx, 2
        je       G_M59125_IG28
        jmp      SHORT G_M59125_IG12
 
 G_M59125_IG17:
-       lea      eax, bword ptr [ebx+eax+8]
-       mov      eax, dword ptr [eax+4]
-       mov      ecx, dword ptr [ebp-2CH]
-       mov      edi, dword ptr [ebp-18H]
-       cmp      ecx, edi
+       mov      eax, dword ptr [ebp-28H]
+       mov      eax, dword ptr [edi+eax+8]
+       mov      ecx, dword ptr [ebp-18H]
+       cmp      ebx, ecx
        jle      G_M59125_IG29
 
 G_M59125_IG18:
-       inc      edi
-       mov      gword ptr [ebp-34H], ebx
-       mov      dword ptr [ebp-18H], edi
-       mov      edi, dword ptr [ebp-10H]
+       inc      ecx
+       mov      gword ptr [ebp-2CH], edi
+       mov      dword ptr [ebp-18H], ecx
        jmp      G_M59125_IG14
 
 G_M59125_IG19:
-       xor      eax, eax
-       mov      dword ptr [ebp-1CH], eax
+       xor      ebx, ebx
        mov      eax, dword ptr [esi+32]
-       mov      dword ptr [ebp-28H], eax
        test     eax, eax
        jle      SHORT G_M59125_IG20
-       mov      ebx, dword ptr [esi+28]
-       mov      dword ptr [ebp-1CH], 1
-       mov      eax, dword ptr [ebp-28H]
+       mov      edi, dword ptr [esi+28]
+       mov      ebx, 1
        dec      eax
        mov      dword ptr [esi+32], eax
        jmp      SHORT G_M59125_IG22
@@ -1582,10 +1571,9 @@ G_M59125_IG19:
 G_M59125_IG20:
        mov      eax, dword ptr [esi+24]
        mov      edx, eax
-       mov      dword ptr [ebp-20H], edx
-       cmp      ebx, edx
+       mov      dword ptr [ebp-1CH], edx
+       cmp      edi, edx
        jne      SHORT G_M59125_IG21
-       mov      dword ptr [ebp-14H], ecx
        mov      ecx, eax
        call     HashHelpers:ExpandPrime(int):int
        mov      edx, eax
@@ -1593,46 +1581,53 @@ G_M59125_IG20:
        mov      ecx, esi
        call     Dictionary`2:Resize(int,bool):this
        mov      ecx, gword ptr [esi+4]
-       mov      ebx, ecx
+       mov      edi, ecx
        mov      eax, dword ptr [ebp-14H]
-       cdq      
-       idiv     edx:eax, dword ptr [ecx+4]
-       cmp      edx, dword ptr [ebx+4]
+       xor      edx, edx
+       div      edx:eax, dword ptr [ecx+4]
+       cmp      edx, dword ptr [edi+4]
        jae      G_M59125_IG30
-       lea      ebx, bword ptr [ebx+4*edx+8]
-       mov      bword ptr [ebp-3CH], ebx
-       mov      ecx, dword ptr [ebp-14H]
+       lea      edi, bword ptr [edi+4*edx+8]
+       mov      bword ptr [ebp-34H], edi
 
 G_M59125_IG21:
-       mov      edx, dword ptr [ebp-20H]
+       mov      edx, dword ptr [ebp-1CH]
        lea      eax, [edx+1]
        mov      dword ptr [esi+24], eax
        mov      eax, gword ptr [esi+8]
-       mov      ebx, edx
+       mov      edi, edx
        mov      edx, eax
 
 G_M59125_IG22:
-       cmp      ebx, dword ptr [edx+4]
+       cmp      edi, dword ptr [edx+4]
        jae      SHORT G_M59125_IG30
-       mov      eax, ebx
+       mov      eax, edi
        shl      eax, 4
        lea      eax, bword ptr [edx+eax+8]
-       cmp      dword ptr [ebp-1CH], 0
+       test     ebx, ebx
        je       SHORT G_M59125_IG23
-       mov      edx, dword ptr [eax+4]
+       mov      ebx, dword ptr [esi+28]
+       cmp      ebx, dword ptr [edx+4]
+       jae      SHORT G_M59125_IG30
+       shl      ebx, 4
+       mov      edx, dword ptr [edx+ebx+8]
+       neg      edx
+       add      edx, -3
        mov      dword ptr [esi+28], edx
 
 G_M59125_IG23:
+       mov      ecx, dword ptr [ebp-14H]
+       mov      dword ptr [eax+4], ecx
+       mov      ebx, bword ptr [ebp-34H]
+       mov      ecx, dword ptr [ebx]
+       dec      ecx
        mov      dword ptr [eax], ecx
-       mov      ecx, bword ptr [ebp-3CH]
-       mov      edx, dword ptr [ecx]
-       dec      edx
-       mov      dword ptr [eax+4], edx
-       mov      dword ptr [eax+8], edi
-       mov      edi, dword ptr [ebp+0CH]
-       mov      dword ptr [eax+12], edi
-       inc      ebx
-       mov      dword ptr [ecx], ebx
+       mov      edx, dword ptr [ebp-10H]
+       mov      dword ptr [eax+8], edx
+       mov      edx, dword ptr [ebp+0CH]
+       mov      dword ptr [eax+12], edx
+       inc      edi
+       mov      dword ptr [ebx], edi
        inc      dword ptr [esi+36]
 
 G_M59125_IG24:
@@ -1647,7 +1642,7 @@ G_M59125_IG25:
        ret      8
 
 G_M59125_IG26:
-       mov      ecx, edi
+       mov      ecx, dword ptr [ebp-10H]
        call     ThrowHelper:ThrowAddingDuplicateWithKeyArgumentException(int)
        int3     
 
@@ -1656,8 +1651,7 @@ G_M59125_IG27:
        int3     
 
 G_M59125_IG28:
-       mov      edi, dword ptr [ebp-10H]
-       mov      ecx, edi
+       mov      ecx, dword ptr [ebp-10H]
        call     ThrowHelper:ThrowAddingDuplicateWithKeyArgumentException(int)
        int3     
 
@@ -1669,7 +1663,7 @@ G_M59125_IG30:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 630, prolog size 13 for method Dictionary`2:TryInsert(int,int,ubyte):bool:this
+; Total bytes of code 597, prolog size 13 for method Dictionary`2:TryInsert(int,int,ubyte):bool:this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:Resize(int,bool):this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -1679,24 +1673,23 @@ G_M59125_IG30:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T04] (  6,  6   )     ref  ->  [ebp-0x18]   this class-hnd
-;  V01 arg1         [V01,T06] (  5,  6   )     int  ->  [ebp-0x10]  
+;  V00 this         [V00,T03] (  6,  6   )     ref  ->  [ebp-0x18]   this class-hnd
+;  V01 arg1         [V01,T05] (  5,  6   )     int  ->  [ebp-0x10]  
 ;* V02 arg2         [V02    ] (  0,  0   )    bool  ->  zero-ref   
-;  V03 loc0         [V03,T05] (  5,  8   )     ref  ->  ebx         class-hnd
-;  V04 loc1         [V04,T01] (  6, 13   )     ref  ->  [ebp-0x1C]   class-hnd
-;  V05 loc2         [V05,T09] (  4,  7   )     int  ->  [ebp-0x14]  
+;  V03 loc0         [V03,T04] (  5,  8   )     ref  ->  ebx         class-hnd
+;  V04 loc1         [V04,T01] (  7, 15   )     ref  ->  [ebp-0x1C]   class-hnd
+;  V05 loc2         [V05,T08] (  4,  7   )     int  ->  [ebp-0x14]  
 ;* V06 loc3         [V06    ] (  0,  0   )     int  ->  zero-ref    ld-addr-op
 ;* V07 loc4         [V07    ] (  0,  0   )     int  ->  zero-ref   
 ;  V08 loc5         [V08,T00] (  7, 23   )     int  ->  ecx        
-;  V09 loc6         [V09,T07] (  4,  8   )     int  ->  edx        
+;  V09 loc6         [V09,T06] (  4,  8   )     int  ->  edx        
 ;* V10 tmp0         [V10    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
 ;* V11 tmp1         [V11    ] (  0,  0   )   byref  ->  zero-ref    "impAppendStmt"
-;  V12 tmp2         [V12,T10] (  2,  4   )     ref  ->  ecx         class-hnd "Inlining Arg"
+;  V12 tmp2         [V12,T09] (  2,  4   )     ref  ->  ecx         class-hnd "Inlining Arg"
 ;* V13 tmp3         [V13    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
-;  V14 cse0         [V14,T02] (  3, 10   )     int  ->  edx         "ValNumCSE"
-;  V15 cse1         [V15,T08] (  2,  8   )     int  ->  esi         "ValNumCSE"
-;  V16 cse2         [V16,T03] (  3, 10   )     int  ->  esi         "ValNumCSE"
-;  V17 cse3         [V17,T11] (  2,  4   )     int  ->  eax         "ValNumCSE"
+;  V14 cse0         [V14,T07] (  2,  8   )     int  ->  esi         "ValNumCSE"
+;  V15 cse1         [V15,T02] (  4, 12   )     int  ->  esi         "ValNumCSE"
+;  V16 cse2         [V16,T10] (  2,  4   )     int  ->  eax         "ValNumCSE"
 ;
 ; Lcl frame size = 16
 
@@ -1731,48 +1724,46 @@ G_M14072_IG02:
        call     Array:Copy(ref,int,ref,int,int,bool)
        xor      ecx, ecx
        cmp      dword ptr [ebp-14H], 0
-       jle      SHORT G_M14072_IG05
+       jle      SHORT G_M14072_IG08
 
 G_M14072_IG03:
        mov      eax, gword ptr [ebp-1CH]
        mov      esi, dword ptr [eax+4]
        cmp      ecx, esi
-       jae      SHORT G_M14072_IG07
+       jae      SHORT G_M14072_IG09
        mov      esi, ecx
        shl      esi, 4
-       mov      gword ptr [ebp-1CH], eax
-       mov      edx, dword ptr [eax+esi+8]
-       test     edx, edx
+       cmp      dword ptr [eax+esi+8], -1
        jl       SHORT G_M14072_IG04
+       mov      gword ptr [ebp-1CH], eax
+       mov      eax, dword ptr [eax+esi+12]
        mov      dword ptr [ebp-10H], edi
-       mov      eax, edx
-       cdq      
-       idiv     edx:eax, edi
+       xor      edx, edx
+       div      edx:eax, edi
        mov      eax, dword ptr [ebx+4]
        cmp      edx, eax
-       jae      SHORT G_M14072_IG07
+       jae      SHORT G_M14072_IG09
        mov      eax, dword ptr [ebx+4*edx+8]
        dec      eax
        mov      edi, gword ptr [ebp-1CH]
-       mov      dword ptr [edi+esi+12], eax
+       mov      dword ptr [edi+esi+8], eax
        lea      eax, [ecx+1]
        mov      dword ptr [ebx+4*edx+8], eax
-       mov      gword ptr [ebp-1CH], edi
+       mov      eax, edi
        mov      edi, dword ptr [ebp-10H]
 
 G_M14072_IG04:
        inc      ecx
-       mov      esi, dword ptr [ebp-14H]
-       cmp      ecx, esi
-       mov      dword ptr [ebp-14H], esi
-       jl       SHORT G_M14072_IG03
+       mov      edx, dword ptr [ebp-14H]
+       cmp      ecx, edx
+       mov      dword ptr [ebp-14H], edx
+       jl       SHORT G_M14072_IG07
 
 G_M14072_IG05:
        mov      esi, gword ptr [ebp-18H]
        lea      edx, bword ptr [esi+4]
        call     CORINFO_HELP_ASSIGN_REF_EBX
        lea      edx, bword ptr [esi+8]
-       mov      eax, gword ptr [ebp-1CH]
        call     CORINFO_HELP_ASSIGN_REF_EAX
 
 G_M14072_IG06:
@@ -1784,10 +1775,18 @@ G_M14072_IG06:
        ret      4
 
 G_M14072_IG07:
+       mov      gword ptr [ebp-1CH], eax
+       jmp      SHORT G_M14072_IG03
+
+G_M14072_IG08:
+       mov      eax, gword ptr [ebp-1CH]
+       jmp      SHORT G_M14072_IG05
+
+G_M14072_IG09:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 190, prolog size 13 for method Dictionary`2:Resize(int,bool):this
+; Total bytes of code 198, prolog size 13 for method Dictionary`2:Resize(int,bool):this
 ; ============================================================
 ; Assembly listing for method Dictionary`2:TrimExcess(int):this
 ; Emitting BLENDED_CODE for generic X86 CPU - Windows
@@ -1797,22 +1796,25 @@ G_M14072_IG07:
 ; fully interruptible
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T02] ( 13,  9   )     ref  ->  [ebp-0x1C]   this class-hnd
-;  V01 arg1         [V01,T07] (  4,  4   )     int  ->  edx        
-;  V02 loc0         [V02,T11] (  4,  4.50)     int  ->  [ebp-0x10]  
-;  V03 loc1         [V03,T01] (  6, 12.50)     ref  ->  ebx         class-hnd
+;  V00 this         [V00,T03] ( 13,  9   )     ref  ->  [ebp-0x20]   this class-hnd
+;  V01 arg1         [V01,T09] (  4,  4   )     int  ->  edx        
+;  V02 loc0         [V02,T14] (  4,  4.50)     int  ->  [ebp-0x10]  
+;  V03 loc1         [V03,T01] (  6, 14.50)     ref  ->  ebx         class-hnd
 ;* V04 loc2         [V04    ] (  0,  0   )     int  ->  zero-ref   
-;  V05 loc3         [V05,T10] (  3,  5   )     int  ->  [ebp-0x14]  
-;  V06 loc4         [V06,T12] (  3,  4.50)     ref  ->  [ebp-0x20]   class-hnd
-;  V07 loc5         [V07,T06] (  4,  6.50)     ref  ->  [ebp-0x24]   class-hnd
-;  V08 loc6         [V08,T04] (  6,  9   )     int  ->  [ebp-0x18]  
-;  V09 loc7         [V09,T00] (  7, 22.50)     int  ->  edi        
-;  V10 loc8         [V10,T03] (  3, 10   )     int  ->  ecx        
-;  V11 loc9         [V11,T08] (  3,  6   )   byref  ->  [ebp-0x28]  
-;  V12 loc10        [V12,T05] (  4,  8   )     int  ->  edx        
-;  V13 tmp0         [V13,T14] (  3,  2   )     int  ->  ecx        
-;  V14 cse0         [V14,T13] (  2,  4   )     int  ->  eax         "ValNumCSE"
-;  V15 cse1         [V15,T09] (  3,  6   )     int  ->  esi         "ValNumCSE"
+;  V05 loc3         [V05,T13] (  3,  5   )     int  ->  [ebp-0x14]  
+;  V06 loc4         [V06,T15] (  3,  4.50)     ref  ->  [ebp-0x24]   class-hnd
+;  V07 loc5         [V07,T08] (  4,  6.50)     ref  ->  [ebp-0x28]   class-hnd
+;  V08 loc6         [V08,T05] (  6,  9   )     int  ->  [ebp-0x18]  
+;  V09 loc7         [V09,T00] (  6, 20.50)     int  ->  edi        
+;  V10 loc8         [V10,T12] (  2,  6   )     int  ->  [ebp-0x1C]  
+;  V11 loc9         [V11,T10] (  3,  6   )   byref  ->  ecx        
+;  V12 loc10        [V12,T07] (  4,  8   )     int  ->  edx        
+;  V13 tmp0         [V13,T17] (  3,  2   )     int  ->  ecx        
+;  V14 cse0         [V14,T04] (  3, 10   )   byref  ->  edx         "ValNumCSE"
+;  V15 cse1         [V15,T02] (  3, 12   )     int  ->  ecx         "ValNumCSE"
+;  V16 cse2         [V16,T06] (  4,  9   )     int  ->  ecx         "ValNumCSE"
+;  V17 cse3         [V17,T16] (  2,  4   )     int  ->  esi         "ValNumCSE"
+;  V18 cse4         [V18,T11] (  3,  6   )     int  ->  ecx         "ValNumCSE"
 ;
 ; Lcl frame size = 28
 
@@ -1865,63 +1867,61 @@ G_M47871_IG07:
        mov      edx, edi
        call     Dictionary`2:Initialize(int):int:this
        mov      ecx, gword ptr [esi+8]
-       mov      gword ptr [ebp-20H], ecx
-       mov      gword ptr [ebp-1CH], esi
+       mov      gword ptr [ebp-24H], ecx
+       mov      gword ptr [ebp-20H], esi
        mov      edx, gword ptr [esi+4]
-       mov      gword ptr [ebp-24H], edx
+       mov      gword ptr [ebp-28H], edx
        xor      eax, eax
        xor      edi, edi
        cmp      dword ptr [ebp-14H], 0
        jle      SHORT G_M47871_IG10
 
 G_M47871_IG08:
-       cmp      edi, dword ptr [ebx+4]
+       mov      ecx, dword ptr [ebx+4]
+       cmp      edi, ecx
        jae      G_M47871_IG13
        mov      ecx, edi
        shl      ecx, 4
-       mov      ecx, dword ptr [ebx+ecx+8]
-       test     ecx, ecx
+       lea      edx, bword ptr [ebx+ecx+8]
+       mov      esi, dword ptr [edx+4]
+       mov      dword ptr [ebp-1CH], esi
+       cmp      dword ptr [ebx+ecx+8], -1
        jl       SHORT G_M47871_IG09
-       mov      edx, gword ptr [ebp-20H]
-       cmp      eax, dword ptr [edx+4]
+       mov      ecx, gword ptr [ebp-24H]
+       cmp      eax, dword ptr [ecx+4]
        jae      SHORT G_M47871_IG13
        mov      dword ptr [ebp-18H], eax
-       mov      edx, eax
-       shl      edx, 4
-       mov      esi, gword ptr [ebp-20H]
-       lea      edx, bword ptr [esi+edx+8]
-       mov      esi, edi
-       shl      esi, 4
-       lea      esi, bword ptr [ebx+esi+8]
-       mov      bword ptr [ebp-28H], edx
-       movdqu   xmm0, qword ptr [esi]
-       movdqu   qword ptr [edx], xmm0
-       mov      eax, ecx
-       cdq      
-       idiv     edx:eax, dword ptr [ebp-10H]
-       mov      ecx, gword ptr [ebp-24H]
-       mov      eax, dword ptr [ecx+4]
-       cmp      edx, eax
+       mov      ecx, eax
+       shl      ecx, 4
+       mov      esi, gword ptr [ebp-24H]
+       lea      ecx, bword ptr [esi+ecx+8]
+       movdqu   xmm0, qword ptr [edx]
+       movdqu   qword ptr [ecx], xmm0
+       mov      eax, dword ptr [ebp-1CH]
+       xor      edx, edx
+       div      edx:eax, dword ptr [ebp-10H]
+       mov      eax, gword ptr [ebp-28H]
+       mov      esi, dword ptr [eax+4]
+       cmp      edx, esi
        jae      SHORT G_M47871_IG13
-       mov      eax, dword ptr [ecx+4*edx+8]
-       dec      eax
-       mov      esi, bword ptr [ebp-28H]
-       mov      dword ptr [esi+4], eax
-       mov      esi, dword ptr [ebp-18H]
-       inc      esi
-       mov      dword ptr [ecx+4*edx+8], esi
-       mov      gword ptr [ebp-24H], ecx
-       mov      eax, esi
+       mov      esi, dword ptr [eax+4*edx+8]
+       dec      esi
+       mov      dword ptr [ecx], esi
+       mov      ecx, dword ptr [ebp-18H]
+       inc      ecx
+       mov      dword ptr [eax+4*edx+8], ecx
+       mov      gword ptr [ebp-28H], eax
+       mov      eax, ecx
 
 G_M47871_IG09:
        inc      edi
-       mov      ecx, dword ptr [ebp-14H]
-       cmp      edi, ecx
-       mov      dword ptr [ebp-14H], ecx
+       mov      esi, dword ptr [ebp-14H]
+       cmp      edi, esi
+       mov      dword ptr [ebp-14H], esi
        jl       SHORT G_M47871_IG08
 
 G_M47871_IG10:
-       mov      esi, gword ptr [ebp-1CH]
+       mov      esi, gword ptr [ebp-20H]
        mov      dword ptr [esi+24], eax
        xor      ecx, ecx
        mov      dword ptr [esi+32], ecx
@@ -1943,5 +1943,5 @@ G_M47871_IG13:
        call     CORINFO_HELP_RNGCHKFAIL
        int3     
 
-; Total bytes of code 258, prolog size 11 for method Dictionary`2:TrimExcess(int):this
+; Total bytes of code 255, prolog size 11 for method Dictionary`2:TrimExcess(int):this
 ; ============================================================

@MarcoRossignoli
Member Author

@jkotas PTAL
I fixed and improved the assertions, added the TryInsert diff (in the same message), and added the x86 diff.
I'm getting some build errors on the tests (I don't see any assertion failures). I tried /azp run, but no luck.
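
For readers following the x86 diff above, the `cdq`/`idiv` pairs that become `xor edx, edx`/`div` are consistent with the bucket index moving from a signed to an unsigned remainder. The snippet below is a minimal sketch of that idea, not the PR's actual code: the class and method names are hypothetical, and the `0x7FFFFFFF` mask is assumed for the old path (the constant in the listing is shown only as a diff placeholder).

```csharp
using System;

// Hypothetical illustration only; names are not taken from the PR.
static class BucketIndexSketch
{
    // Assumed "before" shape: drop the sign bit, then take a signed remainder.
    // One bit of the hash code is discarded before bucketing.
    public static int SignedBucket(int hashCode, int bucketCount)
    {
        return (hashCode & 0x7FFFFFFF) % bucketCount;
    }

    // Assumed "after" shape: reinterpret the hash code as uint so all 32 bits
    // take part in an unsigned remainder.
    public static int UnsignedBucket(int hashCode, int bucketCount)
    {
        return (int)(unchecked((uint)hashCode) % (uint)bucketCount);
    }
}
```

Both helpers agree for non-negative hash codes. They can differ for negative ones, which is where the retained sign bit changes the bucket choice.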

@danmoseley
Member

CSC : error CS0006: Metadata file 'F:\workspace\_work\1\s\bin\Product\Windows_NT.x86.Checked\System.Private.CoreLib.dll' could not be found is clearly not you...

/azp run

@danmoseley
Member

@RussKeldorph are the Jenkins-based legs like Windows_NT x64 Release CoreFX Tests obsolete? They seem broken:

12:53:39 BUILD: Restoring the OptimizationData Package
12:53:39 '\dotnet.cmd' is not recognized as an internal or external command

@RussKeldorph

@danmosemsft They are not (yet) obsolete. Something seems to have gone terribly wrong with the tools restore, I assume because of yesterday's infra issues. Later PRs seem to be OK w.r.t. this job.

@MarcoRossignoli
Member Author

/azp run

@danmoseley
Member

@MarcoRossignoli I haven't looked at the failures, but if you think there are infrastructure issues unique to this PR, you could open a nice fresh clean one and close this instead.

@MarcoRossignoli
Member Author

I think so, will do!

@MarcoRossignoli
Member Author

Replaced by #23591 because of the CI issues.
