HashIdentity.Structural is a bad choice for Dictionary comparer #574

manofstick · 2015-08-08T04:49:49Z

dict, groupBy & dict all use HashIdentity.Structural for the equality comparer, but it is not a good choice as it only supports partial equality relationship. i.e.

let nans = [| nan; nan |]

let dict1 = System.Collections.Generic.Dictionary System.Collections.Generic.EqualityComparer.Default
nans |> Seq.iter (fun n -> dict1.[n] <- n)

let dict2 = System.Collections.Generic.Dictionary HashIdentity.Structural
nans |> Seq.iter (fun n -> dict2.[n] <- n)

printfn "dict1.Count=%d" dict1.Count
printfn "dict2.Count=%d" dict2.Count
printfn "dict1.ContainsKey nan=%b" (dict1.ContainsKey nan)
printfn "dict2.ContainsKey nan=%b" (dict2.ContainsKey nan)

Which outputs the following

dict1.Count=1
dict2.Count=2
dict1.ContainsKey nan=true
dict2.ContainsKey nan=false

Now I'm not recommending the use of System.Collections.Generic.EqualityComparer.Default as this is obviously deficient in other ways, but rather an ER compatible version.

This would be a breaking change, although I think the current version is confusing, as I think most users would think it would act in the same way as the Default EqualityComparer, and most people would be using it in that fashion.

The text was updated successfully, but these errors were encountered:

dsyme · 2015-08-14T10:15:11Z

It's good to have this documented. However the results you're seeing from dict2 are not entirely unreasonable. NaN's are not equal, so if you store 100 NaNs you get a 100 NaNs - they all count as different. So I don't think a breaking change would be warranted for this at this point.

manofstick · 2015-08-14T10:23:24Z

I wouldn't care either way I guess if this didn't have a performance impact, but it does. It means that PER comparison is used which means that we can't do back to the Iequatable.equals method. And even that I could tolerate if there was a valid reason. But floating point numbers ashould never be used for equality operations (and can and do have strange results on Intel processors given that the underlying calculations occur on 80 bit precision numbers, although they can have equivalent 64 but representations are not equal! Yes this had bitten me in a real system!) So I feel that we are supporting a configuration that is invalid from the beginning...

dsyme · 2015-08-14T13:29:39Z

Hi @manofstick - could you add some perf figures for the performance difference here, especially for non-floating point key types? I need to get a picture for how it compares to the other costs for dict, groupBy and distinctBy. thanks

manofstick · 2015-08-18T10:26:44Z

@dsyme

OK, first example. Trying to be a little bit realistic, I have based this a stackoverflow question by ttsiodras from from 2011. His gist is here. My modified version is here.

	x64_new	x64_old	x86_new	x86_old
custom dynamic	183.17	359.33	168.00	680.33
custom structural	212.00	359.67	200.33	586.50
custom default	181.00	183.00	165.00	164.67

value dynamic	1147.17	1900.50	1294.17	4892.17
value structural	1034.33	1016.00	1099.50	1072.00
value default	702.17	715.50	674.00	668.83

gen value dynamic	2178.00	4107.83	2010.83	9573.00
gen value structural	1829.17	3263.33	2068.67	6922.00
gen value default	1032.17	2509.67	1158.00	4000.50

ref dynamic	2245.17	2287.50	1565.17	4713.17
ref structural	1350.17	1339.50	1243.17	1242.00
ref default	1407.33	1399.50	1356.83	1408.50

gen ref dynamic	3148.67	4307.17	2661.33	9393.33
gen ref structural	2636.50	3984.17	2090.00	7000.33
gen ref default	2058.50	3600.50	1775.50	4789.83

tuple dynamic	498.50	1074.17	462.00	1955.83
tuple structural	428.83	419.33	404.83	405.83
tuple default	1226.17	1200.00	1237.00	1259.00

The figures to not here are the comparison of the "default" lines in comparison to the "structural" lines (or "dynamic"). The reason for this is that "default" is using the EqualityComparers,Default implementation which is calling IEquatable<> which would be available if we had ER equality symantics rather than PER equality semantics.

What should also be noted is that this example is quite dependent on the GetHashCode () implementation (as shown bty the difference between the "Custom" and the standard "Value" type times). And GetHashCode is unaffected by the PER symantics, so the results would be better than those noted.

(The "tuple default" is bad due to their custom treatment by the f# compiler)

So the results for the various types of ER vs PER is between about 70%-50% of the time, which is to my mind quite significant (although the whole things shows that really the right choice of data type is still the best way to get the best results!)

dsyme · 2016-07-12T01:22:55Z

Closing old discussion

dsyme added the discussion label Aug 14, 2015

manofstick mentioned this issue Aug 17, 2015

Removing internal interface usage, and tail calls in "dict" #594

Closed

dsyme closed this as completed Jul 12, 2016

manofstick mentioned this issue Jun 9, 2018

[CompilerPerf] Faster equality in generic contexts #5112

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HashIdentity.Structural is a bad choice for Dictionary comparer #574

HashIdentity.Structural is a bad choice for Dictionary comparer #574

manofstick commented Aug 8, 2015

dsyme commented Aug 14, 2015

manofstick commented Aug 14, 2015

dsyme commented Aug 14, 2015

manofstick commented Aug 18, 2015

dsyme commented Jul 12, 2016

HashIdentity.Structural is a bad choice for Dictionary comparer #574

HashIdentity.Structural is a bad choice for Dictionary comparer #574

Comments

manofstick commented Aug 8, 2015

dsyme commented Aug 14, 2015

manofstick commented Aug 14, 2015

dsyme commented Aug 14, 2015

manofstick commented Aug 18, 2015

dsyme commented Jul 12, 2016