(Note that this project predates C# records - but those provide much the same benefits in a more convenient package, so use those instead!)

ValueUtils

ValueUtils implements Equals and GetHashCode for you. By using runtime code-generation, the performance overhead is kept small; ValueObject<> generally outperforms alternatives such as Tuple<>, struct or anonymous types (see benchmarks below).

The library is available on NuGet (for import or direct download) as ValueUtils. Although it's implemented in C#, it's just as applicable to VB.NET classes.

Contributions welcome! If you've found a bug, are missing a feature, or just have a question, please create a new GitHub issue, open a pull request, or send an email to 'eamon (at) nerbonne (dot) org'.

Usage:

The easy way

The easiest way to use value semantics is to derive from ValueObject<>, for example:

using ValueUtils;
sealed class MyValueObject : ValueObject<MyValueObject> {
	public int A, B, C;
	public string X, Y, Z;
	// ...
}

A class deriving from ValueObject<T> implements IEquatable<T> and has Equals(object), Equals(T), GetHashCode(), and the == and != operators implemented in terms of its fields.

Explicit usage

You can also use delegates (or an IEqualityComparer<>) for hashing and equality comparison on any type, including types in other assemblies that you don't control. Given the following example class:

class ExampleClass {
	public string myMember;
	protected readonly DateTime supports_readonly_too;
	int private_int;
// ...
}

The generated hash function can be explicitly used as follows:

using ValueUtils;

Func<ExampleClass, int> hashfunc = FieldwiseHasher<ExampleClass>.Instance;
//or call immediately with type-inference
int hashcode = FieldwiseHasher.Hash(my_example_object);

The generated equality function can be explicitly used as follows:

using ValueUtils;

Func<ExampleClass, ExampleClass, bool> equalityComparer = FieldwiseEquality<ExampleClass>.Instance;
//or call immediately with type-inference
bool areEqual = FieldwiseEquality.AreEqual(my_example_object, another_example_object);
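
Because both generated functions are plain delegates, they can be plugged into collections that accept an IEqualityComparer<T>. The following is a minimal sketch of such an adapter, not part of ValueUtils: DelegateEqualityComparer, Point and ComparerDemo are hypothetical names, and the hand-written lambdas merely stand in for FieldwiseEquality<T>.Instance and FieldwiseHasher<T>.Instance so that the snippet compiles without the package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical adapter (not part of ValueUtils): wraps an equality delegate
// and a hash delegate in an IEqualityComparer<T>.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> equals;
    private readonly Func<T, int> hash;

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> hash)
    {
        this.equals = equals ?? throw new ArgumentNullException(nameof(equals));
        this.hash = hash ?? throw new ArgumentNullException(nameof(hash));
    }

    public bool Equals(T x, T y) => equals(x, y);
    public int GetHashCode(T obj) => hash(obj);
}

// Illustrative type that does not implement Equals/GetHashCode itself.
public sealed class Point
{
    public int X, Y;
}

public static class ComparerDemo
{
    // Stand-in delegates. With ValueUtils referenced, FieldwiseEquality<Point>.Instance
    // and FieldwiseHasher<Point>.Instance could be passed here instead.
    public static readonly DelegateEqualityComparer<Point> ByFields =
        new DelegateEqualityComparer<Point>(
            (a, b) => a.X == b.X && a.Y == b.Y,
            p => unchecked((p.X * 397) ^ p.Y));

    // Counts points that differ by field values rather than by reference.
    public static int CountDistinct(IEnumerable<Point> points) =>
        new HashSet<Point>(points, ByFields).Count;
}
```

With this in place, two separately allocated Point instances holding the same X and Y count as a single element of the set.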

Usage in structs

The above delegates are considerably faster than the built-in ValueType-provided defaults for structs (which use reflection on every call), which makes them a good fit for implementing GetHashCode and Equals in your own structs. Unfortunately, you can't use inheritance to mix in the generated code, so you'll need to use the explicit calls described above. For example:

struct ExampleStruct : IEquatable<ExampleStruct> {
    int some, members, here; 
    //...

    public bool Equals(ExampleStruct other) => FieldwiseEquality.AreEqual(this, other);
    
    public override bool Equals(object obj) => obj is ExampleStruct other && Equals(other);
    
    public override int GetHashCode() => FieldwiseHasher.Hash(this);    
}

Limitations and gotchas

Cyclical data structures: ValueObject<> supports self-referential types (like tree structures or a singly linked list), but does not support cyclical types - such as a doubly linked list. Whenever a cycle is encountered, the hash function and equals operations will not terminate (until the stack overflows).

Inheritance: Equality is implemented on a per-type basis, and that means inheritance gets confusing. It's OK to have a base class (and base class fields will affect hash and equality), but if you use the base class's equality and/or hash implementation on a subclass instance, the code will seem to work but will only consider the fields of the base class. Best practice: don't create subclasses that add new fields; and if you do, at least never use the base-class equality and hashcode implementations. This is why ValueObject<> verifies that its subclasses are sealed.
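
To make the inheritance pitfall concrete, here is a small sketch that does not use ValueUtils at all. The types Point2D and Point3D are hypothetical, and BaseEquality is a hand-written stand-in for an equality function generated for the base type; like such a function, it only knows about the base type's fields.

```csharp
using System;

public class Point2D
{
    public int X, Y;
}

// A subclass that adds a field: exactly what the best practice advises against.
public class Point3D : Point2D
{
    public int Z;
}

public static class InheritanceDemo
{
    // Stand-in for an equality function built for the BASE type.
    // It compares X and Y and has no knowledge of Z.
    public static readonly Func<Point2D, Point2D, bool> BaseEquality =
        (a, b) => a.X == b.X && a.Y == b.Y;

    public static bool LooksEqualViaBase()
    {
        var p = new Point3D { X = 1, Y = 2, Z = 3 };
        var q = new Point3D { X = 1, Y = 2, Z = 99 };
        // Z differs, but the base-type comparison never looks at it,
        // so the two instances are reported as equal.
        return BaseEquality(p, q);
    }
}
```

The call compiles and runs without complaint, which is what makes this mistake easy to miss.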

Lazily constructed internals: FieldwiseHasher and FieldwiseEquality "work" on almost all types, including types with private members in other assemblies - however, if you don't know the internals, you can't be sure what's being included in the equality computations. In particular, if an object is lazily initialized, two semantically equivalent objects might compute as unequal simply because one is initialized and the other is not. In practice this is rarely a problem.
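
The lazy-initialization effect can be reproduced with a few lines of reflection. This is only an illustration under stated assumptions: FieldsEqual below is a naive reflection-based comparison written for this example, not the code ValueUtils generates, and LazyLabel is a hypothetical type with a lazily filled cache field.

```csharp
using System;
using System.Reflection;

// Hypothetical type with a lazily initialized private field.
public sealed class LazyLabel
{
    private readonly int value;
    private string cachedText; // null until Text is first read

    public LazyLabel(int value) { this.value = value; }

    public string Text => cachedText ?? (cachedText = value.ToString());
}

public static class LazyInitDemo
{
    // Naive stand-in for field-wise equality: compares every instance field,
    // public or private, using object.Equals.
    public static bool FieldsEqual<T>(T a, T b)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in typeof(T).GetFields(flags))
        {
            if (!object.Equals(field.GetValue(a), field.GetValue(b)))
                return false;
        }
        return true;
    }

    public static (bool before, bool after) Run()
    {
        var a = new LazyLabel(42);
        var b = new LazyLabel(42);
        _ = a.Text;                      // initializes a's cache only
        bool before = FieldsEqual(a, b); // cache fields differ: "42" vs null
        _ = b.Text;                      // now b's cache is initialized too
        bool after = FieldsEqual(a, b);
        return (before, after);
    }
}
```

Here the two objects compare as unequal while only one cache is populated and as equal once both are, even though they represent the same value throughout.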

Performance and hash-quality

TL;DR: ValueObject<> usually outperforms alternatives such as Tuple<>, struct and anonymous types. Compared to hand-rolled implementations, common operations such as .ToDictionary are around 15-25% slower (if your object contains "expensive" data such as large strings, the difference becomes a lot smaller).

All performance measurements were done on an i7-4770k at a fixed clock rate of 4.0GHz. Timings are in nanoseconds per object. Datasets are all approximately 3000000 objects in size. Loops over the dataset were repeated until 10 seconds were up, and then the average of the fastest quartile was reported (this minimizes interference by other processes on my dev machine, since random interference is almost always bad for performance, not good). Some hash generators (notably struct) are so poor that this wasn't feasible; those timings are omitted (NaN) below.

Note that even a perfect hash mix is expected to have 0.03-0.04% colliding buckets, so if you see numbers like that in the data below, the hash is functioning as expected. Numbers better (lower) than that are actually worrisome, because they mean some kind of structure in the input is being exploited, and that likely means similar but slightly different data exists that will have lots of collisions. And of course, numbers much higher than that directly impact performance.
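
The 0.03-0.04% figure can be sanity-checked with a short calculation. The sketch below assumes that a "collision" means an object whose hash code is already used by another object (which is consistent with the Distinct Hashcodes column in the tables), and that an ideal hash spreads n objects uniformly over the 2^32 possible int values. The helper name is hypothetical.

```csharp
using System;

public static class HashCollisionEstimate
{
    // Expected fraction of n objects that do NOT get a hash code of their own,
    // assuming hash codes are uniformly random over all 2^32 int values.
    // Uses the standard approximation (1 - 1/m)^n ~ exp(-n/m).
    public static double ExpectedCollisionFraction(long n)
    {
        double m = 4294967296.0; // 2^32 possible hash codes
        double expectedDistinct = m * (1.0 - Math.Exp(-n / m));
        return (n - expectedDistinct) / n;
    }
}
```

For a dataset of 3000000 objects this evaluates to roughly 0.00035, that is about 0.035%, which falls inside the quoted range.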

Quite a few tests use a simple pair of ints. This is relevant because it is pretty much a worst case for ValueObject<>. Although the generated code is fast, calling into that code requires a cast and a delegate call, and those are (relatively) expensive operations in .NET, at least compared to the simple integer math that a pair-of-ints hashcode requires. With more complicated objects containing reference types, the cost of the hashcode computation will start to matter more, and the overhead less.

Realistic scenario with an enum, a string, a DateTime, an int? and 3 int fields.

| Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
| --- | --- | --- | --- | --- | --- | --- |
| ComplicatedManual | 0.04% | 2912961 / 2914000 | 218.8 | 199.6 | 6.9 | 17.4 |
| ComplicatedValueObject | 0.04% | 2912977 / 2914000 | 250.2 | 230.5 | 21.4 | 42.1 |
| Tuple | 0.03% | 2913001 / 2914000 | 482.2 | 494.8 | 257.6 | 263.5 |
| ComplicatedStruct | 100.00% | 2 / 2914000 | NaN | NaN | 1002.3 | 97.2 |
| Anonymous Type | 0.03% | 2913022 / 2914000 | 261.0 | 247.8 | 31.5 | 52.9 |

A simple pair of ints

| Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
| --- | --- | --- | --- | --- | --- | --- |
| IntPairManual | 0.02% | 2975318 / 2976000 | 159.3 | 133.6 | 3.8 | 1.7 |
| IntPairValueObject | 0.03% | 2974963 / 2976000 | 199.3 | 181.9 | 20.0 | 16.9 |
| Tuple | 38.31% | 1835788 / 2976000 | 353.7 | 289.2 | 98.4 | 54.9 |
| IntPairStruct | 56.61% | 1291168 / 2976000 | 864.7 | 812.6 | 31.4 | 36.8 |
| Anonymous Type | 4.69% | 2836344 / 2976000 | 185.2 | 158.2 | 15.3 | 13.5 |

Two ints with both the same value

| Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
| --- | --- | --- | --- | --- | --- | --- |
| IntPairManual | 0.37% | 2988915 / 3000000 | 188.5 | 155.6 | 3.6 | 1.7 |
| IntPairValueObject | 0.03% | 2999012 / 3000000 | 200.8 | 194.6 | 19.7 | 16.5 |
| Tuple | 22.07% | 2337827 / 3000000 | 145.2 | 140.5 | 76.4 | 55.1 |
| IntPairStruct | 100.00% | 1 / 3000000 | NaN | NaN | 31.0 | 36.7 |
| Anonymous Type | 0.00% | 3000000 / 3000000 | 144.5 | 106.3 | 12.1 | 13.5 |

Two ints such that (x,y) is present if and only if (y,x) is present in the dataset

| Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
| --- | --- | --- | --- | --- | --- | --- |
| IntPairManual | 0.62% | 3014881 / 3033584 | 154.3 | 140.5 | 3.6 | 1.7 |
| IntPairValueObject | 0.03% | 3032561 / 3033584 | 192.8 | 174.6 | 19.6 | 17.0 |
| Tuple | 41.47% | 1775545 / 3033584 | 457.5 | 432.1 | 76.0 | 54.8 |
| IntPairStruct | 74.50% | 773500 / 3033584 | 804.6 | 775.8 | 31.0 | 36.6 |
| Anonymous Type | 0.79% | 3009536 / 3033584 | 175.2 | 161.2 | 12.1 | 13.5 |

A reference to the type itself and two int fields. The dataset contains exactly one level of nesting such that the outer object is (x,y) when the inner is (y,x).

| Name | Collisions | Distinct Hashcodes | .ToDictionary() (ns) | .Distinct().Count() (ns) | .Equals() (ns) | .GetHashCode() (ns) |
| --- | --- | --- | --- | --- | --- | --- |
| NastyNestedManual | 24.14% | 2267216 / 2988648 | 225.1 | 181.5 | 6.3 | 5.0 |
| NastyNestedValueObject | 0.03% | 2987634 / 2988648 | 239.7 | 208.8 | 30.9 | 33.0 |
| Tuple | 57.80% | 1261193 / 2988648 | 489.5 | 491.8 | 103.3 | 132.0 |