Hashcode implementation proposal for DataClassificationSet #4933

Merged: 8 commits, Feb 15, 2024
@@ -83,7 +83,16 @@ public DataClassificationSet Union(DataClassificationSet other)
/// Gets a hash code for the current object instance.
/// </summary>
/// <returns>The hash code value.</returns>
public override int GetHashCode() => _classifications.GetHashCode();
public override int GetHashCode()
{
int hash = 0;
Member

Could you please make this code conditional on #if NETFRAMEWORK and then add separate code that uses this API on .NET Core?

https://learn.microsoft.com/en-us/dotnet/api/system.hashcode.add?view=net-8.0
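
A minimal sketch of what such a split could look like, assuming a hypothetical helper `ClassificationHashing.Compute` over strings (not the type from this PR) and that the SDK-defined `NETFRAMEWORK` symbol is available for the .NET Framework target:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the type changed in this PR.
public static class ClassificationHashing
{
    public static int Compute(IEnumerable<string> items)
    {
#if NETFRAMEWORK
        // .NET Framework path: the manual XOR fold from this diff.
        int hash = 0;
        foreach (string item in items)
        {
            hash ^= item.GetHashCode();
        }

        return hash;
#else
        // Modern .NET path: System.HashCode accumulates values one at a time.
        var hash = new HashCode();
        foreach (string item in items)
        {
            hash.Add(item);
        }

        return hash.ToHashCode();
#endif
    }
}
```

One caveat with this sketch: `HashCode.Add` is sensitive to the order in which values are added, so the second branch is only stable if the enumeration order is stable.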

Member

I wonder if maybe we should compute the hash code just once and store it in a field of the DataClassificationSet struct? I worry about the cost of repeated enumeration, and about the fact that enumeration of a HashSet is not guaranteed to return the sequence in the same order every time. Realistically, since the set is not mutated once initialized, it's highly likely that enumeration will always be consistent. But who knows, maybe somebody will change HashSet in some clever way in the future that breaks this.

Contributor Author

@damianhorna, Feb 11, 2024

Thanks for the feedback!

About the first point: makes perfect sense, I'll rewrite to use HashCode.Add.

About the second point: the fact that HashSet is not guaranteed to return the sequence in the same order should not be much of a problem as long as we use commutative operations (such as XOR) to calculate the hash code, correct? With those, we should be able to calculate the hash code in an order-independent way.
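
To illustrate the order-independence argument, here is a small sketch, assuming a hypothetical helper `XorHash.Of` over non-null strings (the real set stores classifications, not strings):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class XorHash
{
    // Folds element hash codes with XOR. Because XOR is commutative and
    // associative, the result does not depend on enumeration order.
    public static int Of(IEnumerable<string> items)
    {
        int hash = 0;
        foreach (string item in items)
        {
            hash ^= item.GetHashCode();
        }

        return hash;
    }
}
```

With this helper, `XorHash.Of(new[] { "a", "b", "c" })` and `XorHash.Of(new[] { "c", "a", "b" })` return the same value within one process. A known weakness of XOR is that duplicate elements cancel out, but a set contains no duplicates, so that does not apply here.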

For performance reasons it would definitely make sense to compute the hash code just once, since the set is used as a key in a dictionary. For that we could either calculate it lazily in the GetHashCode method or calculate it during object initialization (in the constructor). Because of thread-safety concerns, I would prefer to calculate it in the constructor.

I will send a commit that addresses this - please let me know what you think.
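
A rough sketch of the constructor-time approach, using a hypothetical `TagSet` class over strings as a stand-in (this is not the code from the follow-up commit):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; not DataClassificationSet itself.
public sealed class TagSet : IEquatable<TagSet>
{
    private readonly HashSet<string> _tags;

    // Computed once in the constructor; valid because _tags is never mutated afterwards.
    private readonly int _hash;

    public TagSet(IEnumerable<string> tags)
    {
        _tags = new HashSet<string>(tags);

        // Order-independent XOR fold over the element hash codes.
        int hash = 0;
        foreach (string tag in _tags)
        {
            hash ^= tag.GetHashCode();
        }

        _hash = hash;
    }

    public bool Equals(TagSet other) => other != null && _tags.SetEquals(other._tags);

    public override bool Equals(object obj) => Equals(obj as TagSet);

    // No enumeration here, so dictionary lookups stay cheap.
    public override int GetHashCode() => _hash;
}
```

The sketch uses a class rather than a struct on purpose: a default-initialized struct bypasses its constructor, so a struct version would need to make sure the default instance still reports a hash consistent with an empty set.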

foreach (var item in _classifications)
{
hash ^= item.GetHashCode();
}

return hash;
}

/// <summary>
/// Compares an object with the current instance to see if they contain the same classifications.