Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashcode implementation proposal for DataClassificationSet #4933

Merged
merged 8 commits into from
Feb 15, 2024

Conversation

damianhorna
Copy link
Contributor

@damianhorna damianhorna commented Feb 9, 2024

Fixes: #4932

Microsoft Reviewers: Open in CodeFlow

public override int GetHashCode() => _classifications.GetHashCode();
public override int GetHashCode()
{
int hash = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make this ode conditional of #if NETFRAMEWORK and then add separate code that uses this API on .NET Core?

https://learn.microsoft.com/en-us/dotnet/api/system.hashcode.add?view=net-8.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if maybe we should compute the hash code just once and store it in a field of the DataClassificationSet struct? I worry about the cost of repeated enumeration, and the fact enumeration of a HashSet is not guarantee to return the sequence in the same order every time. Realistically, since the set is not mutated once initialized, it's highly likely that enumeration will always be consistent. But who knows, maybe somebody will change HashSet in some clever way in the future which breaks this.

Copy link
Contributor Author

@damianhorna damianhorna Feb 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for feedback!

About first point - makes perfect sense, I'll rewrite to use HashCode.Add.

About the second point - the fact that HashSet is not guaranteed to return the sequence in the same order should not be much of a problem as long as we use commutative operations to calculate the hashcode (such as XOR) - correct? With those, we should be able to calculate the hashcode in order-independent way.

For performance reasons it would definitely make sense to compute the hashcode just once, since it is used as a key in a dictionary. For that we could consider either lazy hashcode calculation in the GetHashCode method or calculate it during object initialization (in the constructor). Because of thread-safety concerns, I would prefer to calculate it in constructor.

I will send a commit that addresses this - please let me know what you think.

Copy link
Member

@geeknoid geeknoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@geeknoid geeknoid merged commit a8e1751 into dotnet:main Feb 15, 2024
6 checks passed
@xakep139 xakep139 added this to the 8.3 milestone Mar 7, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential Equality and Hashing Issues in RedactorProvider with DataClassificationSet Keys in FrozenDictionary
4 participants