Made the Checksum type truncate the byte array given to it if it is bigger than HashSize #33363
Tagging @jinujoseph and @tmat.
/// </summary>
/// <seealso href="https://bugzilla.xamarin.com/show_bug.cgi?id=60298">LayoutKind.Explicit, Size = 12 ignored with 64bit alignment</seealso>
/// <seealso href="https://github.com/dotnet/roslyn/issues/23722">Checksum throws on Mono 64-bit</seealso>
Why are we dropping these comments?
Because it no longer matters whether a different system uses a different size for SHA1: we no longer use SHA1, and for SHA256 we always truncate.
I guess the warning still seems applicable, though, right? Different runtimes will produce different lengths?
Sure, but it will truncate to HashSize, so different runtimes producing different sizes doesn't matter.
comment moved to Checksum_Factory
}
}

public unsafe Checksum(ImmutableArray<byte> checksum)
public static unsafe Checksum From(ImmutableArray<byte> checksum, bool truncate = false)
What is using this and isn't truncating? Don't we use the same hash algorithm everywhere?
We have things like SourceText that return ImmutableArray rather than byte[].
When we serialize and deserialize a checksum, we don't truncate.
We don't truncate because we're actually using the full bits, or we're not truncating because the checksum is already at the right size?
It is already the right size. Basically, truncate makes it an explicit decision, when creating a checksum, that the given checksum (byte array) will be truncated.
Otherwise, it throws if the given checksum's size differs from the predefined checksum size.
I don't get it. Why don't we only throw if we get fewer than 20 bytes and implicitly truncate when we get more? That is, remove the truncate parameter altogether.
I wanted to make truncation explicit, but sure, I can make it implicit if that makes people feel better.
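A minimal sketch of the reviewer's suggestion (throw on fewer than 20 bytes, silently truncate anything longer, no truncate parameter). ChecksumSketch, its Bytes property, and the ImmutableArray-based storage are hypothetical stand-ins for illustration, not Roslyn's actual Checksum type; HashSize = 20 is taken from the "20 bytes" in the comment.

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical stand-in for the Checksum type under review, not Roslyn's code.
// No truncate flag: short input throws, long input is truncated implicitly.
public sealed class ChecksumSketch
{
    // Assumed from the "20 bytes" in the review comment (the size of a SHA1 digest).
    public const int HashSize = 20;

    private ChecksumSketch(ImmutableArray<byte> bytes) => Bytes = bytes;

    public ImmutableArray<byte> Bytes { get; }

    public static ChecksumSketch From(ImmutableArray<byte> checksum)
    {
        if (checksum.IsDefault || checksum.Length < HashSize)
        {
            throw new ArgumentException(
                $"A checksum needs at least {HashSize} bytes.", nameof(checksum));
        }

        // A 20-byte input is kept as-is; a longer one keeps only its first 20 bytes.
        return new ChecksumSketch(ImmutableArray.Create(checksum, 0, HashSize));
    }
}
```

With this shape, callers that already hold a 20-byte checksum (such as deserialization) behave as before, while a longer digest is cut down to its leading HashSize bytes instead of throwing.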
Co-Authored-By: heejaechang <heejaechang@outlook.com>
private Sha1Hash _checkSum;
private HashData _checksum;
HashData: readonly?
sure.
We need to replace SHA1 with a high-performance non-cryptographic hash on this path. SHA256 will cause observable negative performance for users.
Please open a new issue about moving to a new hash algorithm.
Please provide evidence for that. The biggest hashing cost in the Roslyn IDE comes from source text from the compiler, so whether we use SHA1 or SHA256 here won't move the needle much. If you believe otherwise, please provide evidence. Also, whether SHA1 is already slow is unrelated to this PR; for that discussion, please open a new issue about using a different hash algorithm.
Tagging @jinujoseph: your call.
Let's close this PR for now.
Related to issue #33411. That issue is not about moving from SHA1 to SHA256, but about finding a hash faster than SHA1 with about the same memory footprint and collision resistance.
Resurrected the PR and changed it into a general clean-up PR. The hash remains the same as before, and it now targets 16.1.
Tagging @dotnet/roslyn-ide.
Ping?
What problem is this solving?
It allows a byte array of any size to be passed in, as long as it is at least HashSize bytes, rather than requiring the caller to make sure it is exactly HashSize.
That's saying what the change is. What I'm not understanding is: what problem is this solving?
Basically: why do we need this change? What is motivating it?
Because of the inlining we did, storing a hash of a different size is not possible. Since FinalizeHashAndReset returns a new byte array anyway, I am not sure what the inlining saves, but I'm not looking into that part.
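To illustrate the constraint being discussed: if the checksum is stored inline as a fixed-size value type (as the Sha1Hash/HashData field in the diff suggests), it holds exactly HashSize bytes of hash data, so a hash of any other length has to be rejected or truncated before it is stored. The struct below is a hypothetical sketch; its name, the long/long/int field split, and FromBytes are assumptions, not Roslyn's code.

```csharp
using System;

// Hypothetical sketch, not Roslyn's actual type: a checksum stored inline as a
// value type with room for exactly HashSize (20) bytes of hash data.
public readonly struct InlineHash20 : IEquatable<InlineHash20>
{
    public const int HashSize = 20;

    // Assumed field split: 8 + 8 + 4 = 20 bytes of hash data.
    private readonly long _data1;
    private readonly long _data2;
    private readonly int _data3;

    private InlineHash20(long data1, long data2, int data3)
    {
        _data1 = data1;
        _data2 = data2;
        _data3 = data3;
    }

    public static InlineHash20 FromBytes(byte[] bytes)
    {
        if (bytes == null || bytes.Length < HashSize)
        {
            throw new ArgumentException($"Expected at least {HashSize} bytes.", nameof(bytes));
        }

        // Only the first HashSize bytes fit in the inline storage; extra bytes of a
        // longer hash (e.g. a 32-byte SHA256 digest) are ignored. BitConverter uses
        // the machine's endianness, which is fine for in-process equality checks.
        return new InlineHash20(
            BitConverter.ToInt64(bytes, 0),
            BitConverter.ToInt64(bytes, 8),
            BitConverter.ToInt32(bytes, 16));
    }

    public bool Equals(InlineHash20 other) =>
        _data1 == other._data1 && _data2 == other._data2 && _data3 == other._data3;

    public override bool Equals(object obj) => obj is InlineHash20 other && Equals(other);

    // The stored bytes are already a hash, so any slice of them is a usable hash code.
    public override int GetHashCode() => _data3;
}
```

The design point is that the struct avoids a separate heap allocation per checksum, at the cost of fixing the stored size at compile time.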
@heejaechang You didn't answer the questions raised here or here. Specifically, what problem was this attempting to solve? What was wrong with the approach we had in place?
@CyrusNajmabadi This change is required to be compliant with internal Microsoft policy.
Why would we not be able to inline a hash of a different length? Which hash are we intending to use instead?
The policy is to not use SHA1. Truncating larger hashes, such as SHA256, preserves the properties of the hash that we care about (probability of collision); it is simple to implement, and it has the same memory footprint.
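A minimal sketch of the truncation approach described in the comment above: hash with SHA256 (32 bytes) and keep only the leading HashSize bytes, so the stored checksum keeps SHA1's 20-byte footprint. TruncatedSha256 and Compute are hypothetical names, and HashSize = 20 is assumed to match the existing SHA1-sized checksum. Keeping 160 bits of the digest gives a generic birthday bound of roughly 2^80 hashes before a collision is expected.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not Roslyn's actual code: hash with SHA256, then keep
// only the first HashSize bytes so the result has SHA1's 20-byte footprint.
public static class TruncatedSha256
{
    // Assumed to match the existing SHA1-sized checksum.
    public const int HashSize = 20;

    public static byte[] Compute(byte[] data)
    {
        using (var sha256 = SHA256.Create())
        {
            byte[] full = sha256.ComputeHash(data);  // 32 bytes
            var truncated = new byte[HashSize];
            Array.Copy(full, truncated, HashSize);   // keep the leading 20 bytes
            return truncated;
        }
    }
}
```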
That seems reasonable.
My understanding is that @heejaechang made this change as a first step: reducing the dependency on SHA1 to a single spot where it can be flipped to SHA256, which is what we are currently leaning toward until we decide to use a non-cryptographic algorithm.
Gotcha. In the future, it would be very helpful to include that information in the PR. Otherwise, it just looks like a random change that doesn't actually seem necessary. Thanks!
I don't want to connect this to SHA1 or SHA256. It was a general clean-up so that Checksum is not forced to be SHA1-only with a fixed HashSize.
@heejaechang It would be really good to explain the "why" of the change, not just the "what". Without proper context and explanation, it just looks unnecessary.
This doesn't change the hash algorithm. It is a general clean-up of the Checksum type: it now accepts any byte array as long as it is at least HashSize bytes long.