Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

made Checksum type to truncate byte array given to it if it is bigger than HashSize #33363

Merged
merged 10 commits into from
Feb 19, 2019

Conversation

heejaechang
Copy link
Contributor

@heejaechang heejaechang commented Feb 13, 2019

this doesn't change hash algorithm. it is a general clean up on Checksum type. basically, it now accepts any bytes array as long as it is bigger than HashSize.

@heejaechang heejaechang requested a review from a team as a code owner February 13, 2019 21:16
@heejaechang
Copy link
Contributor Author

tagging @jinujoseph and @tmat

/// </summary>
/// <seealso href="https://bugzilla.xamarin.com/show_bug.cgi?id=60298">LayoutKind.Explicit, Size = 12 ignored with 64bit alignment</seealso>
/// <seealso href="https://github.com/dotnet/roslyn/issues/23722">Checksum throws on Mono 64-bit</seealso>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we dropping these comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it no longer matters whether the different system uses a different size for SHA1. First, we no longer use SHA1 and for SHA256, we always truncate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the warning still seems applicable though right? Different runtimes will do different lengths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that it will truncate to Hashsize. so different runtime having different size doesn't matter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment moved to Checksum_Factory

}
}

public unsafe Checksum(ImmutableArray<byte> checksum)
public static unsafe Checksum From(ImmutableArray<byte> checksum, bool truncate = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is using this and isn't truncating? Don't we use the same hash algorithm everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have something like SourceText that return ImmutableArray rather than byte[].

when we serialize and deserialized checksum, we don't truncate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't truncate because we're actually using the full bits, or we're not truncating because the checksum is already at the right size?

Copy link
Contributor Author

@heejaechang heejaechang Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is already in right size. basically truncate make it explicit decision when creating checksum that given checksum (byte array) will be truncated.

otherwise, it will throw if given checksum has different size than predefined checksum size.

Copy link
Member

@tmat tmat Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. Why don't we only throw if we get less than 20 bytes and implicitly truncate when we get more? That is, remove the truncate parameter altogther.


In reply to: 257065181 [](ancestors = 257065181)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make truncate explicit. but sure. I can make it implicit if that make people feel better.

JoeRobich and others added 3 commits February 14, 2019 11:26
Co-Authored-By: heejaechang <heejaechang@outlook.com>
Co-Authored-By: heejaechang <heejaechang@outlook.com>
Co-Authored-By: heejaechang <heejaechang@outlook.com>

private Sha1Hash _checkSum;
private HashData _checksum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashData [](start = 15, length = 9)

readonly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to replace SHA1 with a high performance non-cryptographic hash on this path. SHA256 will cause observable negative performance for users.

@heejaechang heejaechang dismissed sharwell’s stale review February 15, 2019 01:06

please open new issue on moving to new hash algorithm.

@heejaechang
Copy link
Contributor Author

We need to replace SHA1 with a high performance non-cryptographic hash on this path. SHA256 will cause observable negative performance for users.

please provide evidence for it. the biggest cost of hashing for roslyn IDE is from source text from compiler, so here whether we use SHA1 or SHA256 won't move the needle much. if you believe otherwise, please provide evidence that that's not true.

also, the argument whether SHA1 already slow or not doesn't related to this PR. for such argument, please open new issue discussing using different hash algorithm.

@heejaechang
Copy link
Contributor Author

tagging @jinujoseph your call.

@heejaechang heejaechang changed the base branch from dev16.0 to master February 15, 2019 01:27
@jinujoseph
Copy link
Contributor

Lets close this PR for now.
@davidwengier will get us guidance here and we will take up in 16.1

@heejaechang
Copy link
Contributor Author

related to issue - #33411

the issue not about moving SHA1 to SHA256. but related issue of finding faster hash over SHA1 with about same memory footprint + collision resistance.

@heejaechang heejaechang changed the title replace SHA1 to SHA256 made Checksum type to truncate byte array given to it if it is bigger than HashSize Feb 15, 2019
@heejaechang heejaechang reopened this Feb 15, 2019
@heejaechang
Copy link
Contributor Author

resurrected the PR. changed it to general clean up PR. hash is remained same as before and targetting 16.1

@heejaechang
Copy link
Contributor Author

tagging @dotnet/roslyn-ide

@heejaechang
Copy link
Contributor Author

ping?

@CyrusNajmabadi
Copy link
Member

this doesn't change hash algorithm. it is a general clean up on Checksum type. basically, it now accepts any bytes array as long as it is bigger than HashSize.

What problem is this solving?

@heejaechang
Copy link
Contributor Author

it allows any size byte array to be given as long as it is bigger than hashsize rather than caller make sure it has right hashsize.

@CyrusNajmabadi
Copy link
Member

it allows any size byte array to be given as long as it is bigger than hashsize rather than caller make sure it has right hashsize.

That's saying waht the change is. What i'm not understanding is: what problem is this solving?

  1. Do we have a case where we're producing hashes larger than this hashsize?
  2. if we are producing hashes of different size... why not just store the full hash, whatever the size?

Basically: why do we need this change? What is motivating it?

@heejaechang
Copy link
Contributor Author

due to the inline thing we did saving different size hash is not possible. since FinalizeHashAndReset return new byte array anyway, I am not sure what the inline saves but not looking into that part.

@heejaechang heejaechang merged commit ec5be09 into dotnet:master Feb 19, 2019
@CyrusNajmabadi
Copy link
Member

@heejaechang You didn't answer the questions raised here or here. Specifically, what problem was this attempting to solve. What was wrong with the approach we had in place?

@tmat
Copy link
Member

tmat commented Feb 19, 2019

@CyrusNajmabadi This change is required to be compliant with internal Microsoft policy.

@CyrusNajmabadi
Copy link
Member

@tmat

  1. what is the internal microsoft policy?
  2. why is the internal microsoft policy dictating that we should truncate hashes?

due to the inline thing we did saving different size hash is not possible.

Why would we not be able to inline a hash of different length? Which hash are we intending on using instead?

@tmat
Copy link
Member

tmat commented Feb 19, 2019

The policy is to not use SHA1. Truncating larger hashes, such as SHA256, preserves the properties of the hash that we care about (probability of collision), it's simple to implement and has the same memory footprint.

@CyrusNajmabadi
Copy link
Member

The policy is to not use SHA1. Truncating larger hashes, such as SHA256, preserves the properties of the hash that we care about (probability of collision), it's simple to implement and has the same memory footprint.

  1. I don't see a change to move away from sha1. Indeed, the other conversation about which hash to use mentions murmur, which i believe has worse collision properties than sha1. So that appears to be a step back.

it's simple to implement and has the same memory footprint.

That seems reasonable.

@tmat
Copy link
Member

tmat commented Feb 19, 2019

My understanding is that @heejaechang made this change as a first step - reducing the dependency on SHA1 only to single spot where it can be flipped to SHA256, which is what we currently leaning towards until we decide to use non-crypto alg.

@CyrusNajmabadi
Copy link
Member

Gotcha. in hte future, it would be very helpful to include that information in the PR. Otherwise, it just seems like a random change made that doesn't actually seem necessary.

Thanks!

@heejaechang
Copy link
Contributor Author

heejaechang commented Feb 19, 2019

I don't want to connect this to SHA1 or SHA256. it was general clean up to not force checksum to be only SHA1 and certain hashsize.

@CyrusNajmabadi
Copy link
Member

@heejaechang It would be really good to explain the "why" of the change, not just the "what". Without proper context and explanation, it just looks unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants