made Checksum type to truncate byte array given to it if it is bigger than HashSize #33363

heejaechang · 2019-02-13T21:16:00Z

This doesn't change the hash algorithm. It is a general cleanup of the Checksum type: it now accepts any byte array as long as it is at least HashSize bytes long.

heejaechang · 2019-02-13T21:18:36Z

tagging @jinujoseph and @tmat

src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

jasonmalinowski · 2019-02-13T21:59:08Z

src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

        /// </summary>
-        /// <seealso href="https://bugzilla.xamarin.com/show_bug.cgi?id=60298">LayoutKind.Explicit, Size = 12 ignored with 64bit alignment</seealso>
-        /// <seealso href="https://github.com/dotnet/roslyn/issues/23722">Checksum throws on Mono 64-bit</seealso>


Why are we dropping these comments?

Because it no longer matters whether a different system uses a different size for SHA1. First, we no longer use SHA1, and for SHA256 we always truncate it.

I guess the warning still seems applicable though right? Different runtimes will do different lengths?

Sure, but it will truncate to HashSize, so different runtimes having different sizes doesn't matter.

The comment has been moved to Checksum_Factory.

src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

jasonmalinowski · 2019-02-13T22:05:32Z

src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

            }
        }

-        public unsafe Checksum(ImmutableArray<byte> checksum)
+        public static unsafe Checksum From(ImmutableArray<byte> checksum, bool truncate = false)


What is using this and isn't truncating? Don't we use the same hash algorithm everywhere?

We have things like SourceText that return ImmutableArray rather than byte[].

When we serialize and deserialize a checksum, we don't truncate.

We don't truncate because we're actually using the full bits, or we're not truncating because the checksum is already at the right size?

It is already the right size. Basically, the truncate parameter makes it an explicit decision, when creating a checksum, that the given checksum (byte array) will be truncated.

Otherwise, it will throw if the given checksum has a different size than the predefined checksum size.

I don't get it. Why don't we only throw if we get less than 20 bytes and implicitly truncate when we get more? That is, remove the truncate parameter altogether.

I wanted to make truncation explicit. But sure, I can make it implicit if that makes people feel better.
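A minimal sketch of the implicit behavior suggested above, not the code that was merged. `ChecksumSketch` is a hypothetical stand-in for the real type, it takes `byte[]` rather than `ImmutableArray<byte>` to stay self-contained, and the 20-byte `HashSize` is taken from the review comment.

```csharp
using System;

// Hypothetical stand-in for the Checksum type discussed in this thread.
public sealed class ChecksumSketch
{
    // 20 bytes is the size mentioned in the review comment (an assumption here).
    public const int HashSize = 20;

    private readonly byte[] _bytes;

    private ChecksumSketch(byte[] bytes)
    {
        _bytes = bytes;
    }

    // Throws when the input is too short; silently truncates when it is longer.
    public static ChecksumSketch From(byte[] checksum)
    {
        if (checksum == null)
        {
            throw new ArgumentNullException(nameof(checksum));
        }

        if (checksum.Length < HashSize)
        {
            throw new ArgumentException(
                "checksum must be at least " + HashSize + " bytes long", nameof(checksum));
        }

        // Copy only the first HashSize bytes; any extra bytes are dropped.
        var truncated = new byte[HashSize];
        Array.Copy(checksum, truncated, HashSize);
        return new ChecksumSketch(truncated);
    }

    // Returns a copy so callers cannot mutate the stored bytes.
    public byte[] ToArray()
    {
        return (byte[])_bytes.Clone();
    }
}
```

With this shape a caller passing a 32-byte hash gets a 20-byte checksum without opting in, and only a too-short array is treated as an error.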

tmat · 2019-02-15T00:38:52Z

src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs


-        private Sha1Hash _checkSum;
+        private HashData _checksum;


HashData: readonly?

sharwell

We need to replace SHA1 with a high performance non-cryptographic hash on this path. SHA256 will cause observable negative performance for users.

Please open a new issue about moving to a new hash algorithm.

heejaechang · 2019-02-15T01:12:00Z

We need to replace SHA1 with a high performance non-cryptographic hash on this path. SHA256 will cause observable negative performance for users.

Please provide evidence for it. The biggest cost of hashing for the Roslyn IDE comes from the source text from the compiler, so whether we use SHA1 or SHA256 here won't move the needle much. If you believe otherwise, please provide evidence that that's not true.

Also, the argument about whether SHA1 is already slow or not isn't related to this PR. For that argument, please open a new issue discussing the use of a different hash algorithm.

heejaechang · 2019-02-15T01:16:17Z

tagging @jinujoseph your call.

jinujoseph · 2019-02-15T01:30:15Z

Let's close this PR for now.
@davidwengier will get us guidance here and we will take up in 16.1

heejaechang · 2019-02-15T01:41:04Z

Related issue: #33411

That issue is not about moving from SHA1 to SHA256. It is about the related question of finding a hash faster than SHA1 with about the same memory footprint and collision resistance.

heejaechang · 2019-02-15T18:56:30Z

Resurrected the PR and changed it to a general cleanup PR. The hash remains the same as before, and the PR now targets 16.1.

heejaechang · 2019-02-15T18:56:38Z

tagging @dotnet/roslyn-ide

heejaechang · 2019-02-16T10:54:09Z

ping?

CyrusNajmabadi · 2019-02-18T09:29:29Z

This doesn't change the hash algorithm. It is a general cleanup of the Checksum type: it now accepts any byte array as long as it is at least HashSize bytes long.

What problem is this solving?

heejaechang · 2019-02-18T23:50:55Z

It allows a byte array of any size to be given, as long as it is at least HashSize bytes long, rather than requiring the caller to make sure it has exactly the right hash size.

CyrusNajmabadi · 2019-02-19T00:25:40Z

It allows a byte array of any size to be given, as long as it is at least HashSize bytes long, rather than requiring the caller to make sure it has exactly the right hash size.

That's saying what the change is. What I'm not understanding is: what problem is this solving?

Do we have a case where we're producing hashes larger than this hashsize?
if we are producing hashes of different size... why not just store the full hash, whatever the size?

Basically: why do we need this change? What is motivating it?

heejaechang · 2019-02-19T04:33:58Z

Because of the inlining we did, storing a hash of a different size is not possible. Since FinalizeHashAndReset returns a new byte array anyway, I am not sure what the inlining saves, but I am not looking into that part.

CyrusNajmabadi · 2019-02-19T17:53:15Z

@heejaechang You didn't answer the questions raised here or here. Specifically, what problem was this attempting to solve. What was wrong with the approach we had in place?

tmat · 2019-02-19T18:10:17Z

@CyrusNajmabadi This change is required to be compliant with internal Microsoft policy.

CyrusNajmabadi · 2019-02-19T18:14:34Z

@tmat

What is the internal Microsoft policy?
Why is the internal Microsoft policy dictating that we should truncate hashes?

Because of the inlining we did, storing a hash of a different size is not possible.

Why would we not be able to inline a hash of different length? Which hash are we intending on using instead?

tmat · 2019-02-19T18:21:26Z

The policy is to not use SHA1. Truncating larger hashes, such as SHA256, preserves the properties of the hash that we care about (probability of collision), it's simple to implement and has the same memory footprint.
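A minimal sketch of the truncation described above, assuming SHA256 as the larger hash and a 20-byte target size (the size of a SHA1 hash). `TruncatedHashSketch` and `ComputeTruncatedSha256` are hypothetical names for illustration, not code from this PR.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not part of the PR.
public static class TruncatedHashSketch
{
    // Same footprint as a SHA1 hash (an assumption for this sketch).
    public const int HashSize = 20;

    public static byte[] ComputeTruncatedSha256(byte[] data)
    {
        using (var sha256 = SHA256.Create())
        {
            // SHA256 produces 32 bytes.
            byte[] full = sha256.ComputeHash(data);

            // Keep only the first HashSize bytes so the stored size stays the same.
            var truncated = new byte[HashSize];
            Array.Copy(full, truncated, HashSize);
            return truncated;
        }
    }
}
```

The stored value stays at 20 bytes regardless of which algorithm produced it, which is what keeps the memory footprint unchanged.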

CyrusNajmabadi · 2019-02-19T19:07:01Z

The policy is to not use SHA1. Truncating larger hashes, such as SHA256, preserves the properties of the hash that we care about (probability of collision), it's simple to implement and has the same memory footprint.

I don't see a change to move away from SHA1. Indeed, the other conversation about which hash to use mentions Murmur, which I believe has worse collision properties than SHA1. So that appears to be a step back.

it's simple to implement and has the same memory footprint.

That seems reasonable.

tmat · 2019-02-19T19:36:37Z

My understanding is that @heejaechang made this change as a first step: reducing the dependency on SHA1 to a single spot where it can be flipped to SHA256, which is what we are currently leaning towards until we decide to use a non-crypto algorithm.

CyrusNajmabadi · 2019-02-19T19:52:25Z

Gotcha. In the future, it would be very helpful to include that information in the PR. Otherwise, it just seems like a random change that doesn't actually seem necessary.

Thanks!

heejaechang · 2019-02-19T20:15:42Z

I don't want to connect this to SHA1 or SHA256. It was a general cleanup so that Checksum is not forced to be only SHA1 with a certain hash size.

CyrusNajmabadi · 2019-02-19T20:20:25Z

@heejaechang It would be really good to explain the "why" of the change, not just the "what". Without proper context and explanation, it just looks unnecessary.

replace SHA1 to SHA256

8c8656b

heejaechang requested a review from a team as a code owner February 13, 2019 21:16

tweak

cb58a07

JoeRobich reviewed Feb 13, 2019

jinujoseph added Area-IDE Needs Shiproom Approval labels Feb 13, 2019

jasonmalinowski reviewed Feb 13, 2019

JoeRobich and others added 3 commits February 14, 2019 11:26

Update src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

980eac0

Co-Authored-By: heejaechang <heejaechang@outlook.com>

Update src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

25ea34e

Co-Authored-By: heejaechang <heejaechang@outlook.com>

Update src/Workspaces/Core/Portable/Workspace/Solution/Checksum.cs

8624ff1

Co-Authored-By: heejaechang <heejaechang@outlook.com>

tmat reviewed Feb 15, 2019

sharwell previously requested changes Feb 15, 2019

tweaks based on PR feedback

97641e8

heejaechang added 2 commits February 14, 2019 17:20

fixed merge conflicts

04f2d52

removed unnecessary unsafe

0e1d6e5

heejaechang changed the base branch from dev16.0 to master February 15, 2019 01:27

heejaechang closed this Feb 15, 2019

updated comment

33a29be

heejaechang changed the title ~~replace SHA1 to SHA256~~ made Checksum type to truncate byte array given to it if it is bigger than HashSize Feb 15, 2019

heejaechang reopened this Feb 15, 2019

heejaechang removed the Needs Shiproom Approval label Feb 15, 2019

mavasani approved these changes Feb 17, 2019

moved comments around and removed reference to SHA in checksum

a416bd6

heejaechang merged commit ec5be09 into dotnet:master Feb 19, 2019
