The property function StableStringHash is relatively weak on some input #7131
Comments
I think the contract is that it's stable for a single version of MSBuild and particularly between .NET Core and .NET Framework implementations. I think I'd be comfortable changing the implementation to have better spread. Though "did we document that sufficiently to actually follow through on breaking it?" is a good question.
How convenient: we haven't actually documented it on the public docs page https://docs.microsoft.com/en-us/visualstudio/msbuild/property-functions?view=vs-2022 On the one hand: 😔. But it's convenient here!
hehe, indeed, I found it only from issue #4986 while looking for a property function hash... So the question is: could we upgrade StableStringHash to something even better, more like what the Hash task is doing (e.g. using SHA1) and returning a string? (and so that it would no longer use the
A non-cryptographic 32-bit hash will always be too weak for any kind of uniqueness usage, while using such a function in properties is most likely its primary usage (e.g. computing a folder name based on a few selected properties). A cryptographic hash like SHA1 is definitely much slower, but much safer, and I don't expect that we use such functions thousands of times in a single build.
So, could it be e.g. SHA1, or should we restrict it to return a single int and move to e.g. FNV-1a?
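To make the "SHA1 returning a string" option concrete, here is a minimal sketch in Python (not MSBuild code). The function name stable_hash_hex, the UTF-8 encoding, and the optional truncation length are assumptions made for illustration; none of them come from the Hash task or from MSBuild.

```python
import hashlib


def stable_hash_hex(text: str, length: int = 40) -> str:
    """Return the first `length` hex characters of the SHA-1 digest of `text`.

    Hypothetical helper: the name, the UTF-8 encoding, and the truncation
    parameter are illustrative assumptions, not MSBuild behavior.
    """
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    # A shortened digest can serve as a folder-name suffix.
    print(stable_hash_hex("src/LibChild3_30/LibChild3_30.csproj", 8))
```

Truncating the digest trades collision resistance for shorter names, so the length is a choice to make per use case.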
Team triage: Looks like this hash isn't used a lot, so this probably isn't high priority.
Waiting for C# Dev Kit 1.3.x to age out (https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.csdevkit), then we'll merge & ship
Hey,
I'm starting to use the property function StableStringHash:
$(MSBuildProjectName)-$([MSBuild]::StableStringHash($(MSBuildProjectFile)).ToString("x8"))
to hash a project file path. The purpose: when you output to an intermediate folder that is shared (in a different obj/* location), you need a unique name for the project folder.
But while looking at the generated hashes, I'm seeing a weird pattern in some numbers, where a single-character change does not shuffle enough bits, which indicates that the hash algorithm is weak:
You can see that LibChild3_30 and all the way up to LibChild3_36 share the same 16 bits f133, while the ones below, from LibChild3_4 to LibChild3_9, change as you would expect.
So first, I found that I made a mistake: I should have used MSBuildProjectFullPath instead of MSBuildProjectFile, as it was only hashing the filename and not the full path (and if the same project filename appears in different folders, we still want the hash to take the folder into account).
But still, it doesn't look good at all...
So looking at the code:
msbuild/src/Shared/CommunicationsUtilities.cs
Lines 655 to 682 in cd95fa9
and hashing simple strings like this:
And we can see that the algorithm struggles with strings of even length here. Hashing 10 vs 11 generates cdbab78f and cdb9b78f, which differ in only 2 bits (!)
I don't know where this algorithm comes from, but it doesn't look good: it is bit shifting + xor, instead of a better approach like xor + multiply by a prime number for a simple hash. For example, it could use FNV-1a, which would provide a much better hash.
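As a rough illustration of the xor-then-multiply-by-prime approach, here is a minimal 32-bit FNV-1a sketch in Python. It uses the standard FNV offset basis and prime; hashing the UTF-8 bytes of the string is an assumption made for this example (MSBuild strings are UTF-16), and the helper names fnv1a_32 and bit_diff are hypothetical.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: xor each byte in, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def bit_diff(a: int, b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Assumption: inputs are hashed as UTF-8 bytes.
    h10 = fnv1a_32("10".encode("utf-8"))
    h11 = fnv1a_32("11".encode("utf-8"))
    print(format(h10, "08x"), format(h11, "08x"), bit_diff(h10, h11))
```

With these assumptions, 10 and 11 hash to 1beb2a44 and 1ceb2bd7, which differ in 8 bits rather than 2. That is still short of the roughly 16 bits an ideal 32-bit hash would flip on average, because the last input byte only passes through a single multiply.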
But I assume that now that StableStringHash is out, we cannot really change it, right? (as programs are relying on it being stable... 🤔)
So one possible workaround is to compute hash(str + hash(str)), and that's probably what I'm gonna do...
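The hash(str + hash(str)) workaround can be sketched generically as follows. This is Python rather than an MSBuild expression; the helper double_hash is hypothetical, zlib.crc32 is only a stand-in for whatever 32-bit hash is available (it is not StableStringHash), and formatting the inner hash as 8 hex digits is an assumption.

```python
import zlib
from typing import Callable


def double_hash(text: str, hash_fn: Callable[[str], int]) -> int:
    """Compute hash_fn(text + hex(hash_fn(text))).

    Appending the first hash to the input means a small change in `text`
    also changes the appended suffix before the second pass.
    """
    inner = hash_fn(text) & 0xFFFFFFFF
    return hash_fn(text + format(inner, "08x"))


def crc32_of_text(text: str) -> int:
    """Stand-in 32-bit hash for the sketch (assumption, not MSBuild's)."""
    return zlib.crc32(text.encode("utf-8"))


if __name__ == "__main__":
    for name in ("LibChild3_30", "LibChild3_31"):
        print(name, format(double_hash(name, crc32_of_text), "08x"))
```

The second pass costs one extra hash call per string and does not increase the output size, so it can improve the spread of bits but not the collision bound of a 32-bit result.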