Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The property function StableStringHash is relatively weak on some input #7131

Closed
xoofx opened this issue Dec 7, 2021 · 5 comments · Fixed by #9860
Closed

The property function StableStringHash is relatively weak on some input #7131

xoofx opened this issue Dec 7, 2021 · 5 comments · Fixed by #9860
Assignees
Labels
help wanted Issues that the core team doesn't plan to work on, but would accept a PR for. Comment to claim. triaged
Milestone

Comments

@xoofx
Copy link
Member

xoofx commented Dec 7, 2021

Hey,
I'm starting to use the property function StableStringHash $(MSBuildProjectName)-$([MSBuild]::StableStringHash($(MSBuildProjectFile)).ToString("x8")) for hashing a project filepath for the purpose that when you want to output to an intermediate folder that is shared (in a different obj/* location), you need to have a unique filename for the project folder.

But while looking at the generated hashes, I'm seeing a weird pattern in some numbers, where a single character change was not causing enough shuffle of the bits, which indicates that the hash algorithm is weak:

LibChild3_30-5e56f133
LibChild3_31-66f1f133
LibChild3_32-5220f133
LibChild3_33-5abbf133
LibChild3_34-7c62f133
LibChild3_35-84fdf133
LibChild3_36-6f8cf133
LibChild3_4-a48ac640 
LibChild3_5-f243a77b 
LibChild3_6-091903ca 
LibChild3_7-56d1e505 
LibChild3_8-5e70f9dc 
LibChild3_9-ac29db17 

You could see that LibChild3_30 and all up to LibChild3_36 are sharing the same 16 bits f133, while the other below from LibChild3_4 to LibChild3_9 are changing as you would expect.

So first, I found that I made a mistake, and I should have used MSBuildProjectFullPath instead of MSBuildProjectFile... as it was only hashing the filename but not the fullpath (and if we have a same project filename in different folders, we still want this to hash properly the folder)

But still, it doesn't look good at all...

So looking at the code:

internal static int GetHashCode(string fileVersion)
{
unsafe
{
fixed (char* src = fileVersion)
{
int hash1 = (5381 << 16) + 5381;
int hash2 = hash1;
int* pint = (int*)src;
int len = fileVersion.Length;
while (len > 0)
{
hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
if (len <= 2)
{
break;
}
hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
pint += 2;
len -= 4;
}
return hash1 + (hash2 * 1566083941);
}
}
}

and hashing simple strings like this:

$([MSBuild]::StableStringHash('10'))   => cdbab78f
$([MSBuild]::StableStringHash('11'))   => cdb9b78f
$([MSBuild]::StableStringHash('100'))  => b7435729
$([MSBuild]::StableStringHash('101'))  => 59eacbc4
$([MSBuild]::StableStringHash('1000')) => 00f35729
$([MSBuild]::StableStringHash('1001')) => 758e5729

And we can see that the algorithm is messing with string with a length that is even here. Hashing 10 vs 11 is generating cdbab78f and cdb9b78f, and just changed by a few bits (!)

I don't know where this algorithm comes from, but it doesn't look good as it is bit shifting+xor, instead of better approaches like xor+multiply_by_prime_number for simple hash. For example it could use FNV-1A and it would provide a much better hash.

But, I assume that now that StableStringHash is out, we cannot really change it right? (as programs are relying on it being stable... 🤔 )

So one possible way to workaround it is to hash(str + hash(str)) and that's probably what I'm gonna do...

@xoofx xoofx added the needs-triage Have yet to determine what bucket this goes in. label Dec 7, 2021
@rainersigwald
Copy link
Member

But, I assume that now that StableStringHash is out, we cannot really change it right? (as programs are relying on it being stable... 🤔 )

I think the contract is that it's stable for a single version of MSBuild and particularly between .NET Core and .NET Framework implementations. I think I'd be comfortable changing the implementation to have better spread.

Though "did we document that sufficiently to actually follow through on breaking it?" is a good question.

@rainersigwald
Copy link
Member

Though "did we document that sufficiently to actually follow through on breaking it?" is a good question.

How convenient: we haven't actually documented it on the public docs page https://docs.microsoft.com/en-us/visualstudio/msbuild/property-functions?view=vs-2022

On the one hand: 😔. But it's convenient here!

@xoofx
Copy link
Member Author

xoofx commented Dec 8, 2021

On the one hand: 😔. But it's convenient here!

hehe, indeed, I found it only from the issue #4986 while looking for a property function hash...

So question is: could we upgrade the StableStringHash to something even better, more like what the hash task is doing (e.g using SHA1) and returning a string? (and so that it would not use anymore the CommunicationsUtilities.GetHashCode()...

A non cryptographic 32bit hash will be always too weak for any kind of uniqueness usage, while using such function in properties is more likely its primary usage (e.g computing a folder name based on a few selected properties)

A cryptographic hash like SHA1 is definitely much slower, but much safer, and I don't expect that we use such functions thousands of time in a single build.

So, could it be e.g SHA1 or should we restrict it to return a single int and move to e.g FNV-1A?

@Forgind Forgind removed the needs-triage Have yet to determine what bucket this goes in. label Jan 6, 2022
@Forgind Forgind added this to the Backlog milestone Jan 6, 2022
@Forgind
Copy link
Member

Forgind commented Jan 6, 2022

Team triage: Looks like this hash isn't used a lot, so this probably isn't high priority.

@Forgind Forgind added the help wanted Issues that the core team doesn't plan to work on, but would accept a PR for. Comment to claim. label Jan 6, 2022
@JaynieBai JaynieBai self-assigned this Sep 19, 2022
@JaynieBai JaynieBai removed their assignment Dec 28, 2022
@JanKrivanek JanKrivanek self-assigned this Nov 3, 2023
@JanKrivanek
Copy link
Member

Waiting for C# Dev Kit 1.3.x to age out (https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.csdevkit), then we'll merge & ship StableStringHash overloads (including the one using FNV)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issues that the core team doesn't plan to work on, but would accept a PR for. Comment to claim. triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants