Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random.NextBytes is very slightly biased against 0xFF #41457

Closed
Clockwork-Muse opened this issue Aug 27, 2020 · 5 comments · Fixed by #47085
Closed

Random.NextBytes is very slightly biased against 0xFF #41457

Clockwork-Muse opened this issue Aug 27, 2020 · 5 comments · Fixed by #47085

Comments

@Clockwork-Muse
Copy link
Contributor

NextBytes(byte[]) is implemented like so:

public virtual void NextBytes(byte[] buffer)
{
    if (buffer == null) throw new ArgumentNullException(nameof(buffer));
    for (int i = 0; i < buffer.Length; i++)
    {
        buffer[i] = (byte)InternalSample();
    }
}

(side note: why no unchecked on the cast...?)

The Span<byte> version has the same effective implementation if Next() has not been overridden.

The problem is that InternalSample() returns values in the range [0, int.MaxValue) - that is, int.MaxValue is not included. This skews the distribution of possible values for bytes:

  • 0x00-0xFE - 8388608 values
  • 0xFF - 8388607 values

(Sample generation program)

The catch is that I'm not sure that this is fixable, given that any fix for this would be a breaking change in the behavior, and we've been reluctant to do so for other parts of Random.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Aug 27, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@GrabYourPitchforks
Copy link
Member

Possible dupe of #23198.

@Clockwork-Muse
Copy link
Contributor Author

@GrabYourPitchforks - I mean, that issue is about the fact that there's a deviation in the algorithm used to generate what InternalSample() returns.

This one is more about how there's a slight bias due to the fact that InternalSample() doesn't return int.MaxValue. That is, even if there was no deviation in the algorithm, there would still be this issue.

Although yes, it may end up being a dupe due to the resolution. I just wanted this explicitly called out.

@joperezr joperezr added the bug label Aug 27, 2020
@joperezr joperezr added this to the Future milestone Aug 27, 2020
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Aug 27, 2020
@GrabYourPitchforks
Copy link
Member

Ah, yes, you're right in that it's not an exact dupe. But the resolution would be the same. Either they're all resolved as "by design", or they're fixed all at once.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 20, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 21, 2021
@Clockwork-Muse
Copy link
Contributor Author

Note to future readers:

The fix implemented at this time only affects randoms created without an explicit seed. Due to compatibility concerns, randoms initialized with a seed retain the old behavior.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants