Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convenient PRNG construction: seed_from_u64 / from_hashable #522

Closed
dhardy opened this issue Jun 21, 2018 · 18 comments
Closed

Convenient PRNG construction: seed_from_u64 / from_hashable #522

dhardy opened this issue Jun 21, 2018 · 18 comments
Labels
E-question Participation: opinions wanted F-new-int Functionality: new, within Rand
Milestone

Comments

@dhardy
Copy link
Member

dhardy commented Jun 21, 2018

History: when discussing the design of the SeedableRng trait (dhardy#18), we came up with a number of proposals, before settling on the current design. However we always intended on adding another function for convenient generic construction.

Motivation: provide a convenient way to seed any PRNGs via a trait function. This is primarily aimed at users wanting to construct reproducible PRNGs for games and simulations, and is not for password hashing.

Requirements

  • easy to use
  • suitable for generic code
  • allow at least 64-bits of input state/entropy

And non-requirements:

  • performance: we don't expect users to construct large numbers of PRNGs using this mechanism
  • security: this mechanism is not intended for seeding secure PRNGs

Proposals

  • fn seed_from_u64(&mut self, seed: u64) -> Self

    This function is proposed as the simplest option fulfilling all requirements. Quoting @pitdicker (Seeding PRNGs (traits and bit size) dhardy/rand#18 (comment)):

    I think again that a seed_from_u64 function is convenient (from_seed is the 'right' function, but not all that easy to use). Simple deterministic initialization for tests, simulations etc. is something we do not support all that well yet.

    For cryptographic RNGs the u64 can simply be mapped to the first 8 bytes of the seed. For small RNGs small numbers or patterns can start the RNG in a weak state. But the common way I have seen that handled is to just run the RNG the number of times it approximately takes to escape a weak state, usually about 20 times.

    So I would propose to add seed_from_u64 to the SeedableRng trait, with a default implementation. The default implementation will repeat the u64 to fill the seed, initialize the RNG with from_seed, and call next_u32 20 times.

    The advantage of adding the function to SeedableRng is that RNG implementations can override it if for example the number of times to call next_u32 is less (and 0 for cryptographic RNGs).

  • fn from_hashable<T: Hashable>(x: T) -> Self: SeedableRng::Seed and from_hashable dhardy/rand#62; code

    This proposal is very flexible (any PRNG can be constructed from any hashable type), but involves adding a hash function to rand_core (or potentially rand, but this prevents from_hashable being defined on SeedableRng), as well as questions of whether we should use a cryptographic hash function or optimise for speed (neither appears necessary).

    Note that we wish to allow users to reproduce output from PRNGs constructed via this mechanism, thus we cannot rely on std::hash (which is not portable and whose implementation may change).

  • No extra functionality. SeedableRng::Seed already supports Default and AsMut<[u8]>, i.e. it can be constructed with default() and then accessed as &mut [u8]. This is sufficient to construct arbitrary seeds using generic code, but not necessarily convenient.

Note: whatever we go with, it is possible for both of these proposals to be implemented by an external crate, with the only restriction being that this function cannot be added to the SeedableRng trait.


Currently I like @pitdicker's recent suggestion best: add SeedableRng::seed_from_u64 with a default implementation.

@dhardy dhardy added E-question Participation: opinions wanted F-new-int Functionality: new, within Rand X-discussion labels Jun 21, 2018
@pitdicker pitdicker mentioned this issue Jun 21, 2018
28 tasks
@pitdicker
Copy link
Contributor

Thank you for opening the issue and the good description. A couple of months ago I started working on this (code), but it may be easier to start fresh.

@vks
Copy link
Collaborator

vks commented Jun 21, 2018 via email

@dhardy
Copy link
Member Author

dhardy commented Jun 22, 2018

So seed_from_u64 but using SplitMix as a key expansion algorithm / simple hash function with fixed-length input. In theory having some sort of hashing of input is preferable over nothing so that there are no issues when users use simple seeds like 0, 1, 2, etc.

In other words, this is a cut-down version of the from_hashable proposal except that it only accepts u64 as input (which simplifies implementation significantly and makes it clear it is not a crypto hash function).

Also sounds like a good option. Though we could consider using something like SipHash over SplitMix.

@dhardy
Copy link
Member Author

dhardy commented Jun 22, 2018

Lets not forget that we already have an implementation: ::test::rng

pub fn rng(seed: u64) -> TestRng<StdRng> {
    // TODO: use from_hashable
    let mut state = seed;
    let mut seed = <StdRng as SeedableRng>::Seed::default();
    for x in seed.iter_mut() {
        // PCG algorithm
        const MUL: u64 = 6364136223846793005;
        const INC: u64 = 11634580027462260723;
        let oldstate = state;
        state = oldstate.wrapping_mul(MUL).wrapping_add(INC);

        let xorshifted = (((oldstate >> 18) ^ oldstate) >> 27) as u32;
        let rot = (oldstate >> 59) as u32;
        *x = xorshifted.rotate_right(rot) as u8;
    }
    TestRng { inner: StdRng::from_seed(seed) }
}

This is literally all the code we need. Is there any reason that another algorithm, e.g. SplitMix, would be preferably over this? As far as I am aware there are no issues seeding PCG from itself.

@TheIronBorn
Copy link
Contributor

TheIronBorn commented Jun 22, 2018

SplitMix64 is used by Vigna I believe because it never produces consecutive zeros (or any other value), and because it makes the probability of starting from a state with a large fraction of bits set to zero astronomically small. Xorshift and related PRNGs are popular so that may be useful.

@TheIronBorn
Copy link
Contributor

Vigna's dsiutils library provides a setSeed method for SplitMix64 which hashes a u64 seed with MurmurHash.

(It also provides a split method for xoshiro\xoroshiro which hashes existing state with MurmurHash, might be useful for #520)

@dhardy
Copy link
Member Author

dhardy commented Jun 22, 2018

Sounds like a decent approach.

But isn't that site down for you? I haven't been able to access xorshift.di.unimi.it etc. all morning.

So I decided to do a brute force check for the longest zero sequence output by PCG:

$ rustc pcg-zero.rs && ./pcg-zero 
Maximum number of output zeros: 2
First corresponding state: 3458777708003752774
Number of states with this length of zeros: 2

@pitdicker
Copy link
Contributor

SplitMix64 is used by Vigna I believe because it never produces consecutive zeros (or any other value)

Any PRNG with a 64-bit state will produce only one 0u64 in it's period, or 2 consecutive 0u32s.

I agree both SplitMix and PCG could be equally good as a form of seed expansion (I had a small SplitMix helper function in my branch). But this is just a detail. Rust implementations of Xoroshiro and co. can use SplitMix. Other PRNGs can use what is common for those, probably running it a couple of rounds.

@dhardy
Copy link
Member Author

dhardy commented Jun 22, 2018

Any PRNG with a 64-bit state will produce only one 0u64 in it's period, or 2 consecutive 0u32s.

Assuming equidistribution of output, which generally isn't a given.

True; individual PRNGs can provide their own implementation of seed_from_u64 if required, but IMO we should have a good default implementation, and then it is not very useful for other implementations to provide their own (unless wishing to reproduce output of another implementation).

@TheIronBorn
Copy link
Contributor

I used an archived version of the website.

@dhardy
Copy link
Member Author

dhardy commented Jun 23, 2018

If you have a link to the dsiutils package you can share that would be helpful.

It seems that PCG is a better quality generator than SplitMix, yet since we need so little output to fill a seed it doesn't matter much.

If we use PCG and don't pre-process the seed we should discard the first value (since seeds like 0 and 1 produce poor output), though if we only use one byte as in the code above it doesn't matter much.

Hash functions let users turn arbitrary input into a u64 that can be fed into seed_from_u64 if desired — though it isn't ideal to restrict large inputs down to 64-bits of entropy then expand again. So it might be preferable to accept larger inputs (e.g. 256-bit) — but this is supposed to be a convenience construction.

Perhaps then we should

  1. add a simple seed_from_u64 function as proposed
  2. suggest users to use a real hash function like SHA3/Keccak and truncate if necessary to produce a Seed value

@dhardy
Copy link
Member Author

dhardy commented Jun 23, 2018

Rough implementation: dhardy@6ea3f6a

But it might be better to base this on the 0.5 branch (so that we don't release rand_core 0.2.2 from what is 0.6 development code).

@TheIronBorn
Copy link
Contributor

The download isn't archived but the API is: https://web.archive.org/web/20171218222630/http://dsiutils.di.unimi.it:80/docs/

@dhardy
Copy link
Member Author

dhardy commented Jun 24, 2018

But this is just a detail. Rust implementations of Xoroshiro and co. can use SplitMix. Other PRNGs can use what is common for those, probably running it a couple of rounds.

Introducing SeedableRng::seed_from_u64 is not by itself a breaking change. But implementing this with a default implementation and then changing the implementation used by certain PRNGs is a value-breaking change. So if we want to use custom implementations for existing PRNGs perhaps we should hold off publishing in rand_core until Rand 0.6 is ready.

@vks
Copy link
Collaborator

vks commented Jun 25, 2018

Note that using an RNG like splitmix64 for seeding is similar to seeding using a weak hash function:

The mixing functions in a hash are supposed to make sure that flipping a bit in the input results in all output bits changing with probability 0.5. (This is called Avalanche effect). Cryptographically secure hashes make this probability as close as possible to 0.5, while non-CS hashes trade avalanche accuracy for performance. This translates to CSPRNGs and non-CSPRNGs. For example, splitmix64 uses such a weak mixing function that is similar to the one used by MurmurHash3.

In that regard, I think from_hashable is redundant to from_rng. It would make more sense to create a macro that constructs an RNG from a hash, but I don't think this is necessary.

My suggestion is to add a seed_from_u64 with a default implementation using splitmix64. I don't think it makes sense to make it cryptographically secure, because a state of 64 bits is small enough to be brute-forced. Cryptographically secure would make more sense for a hypothetical seed_from_u128, but even then CryptoRng::seed_from_u128(1) is predictable and thus dangerous. I think seed_from_u* should not be implemented for any CryptoRng, so it makes sense to put it into a separate trait or to implement it directly for the RNG without using any traits.

@dhardy
Copy link
Member Author

dhardy commented Jun 25, 2018

seed_from_u64 is definitely not recommended for any cryptographic RNG and there's really no point adding seed_from_u128; it's not the right way to do things. Unfortunately though we can't add seed_from_u64 only for non-crypto RNGs (though maybe we don't want to do that anyway, since there's no reason CSPRNGs can't be used for non-crypto stuff).

@vks
Copy link
Collaborator

vks commented Jun 26, 2018

Unfortunately though we can't add seed_from_u64 only for non-crypto RNGs

Why not? We can't use negative trait bounds, but we can still only implement it for non-crypto RNGs.

though maybe we don't want to do that anyway, since there's no reason CSPRNGs can't be used for non-crypto stuff

That's true, but is it worth the risk that it might be misused? (OTOH, it is not so diffferent from misusing from_seed.)

@dhardy
Copy link
Member Author

dhardy commented Jun 26, 2018

Why not? We can't use negative trait bounds, but we can still only implement it for non-crypto RNGs.

The only definition of a "non-crypto RNG" is a negative bound.

I'm not so concerned about mis-use here: you don't have to know very much about cryptography to know that a 64-bit seed is insufficient. In any case, cryptography should normally be handled by a higher-level library.

@dhardy dhardy added this to the 0.6 release milestone Aug 23, 2018
@dhardy dhardy closed this as completed Mar 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-question Participation: opinions wanted F-new-int Functionality: new, within Rand
Projects
None yet
Development

No branches or pull requests

4 participants