Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reseeding std::rand. #722

Closed
wants to merge 7 commits into from
Closed

Reseeding std::rand. #722

wants to merge 7 commits into from

Conversation

huonw
Copy link
Member

@huonw huonw commented Jan 23, 2015

Stabilise the std::rand module by focusing it on a smaller set of
functionality, moving "fancy" functionality to a crates.io crate.

@huonw
Copy link
Member Author

huonw commented Jan 23, 2015

cc @sfackler, @nagisa, @aturon, @alexcrichton

@aturon
Copy link
Member

aturon commented Jan 23, 2015

Rendered

/// mediated by information contained in `Data`.
trait Random<Data = FullRange> {
/// Create a random value of type `Self`
fn random<R: ?Sized + RngBase>(data: Data, rng: &mut Rng) -> Self;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rng should be R.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RngBase should be Rng I think.

@huonw
Copy link
Member Author

huonw commented Jan 23, 2015

Notes to self; extra alternatives (will incorporate them into the document in the near future):

  • we could take the collections approach and not have a generic Rng/RngExt trait in favour of inherent impls on the types. (Seems unfortunate to have to reimplement sample several times, although I guess internal implementation details could avoid this problem.)

  • we could merge Rng/RngExt with one of the "object safety tricks" (if they were implemented), e.g.

     trait Rng {
          fn sample<...>(...) where Self: Sized // (or Self != Rng)
     }

@sfackler
Copy link
Member

WRT the lack of collections traits, the only reason that they don't exist is that we can't define them in a reasonable way without HKT. Unless the same is true here, I'd prefer to keep the traits.

a program to have forcibly deterministic execution by seeding the
`ThreadRng`. Changing this is backwards compatible.

This may make more sense to exist in `std::thread` or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right spot for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This == std::rand?

@aturon
Copy link
Member

aturon commented Jan 23, 2015

This is really nice!

I'm slightly hesitant about the Data component here, in two respects:

  1. The .. form is mildly annoying and, as you say, may not make sense for all types;
  2. The point you make at the end about the surprising inefficiency of integer ranges.

Would it be possible to provide both ranged and non-ranged API variants while still performing some of the unifications of this RFC?

@petrochenkov
Copy link
Contributor

You may consider using the term RBG (random bit generator) instead of RNG (random number generator) to avoid a common source of confusion. Here's the explanation http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3847.pdf from the author of <random>, but it was too late for C++ to make this change.

```

As today, the relationship between output of the methods of `Rng`, and
between them and `io::Reader` is unspecified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/, and between them//?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean to say that the relationship between next_u32, next_u64, next_f32, next_f64 and Reader::read is pairwise unspecified (I thought "between the methods of Rng and Reader" was ambiguous, as it doesn't clearly say anything about intra-Rng connections.)

like seeding a hashmap to avoid algorithmic complexity DoS attacks
does not require a (possibly expensive) call into the operating system
for every `HashMap::new()` call; a high-quality user-space RNG with an
unpredictable seed (e.g. from the OS) is almost certainly good enough.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it sound like the operating system's random number generator is somehow inherently better than a userspace one can be, which is totally untrue.

@thestinger
Copy link

I think it's irresponsible to attempt to pass the responsibility of implementing a CSPRNG on to someone else, when it's known that the implementation people are being told to use is less secure and extremely slow. In fact, it's such a bad practice that it is directly violating the guidelines given by OpenBSD for the usage of the OS random-number generator.

If there's no claim that the userspace RNG is cryptographically secure, then it can't be used for hash tables without documenting that they have no guarantees of protection against a DoS attack. It's a vulnerability if the RNG is not regarded as a CSPRNG.

@thestinger
Copy link

The fact that the OS random number generators don't scale at all means that Rust will have nothing available in many cases. Typical usage of cryptography in software is very performance critical, and if it's not fast enough then it won't be used. There's a reason that performance is one of the most important aspects in the design / choice of cryptographic primitives.

The OS RNG on Linux is not designed by cryptographers or based on proven cryptographic primitives. It is poorly understood, and the few people who have done real cryptanalysis have found significant weaknesses. The argument to an authority in the kernel is an empty one. In fact, the kernel intentionally eschews proven cryptography based on what is essentially an attempt at "security through obscurity".

@thestinger
Copy link

One example, finding that the entropy mixing / estimation in the Linux kernel is counter-productive from a security perspective and that the hand-rolled cryptography it uses is no good:

http://eprint.iacr.org/2013/338.pdf

@bill-myers
Copy link

I think that Rust should not specify a specific cipher for the default RNG. For example on CPUs with hardware AES instructions using AES instead of ChaCha20 for the cipher might be faster while ChaCha20 or other ciphers might be preferred elsewhere.

It's also nice to be able to configure a Rust program to run with a deterministic seed without changing it (perhaps even with an environment variable), so exposing the fact that the userspace RNG is seeded from OsRng seems a bad idea as well.

For this reason, as well as the fact that an unbuffered /dev/urandom should not be used for anything than seeding another userspace RNG, it might make sense to also not expose OsRng.

Overall, it seems the best solution might be to just expose a single opaque "DefaultRng" that is a thread-local ChaCha20/AES/... RNG that is seeded from a global ChaCha20/AES/... RNG (or possibly from the thread-local RNG in the parent thread) that is seeded from /dev/urandom or from a seed specified either at program initialization or in an environment variable.

Plus explicit RNGs with deterministic seeding, but maybe those don't really need to be in the standard library.

@huonw
Copy link
Member Author

huonw commented Jan 28, 2015

@bill-myers what do you mean by "default RNG" and "the userspace RNG"? Based on your second last paragraph the ThreadRng?

@Gankra
Copy link
Contributor

Gankra commented Jan 28, 2015

CC @apoelstra

@huonw
Copy link
Member Author

huonw commented Jan 28, 2015

@thestinger thank you for pointing out the poor motivation I'd given the design written here. The discussion of cryptography was particularly ungreat. The use of the operating system was meant to be a default position, since no other generator in std::rand has yet been properly verified as correct (it would not be hard to verify ChaChaRng, though).

A better motivation, inline with the designs for the rest of std, would be: provide a small set of (mostly) opinionated primitives, OsRng (interface to the "best" operating system source of randomness), ChaChaRng (a strong userspace RNG that will be suitable for cryptography once verified) and XorshiftRng (a very fast userspace RNG suitable for applications that just want all out speed). More interesting functionality can be progressively built on this, e.g. adaptors that reseed, adaptors that detect fork and handle it, adaptors that buffer. The details of ThreadRng can definitely do with some tweaking; I think I overspecified them for the purposes of this RFC.


However, I'm moving more and more towards a version of @bill-myers's final suggestion: move the whole of std::rand out of the standard library, making the small bits std needs (hashmap seeding, benchmarks) just internal implementation details.

@huonw
Copy link
Member Author

huonw commented Jan 28, 2015

Doing the latter gives us much more flexibility to iterate on the nits in the design of the Rng trait etc, and to adopt desirable future language features such as inclusive range (makes more sense to use rng.gen('a'...'z') than rng.gen('a'..'}')).

@huonw
Copy link
Member Author

huonw commented Jan 28, 2015

(The intention would be to adopt the design and feedback here in the move, where relevant, but there would be dramatically less need for deletions.)

@thestinger
Copy link

I think a fast CSPRNG is a must-have because otherwise the easiest way to get random data will be by calling the standard C library via FFI, despite the many caveats. Stripping away functionality like the range generation due to backwards compatibility concerns would encourage people to roll their own code and it will likely be wrong. In this case it's better to be stuck with a few imperfect APIs forever than widespread misuse of random numbers across the ecosystem.

It's important that it's fast (unlike the OS RNGs) because that's another reason to use a non-cryptographically secure RNG even though it should be fast enough for nearly all use cases. This is why arc4random on OpenBSD and the other operating systems that have cloned it is so compelling.

I think the risks of bad practices with random data spreading throughout the ecosystem far outweigh the tiny risk that you've implemented it incorrectly. I really doubt that cryptographers could do a better job implementing a clear cut algorithm like ChaCha20 that's explicitly designed to avoid any implementation issues by avoiding branches and table lookups (unlike algorithms like AES that are super hard to do correctly). It's a really overblown risk and it shouldn't drive the design process.

The risk for all the snippets of unsafe code is far greater and no one expects a special class of professional to be the only ones touching that. No one expects someone with a degree in data structures to be the only one able to implement a B-tree. It seems to me that the real danger is someone who thinks they do know what they're doing, so they do more than very carefully implementing proven algorithms. Having cryptographers doing the programming isn't a clear recipe for success: just look at OpenSSL.

@huonw
Copy link
Member Author

huonw commented Jan 28, 2015

I agree that it is important to provide tools to allow users to not shoot themselves in the foot, but the intention after moving it out of std would be improving it until it is good enough to move back in (unless we find that having it outside std works fine). It seems somewhat possible to do this before 1.0 itself, but there's also a non-zero chance that it won't be possible to do it nicely by then, so moving it out ensures flexibility in the worst case. (At the very least, we can move out, reevaluate the situation in a month, and possibly move selected portions back in.)

Lastly, other "core" functionality will only be available via crates.io (and their git repos) at 1.0, so accessing libraries there will be the norm, and functionality like rust-lang/cargo#4 will make it even easier to load them: what is now use std::rand; turns into just cargo add rand extern crate rand;.

@huonw
Copy link
Member Author

huonw commented Jan 31, 2015

Closing as discussed. I'm to have the removal happening in the next few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.