Future of thread_rng #463

Closed

dhardy opened this issue May 17, 2018 · 36 comments
Labels
D-changes Do: changes requested

Comments

@dhardy
Member

dhardy commented May 17, 2018

Status: proposal to allow hardware-dependent generators and replace HC128 with a faster variant of ChaCha (start reading here).


This topic comes up quite a bit from various angles; I think it's time to get some ideas down about the future of thread_rng (regarding the 0.6 release or later).

I see the following potential uses for thread_rng:

  • convenient random numbers in small games and simulations
  • use in randomised algorithms
  • as a source of secret data for various cryptographic applications

Also, I think it's worth mentioning where we do not expect thread_rng to be used:

  • anywhere reproducibility is required
  • complex simulations and games, due to the above point, and also since thread_rng may not be the fastest option
  • where security is very important

And an important note on security: we should aim to provide a secure source of random data, but ultimately it is up to users to decide how much they trust our implementation and what their risks are. thread_rng does not have the simplest code to review and is currently young and subject to further change. Also, we may or may not implement forward secrecy (backtracking resistance), and for ultimate security, solutions using no local state may be preferred.


Our current implementation of thread_rng tries to satisfy the above with a fast, secure PRNG, but at the cost of high memory usage and initialisation time per thread. For applications with a low number of long-running threads this is reasonable, but for many worker threads may not be ideal.

There are two ways we can let users influence the implementation:

  • via feature flags
  • at the call site (pass parameters to thread_rng or call a different function)

Feature flags allow configuration on a per-application basis (a rough sketch of what this could look like follows the list), e.g.

  • prefer low-memory-usage / fast initialisation over generation performance
  • disable entirely (i.e. panic on usage)
  • require forward secrecy
  • remove security requirements (??)
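As a rough sketch of what the first flag could look like as a compile-time switch inside Rand — the feature name and type alias are hypothetical, and the rand::prng paths assume the 0.5 module layout:

```rust
// Hypothetical sketch only: neither the "thread_rng_small" feature nor this
// type alias exist in Rand; they just illustrate compile-time backend selection.
#[cfg(feature = "thread_rng_small")]
type ThreadRngCore = rand::prng::ChaChaRng; // smaller state, slower bulk output

#[cfg(not(feature = "thread_rng_small"))]
type ThreadRngCore = rand::prng::Hc128Rng; // current default: fast, ~4 KiB of state
```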

The last two options sound very risky to me — should we ask distributors and end-users to reason about the security of whole applications? It is quite possible that the people building applications — even developers — will not know about all uses of thread_rng requiring secure randomness.

This brings me to ask, is having only a single user-facing function ideal? What if instead:

  • strong_rng replaces thread_rng as a source of cryptographically secure randomness
  • weak_rng is added as a fast source of randomness; depending on feature flags this could just wrap strong_rng or could be independent

An advantage of the above is that feature-flags could allow replacing the current implementation (HC-128; 4176 bytes) with two smaller back ends (e.g. ChaCha + PCG; 136 + 16 bytes), while only compromising the speed of the secure generator.

Another advantage is that we could add forward-secrecy to strong_rng with less concern for performance implications.
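For concreteness, a minimal sketch of the proposed surface — these wrapper types and bodies are placeholders for the idea, not anything that exists in Rand:

```rust
// Hypothetical sketch of the proposed split; none of this exists in Rand.

/// Handle to a thread-local CSPRNG (e.g. a reseeding ChaCha core).
pub struct StrongRng(());

/// Handle to a thread-local fast PRNG (e.g. PCG), or just a StrongRng,
/// depending on feature flags.
pub struct WeakRng(());

/// Cryptographically secure randomness: keys, tokens, nonces.
pub fn strong_rng() -> StrongRng { StrongRng(()) }

/// Fast randomness for hash seeding, games and simulations; possibly predictable.
pub fn weak_rng() -> WeakRng { WeakRng(()) }
```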

But first, why bother when generators like Randen and RDRAND claim to satisfy all requirements anyway? This is a good question to which I only have vague answers: Randen is still new and unproven and may have portability issues; RDRAND is not fully trusted and may not be the fastest option.

Second, what about use cases like HashMap, where weaknesses are often not exploitable (depending on application design and usage) and in the worst case only allow DoS attacks (slow algorithms)? Another good question. One possible answer is that these use-cases should use weak_rng, which by default would be secure anyway; we provide feature flags to change that but discourage their use on servers. It might seem tempting to add a third function, but, frankly, this kind of thing is probably the main use case for weak_rng anyway.

Another, very different, option is that we keep thread_rng looking like it is but remove CryptoRng support and recommend it not be used for crypto keys. Then we can add a feature flag changing its implementation to an insecure generator with less concern. This may be a good option, but goes against our recent changes (switching to HC-128 and implementing CryptoRng).


BTW, let's not bikeshed thread_rng vs ThreadRng::new or other syntax here.

@dhardy dhardy added E-question Participation: opinions wanted X-discussion labels May 17, 2018
@vks
Collaborator

vks commented May 17, 2018

Using feature flags can affect dependencies, right? This might break assumptions of dependencies, so this seems dangerous. I would prefer to use different types instead and require the dependencies to be generic in the type of thread RNG.

Instead of providing several variants of ThreadRng, maybe it makes more sense to provide one conservative default and make it easy to construct custom ThreadRng (i.e. by making it generic in the RNG). Users who know the tradeoffs could pick the RNG best suited for their use case, maybe guided by a few recommendations in Rand.

I think HashMap should surely use a CSPRNG, otherwise it will likely be open to DOS attacks by an adversary who can observe random numbers (which would be a very plausible scenario for instance in a multiplayer game).

@dhardy
Member Author

dhardy commented May 17, 2018

Yes, feature flags can affect dependencies, which is why I suggested what I did above.

Using a custom version of ThreadRng does not help reduce memory usage or per-thread initialisation time when dependencies are already using thread_rng.

DOS attacks

This is why I would not recommend using such a feature flag on servers. It's an interesting idea to try such an attack in multiplayer games, but there is no benefit against synchronous-step models (common in RTS games) or against the server in server-centric models (common in FPS games), and even if it were achievable against other players the consequences would not be high.

Going back to my proposal, I prefer fast_rng over weak_rng:

  • strong_rng for anything crypto
  • fast_rng for stuff like HashMap which by default is still a CSPRNG, but can be changed using a feature flag

@vks
Collaborator

vks commented May 18, 2018

Using a custom version of ThreadRng does not help reduce memory usage or per-thread initialisation time when dependencies are already using thread_rng.

I tried to argue that dependencies should make it possible for the user to choose the RNG, which wouldn't have this problem. If the thread RNG is not exposed via the API, it is an implementation detail and I think it should not be affected by feature flags, because it can have unintended consequences.

@dhardy
Member Author

dhardy commented May 18, 2018

Related to this, there are security trade-offs to decide:

  • Should we include fork protection, and if so should we check for fork on each value retrieved, or only when refreshing the buffer? According to @pitdicker this is 10-20% overhead vs 1-2% overhead.
  • Should we have forward secrecy, and if so should we immediately clear the buffer (high overhead) or should forward secrecy only protect past output after the next buffer refresh (moderate to low overhead, depending on RNG)?
  • Should we zero memory on drop? Obviously this is not guaranteed to happen and provides much less protection than forward secrecy (see the sketch at the end of this comment).
  • Should we combine output with a fast external RNG (RDRAND) where available? I don't know how much overhead this would have, but it would greatly reduce the need to apply fork-protection and forward secrecy for each value output.

Given independent strong_rng and fast_rng these protections can be applied only where necessary, although if the two use the same back-end that is not the case.
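To illustrate the zero-on-drop bullet, a minimal sketch — note that a naive wipe like this can be optimised away by the compiler (which is exactly why it "is not guaranteed to happen"); real code would use std::ptr::write_volatile or a crate such as clear_on_drop:

```rust
/// Illustrative only: an output buffer that tries to wipe itself when dropped.
struct OutputBuffer {
    data: [u32; 32],
}

impl Drop for OutputBuffer {
    fn drop(&mut self) {
        // Naive wipe; the optimiser is free to elide these writes because the
        // buffer is dead afterwards, hence the caveat above.
        for word in self.data.iter_mut() {
            *word = 0;
        }
    }
}
```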

@dhardy
Member Author

dhardy commented May 18, 2018

@vks I don't get what you mean; you're saying use rand = { version = "0.5", features = ... }? That is feature flags. Or you're saying don't include a thread_rng function / state but only a macro or something allowing one to set up one's own version easily? But then each dependency would add its own thread-local generator, increasing memory usage even further. All dependencies have to use the same back-end(s) to avoid high memory usage (where several libs use Rand within an app).

@sicking
Contributor

sicking commented May 20, 2018

I agree that having a fast_rng() and a secure_rng() (or strong_rng) seems like a good way to go.

Based on that I'd be tempted to provide very few security guarantees for fast_rng(). It's generally hard to get security from features that aren't designed for security, and I think it's great if we can optimize fast_rng() for performance rather than security.

I don't know enough about CSPRNG to have opinions about what guarantees that secure_rng() should provide.

On the topic of feature flags: having feature flags which change what guarantees fast_rng() and secure_rng() provide sounds very risky to me. That could result in someone wanting to relax security for library/feature X accidentally breaking security for feature Y as well.

A better approach would be to encourage libraries which use rand to include feature flags which make them use different Rng implementations to provide different performance/security trade-offs.

@dhardy
Member Author

dhardy commented May 20, 2018

While I partly agree with you @sicking, it is important to consider uses like HashMap which need unpredictable generators in some cases, but also fast ones. I think this is the original motivation for thread_rng.

I only see two solutions for this:

  • a fast & unpredictable generator like the current thread_rng
  • or a fast generator with opt-in or opt-out security via feature flags

@pitdicker
Contributor

A few comments on the first post, but I haven't thought everything through yet...

Our current implementation of thread_rng tries to satisfy the above with a fast, secure PRNG, but at the cost of high memory usage and initialisation time per thread. For applications with a low number of long-running threads this is reasonable, but for many worker threads may not be ideal.

I am really not that concerned with the memory usage. While 4 KiB is a lot for an RNG, it is really not that much on any system that the standard library is available on.

The init time is something to worry about a bit more. On my PC it takes 5265 ns to initialize ThreadRng, of which only 400 ns are spent in OsRng. That is about 200,000 initializations per second, which I agree seems quite reasonable for long-running threads.

For the situation with many worker threads, isn't it better to use the scheme I ended up with of splitting an RNG using a wrapper such as SplittableRng (#399 (comment))? Also, one lesson to take away from that thread was that the overhead of TLS really becomes a big part of the cost of a fast PRNG such as Xorshift.

Performance with thread_rng, with caching:

test thread_rng_u32        ... bench:       2,887 ns/iter (+/- 28) = 1385 MB/s
test thread_rng_u64        ... bench:       4,320 ns/iter (+/- 26) = 1851 MB/s

Performance with thread_rng, without caching ThreadRng:

test thread_rng_u32        ... bench:       7,916 ns/iter (+/- 34) = 505 MB/s
test thread_rng_u64        ... bench:       8,593 ns/iter (+/- 58) = 930 MB/s

Performance with ReseedingRng<Hc128Core, ...> replaced by XorShiftRng:

test thread_rng_u32        ... bench:       6,323 ns/iter (+/- 13) = 632 MB/s
test thread_rng_u64        ... bench:       6,698 ns/iter (+/- 62) = 1194 MB/s
test gen_u32_xorshift      ... bench:         952 ns/iter (+/- 9) = 4201 MB/s
test gen_u64_xorshift      ... bench:       1,991 ns/iter (+/- 15) = 4018 MB/s

Retrieving the RNG from TLS greatly dominates the cost here.
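For reference, the "with caching" vs "without caching" numbers above roughly correspond to the difference between these two usage patterns (rand 0.5 API; the helper functions are just for illustration):

```rust
use rand::{thread_rng, Rng};

// One TLS lookup; the ThreadRng handle is reused for every value.
fn fill_cached(out: &mut [u32]) {
    let mut rng = thread_rng();
    for x in out.iter_mut() {
        *x = rng.gen();
    }
}

// A TLS lookup (and initialisation check) on every single call.
fn fill_uncached(out: &mut [u32]) {
    for x in out.iter_mut() {
        *x = thread_rng().gen();
    }
}
```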

The last two options sound very risky to me — should we ask distributors and end-users to reason about the security of whole applications? It is quite possible that the people building applications — even developers — will not know about all uses of thread_rng requiring secure randomness.

Yes, that is a big argument. It is the call site that determines whether ThreadRng is good enough to use. A scheme where the application is able to weaken the guarantees seems unacceptable to me.

On the other hand, we already reserve the right to change the algorithm of StdRng to another secure one. So making it partly configurable should not be a problem.

This brings me to ask, is having only a single user-facing function ideal? What if instead:

  • strong_rng replaces thread_rng as a source of cryptographically secure randomness
  • weak_rng is added as a fast source of randomness; depending on feature flags this could just wrap strong_rng or could be independent

Renaming thread_rng would be a very big breaking change; better to keep the name as it is in my opinion (or split that out into another discussion someday). And I still really dislike the name weak_rng. But I understand this is not about names, but about offering two kinds of functionality.

Still, having a thread-local variant using a fast RNG does not offer much advantage in the common case unless the RNG is cached.

But first, why bother when generators like Randen and RDRAND claim to satisfy all requirements anyway? This is a good question to which I only have vague answers: Randen is still new and unproven and may have portability issues; RDRAND is not fully trusted and may not be the fastest option.

The performance of RDRAND, for comparison (1 value per iteration, instead of 1000 in the previous benchmarks):

test bench_u32 ... bench:          27 ns/iter (+/- 0) = 148 MB/s
test bench_u64 ... bench:          27 ns/iter (+/- 0) = 296 MB/s

RDRAND is 5-10× slower than the current ThreadRng.

Another, very different, option is that we keep thread_rng looking like it is but remove CryptoRng support and recommend it not be used for crypto keys.

Would you really recommend thread_rng for generating keys? At least for now we should be very careful before recommending that.

@dhardy
Member Author

dhardy commented May 21, 2018

But what is the CryptoRng trait for, if not to say the generator is acceptable for cryptography?

As I said in the first post, ultimately such usage is about trust and risk, but the CryptoRng trait signals an intention to be secure. I agree, we should definitely not make a generator marked by this configurable to be insecure, but is it worth going the other way? Possibly not; just leaving thread_rng as-is is another option.

@sicking
Contributor

sicking commented May 22, 2018

I feel like there are two very common use cases which I think would be great to have very easy-to-access APIs for:

  1. A strong cryptographic rng. Useful for generating authentication tokens, temporary passwords, etc. I.e. must not be guessable in any form. As previously stated, I don't know enough about CSPRNG to provide more detailed requirements here.
  2. A fast rng. Its primary feature is that it generates values fast. But using this API should also generate a stream of values which is very unlikely to match the stream from any other run. I.e. it should have a decent-sized seed and be seeded from a source which is unlikely to generate duplicate values. However, if it's hard-but-possible to guess future values based on a large enough set of old values, this is not a big deal. Useful for single-person games and simulations.

I was thinking 1 was secure_rng() and 2 was fast_rng(), but maybe that was wrong. In particular it might be better to encourage another API for 2 since something like fast_rng() requires a TLS access.

Maybe what I'm asking for, for 2, is more like dhardy#60 plus an empty ::new() function which seeds the returned object from something like secure_rng() or thread_rng()?

And to be clear, I think there are many uses of rng which do not fall into either of these categories. For these, I think the way to go is to have APIs which are explicit about which RNG algorithm is used and what the seed source is. That way developers can choose whatever algorithm provides the trade-offs that match their requirements.
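For example, being explicit about both the algorithm and the seed source might look like this (assuming the rand 0.5 layout, where ChaChaRng lives in rand::prng and FromEntropy is available at the crate root):

```rust
use rand::prng::ChaChaRng;
use rand::{FromEntropy, Rng, SeedableRng};

fn main() {
    // Explicit algorithm, explicit (fixed) seed: reproducible.
    let mut reproducible = ChaChaRng::from_seed([42u8; 32]);
    // Same algorithm, seeded from system entropy: unpredictable.
    let mut fresh = ChaChaRng::from_entropy();
    println!("{} {}", reproducible.gen::<u32>(), fresh.gen::<u32>());
}
```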

@sicking
Contributor

sicking commented May 22, 2018

I don't know if HashSet falls into category 2 or not. The fact that it currently uses a simple counter (though starting at a more strongly generated random number) makes me think that 2 is probably fine. But I haven't done enough research to be sure.

However the point is that if 2 is not fine for HashSet then it can use the solution from the last paragraph in the previous comment.

Though in reality I suspect that specifically HashSet won't use anything we're doing here since it's part of std.

@dhardy
Member Author

dhardy commented May 22, 2018

Category 2 (simply fast) is actually pretty easy. Our current thread_rng is very fast (as long as the handle is cached), though sometimes SmallRng will be faster (we say specifically in the docs that benchmarking is important since performance is context dependent).

But when it comes to seeding a hash algorithm for HashMap and HashSet, the std currently uses an internal copy of the old thread_rng using the ISAAC generator, which is fast and possibly (hopefully) secure. This is because if the seed is known, an attacker can cause denial-of-service by causing a server to store many items in a single hash bucket (which thus get stored in a simple list). This could instead be solved with better algorithms (e.g. a tree-based map), but these are usually somewhat slower (actually, I think hash maps get over-used relative to their performance).

@vks
Collaborator

vks commented May 22, 2018

@sicking

A fast rng. Primary feature is that this generates values fast.

I don't think this is well-defined. You can usually make RNGs faster by increasing the size of the state, because it allows for more instruction parallelism. Also, in which regard should it be fast? Initialization? Generating u64? Filling byte buffers?

@vks
Collaborator

vks commented May 22, 2018

@dhardy

I don't get what you mean; you're saying use rand = { version = "0.5", features = ... }? That is feature flags. Or you're saying don't include a thread_rng function / state but only a macro or something allowing one to set up one's own version easily? But then each dependency would add its own thread-local generator, increasing memory usage even further. All dependencies have to use the same back-end(s) to avoid high memory usage (where several libs use Rand within an app).

I think the dependencies should use struct A<R: Rng>, from_rng(&mut Rng) and rng_constructor: Fn() -> impl Rng so the end user in the dependency chain can choose the RNG.
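A rough sketch of that pattern — the Simulation type and its methods are made up for illustration; the point is that the library is generic over R: Rng and never reaches for thread_rng() itself:

```rust
use rand::Rng;

/// Illustrative library type: the caller supplies (and therefore chooses) the RNG.
pub struct Simulation<R: Rng> {
    rng: R,
}

impl<R: Rng> Simulation<R> {
    pub fn new(rng: R) -> Self {
        Simulation { rng }
    }

    pub fn step(&mut self) -> f64 {
        self.rng.gen()
    }
}
```

An application could then pass rand::thread_rng(), a seeded PRNG in tests, or whatever generator matches its trade-offs.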

@dhardy
Member Author

dhardy commented May 23, 2018

struct A<R: Rng> and -> impl Rng both require compile-time type definition. The current thread_rng() -> ThreadRng uses a wrapper type so changing the implementation within this type is equivalent. We would use similar wrapper types for any new thread-local generators (and I think it only makes sense to use thread-local).

You can usually make RNGs faster by increasing the size of the state, because it allows for more instruction parallelism.

This works for generating random bytes faster, but not necessarily for getting random data into other code faster. As noted in the doc, it's necessary to benchmark your code to find the fastest RNG, so it's pointless trying to find "the fastest RNG" for thread_rng.

@fschutt

fschutt commented May 23, 2018

A question:

In the latest release (rand 0.5.0), thread_rng doesn't seem to be included when I specify no-default-features (it worked before). Since compile time is still very much an issue, it would be nice to document which feature I have to enable to use thread_rng (to generate a random string).

@dhardy
Member Author

dhardy commented May 23, 2018

It requires std, which is the only thing disabled when you use no-default-features. There is no way around that for now, and even if there were, it wouldn't help much with compile time. As far as I can see, this is documented in the readme.

@pitdicker
Contributor

You can usually make RNGs faster by increasing the size of the state, because it allows for more instruction parallelism.

There are not many RNGs for which this is true, but it does help a bit for Xorshift and Xoroshiro. It does come at the cost of using more registers, so it may work better in benchmarks than in real code.

@vks
Collaborator

vks commented May 23, 2018 via email

I was also thinking of vectorization.

@dhardy
Member Author

dhardy commented Jun 8, 2018

For the situation with many worker threads, isn't it better to use the scheme I ended up with of splitting an RNG using a wrapper such as SplittableRng (#399 (comment))? Also, one lesson to take away from that thread was that the overhead of TLS really becomes a big part of the cost of a fast PRNG such as Xorshift.

Is this really about the overhead of TLS or about checking that the RNG was initialised? Because any call to thread_rng() must check it was initialised.

@pitdicker your ParallelIterator code uses rng: R::from_rng(&mut self.rng).unwrap() which does a full initialisation of the new RNG — when using a PRNG like Hc128Rng this is slow. Of course switching stream/jumping would be faster but potentially less secure (must be careful not to re-use streams and avoid overlap).

More important though is that any approach using splitting won't work with the current API, since new threads would need a "master" to initialise from. I don't know if std::thread::spawn could be adapted to initialise the new thread's RNG or something, though even if it could it might not be desirable (since then threads not using an RNG still pay the cost of initialising one). Hence this technique could be used locally in a ParallelIterator or similar but I'm not sure how it could be used to initialise a thread-local RNG.

Renaming thread_rng will be a very big breaking change

This is partially intentional to force users to choose between strong_rng and fast_rng. On the other hand I'm not really happy with the names strong_rng, secure_rng or crypto_rng because they imply too much (see next point).

Would you really recommend thread_rng for generating keys?

Not in its current form, no. With extra protections (also using RDRAND or with forward secrecy), perhaps. But I don't think either of these can be added without a big performance decrease, so if we did add something like this it would be distinct from thread_rng (at least, it would be accessed in a different way).

I was also thinking of vectorization.

I don't think this has any extra requirements here. @pitdicker has been experimenting with using fill_bytes for vectorization; alternatively we could add methods like next_u128 etc. though that's off topic here; suffice it to say I don't see any reason we can't optimise thread_rng for SIMD if we choose to.


After reviewing this again I only really see these options:

  1. Keep the thread_rng API roughly as it is now (but still allowing Rename thread_rng to ThreadRng::new()? #404 if we want that)
  2. Above, but make thread_rng configurable, but only with CSPRNG implementations (e.g. allow ChaCha for less memory but slower).
  3. Above, but also add fast_rng. fast_rng may be predictable. It might be optimised for low-memory requirements and serial usage or might be optimised more for SIMD, or it might simply use thread_rng internally; this is up to us (and might be configurable).

Any further thoughts? I like the idea of the latter but it does make Rand more complex.

@pitdicker
Contributor

Is this really about the overhead of TLS or about checking that the RNG was initialised? Because any call to thread_rng() must check it was initialised.

The details of how thread-local storage works are still a bit fuzzy for me, and the implementation in the standard library is spread over quite a few files. Checking if it is initialized is one part. There may also be some indirection, because it has to cross a crate boundary. And at some point (on Unix) it uses libc::pthread_getspecific(key) to get the key from TLS. I suppose that is responsible for looking up the pointer that belongs to a key.
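A stripped-down sketch of the kind of thread-local caching involved — the real ThreadRng wraps a reseeding HC-128 core in Rc<RefCell<...>>; here a dummy u64 state and update step stand in for the generator just to make the shape visible:

```rust
use std::cell::RefCell;

thread_local! {
    // Lazily initialised on first access in each thread.
    static RNG_STATE: RefCell<u64> = RefCell::new(seed_from_os());
}

fn seed_from_os() -> u64 {
    // Stand-in for the comparatively cheap OsRng call measured earlier.
    0x9E37_79B9_7F4A_7C15
}

fn next_value() -> u64 {
    // Every call pays the TLS cost described above: an is-initialised check
    // plus, on some platforms, a pthread_getspecific-style lookup.
    RNG_STATE.with(|state| {
        let mut s = state.borrow_mut();
        *s = s.wrapping_add(0x9E37_79B9_7F4A_7C15); // dummy update, not a real PRNG
        *s ^ (*s >> 31)
    })
}
```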

@pitdicker your ParallelIterator code uses rng: R::from_rng(&mut self.rng).unwrap() which does a full initialisation of the new RNG — when using a PRNG like Hc128Rng this is slow.

I ended up with a better scheme using splitting. But I only mentioned it in the context of many worker threads using a fast RNG.

  3. Above, but also add fast_rng. fast_rng may be predictable. It might be optimised for low-memory requirements and serial usage or might be optimised more for SIMD, or it might simply use thread_rng internally; this is up to us (and might be configurable).

The main problem is that some sort of fast_rng that works with TLS will only be fast if you can cache it. I suppose there can be a use for it, but it is not easy to use in a way that is really faster than thread_rng().

I am afraid it can be confusing/disappointing for users. If you don't know how things work, the current thread_rng() can seem sort of magical: you can get a random number seemingly out of nowhere, or use the other Rng methods.

But to use fast_rng() right, you have to realize you are getting a PRNG. In a way it is the same as cheaply initializing a PRNG. From that point on you have to handle the state yourself and pass it around, if you really want to have the benefits of a faster RNG.

When you get to that point, isn't it just as easy to seed a small PRNG of your own choosing, possibly from thread_rng()?
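That alternative would look something like this with the rand 0.5 API (SmallRng is just one possible choice of fast, non-cryptographic PRNG):

```rust
use rand::rngs::SmallRng;
use rand::{thread_rng, Rng, SeedableRng};

fn main() {
    // Seed a small, fast PRNG once from thread_rng(), then pass it around
    // explicitly instead of going through TLS on every call.
    let mut rng = SmallRng::from_rng(thread_rng()).expect("seeding from ThreadRng");
    let samples: Vec<f64> = (0..8).map(|_| rng.gen()).collect();
    println!("{:?}", samples);
}
```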

Not sure what I want to say really 😄. I'm just not sure if we can do something meaningful to offer a fast PRNG in combination with TLS.

  2. Above, but make thread_rng configurable, but only with CSPRNG implementations (e.g. allow ChaCha for less memory but slower).

I am not against this, but also don't really see the problem of using more memory. Do you really think there are situations where we have TLS, but don't have abundant memory?

  1. Keep the thread_rng API roughly as it is now (but still allowing Rename thread_rng to ThreadRng::new()? #404 if we want that)

We are really not in such a bad state in my opinion (although the Rc can go...). #404 does no harm but also offers no advantage in my opinion.

@hdevalence

hdevalence commented Jul 11, 2018

Having a single, strong, fast thread_rng seems preferable to having a faster, weaker fast_rng.

Is there any reason not to make the choice of RNG backing the ThreadRng platform-dependent? Then ThreadRng could use AES on CPUs with AESNI (basically every x86 produced this decade) and ChaCha20 when hardware-accelerated AES is unavailable.
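A sketch of what such platform-dependent selection could look like with runtime feature detection — the Backend enum and its variants are placeholders for real generator implementations; is_x86_feature_detected! is the standard-library macro for runtime CPU feature checks:

```rust
// Placeholder backends standing in for real generator implementations.
enum Backend {
    AesCtr,   // hypothetical AES-NI based generator
    ChaCha20, // portable software fallback
}

fn pick_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("aes") {
            return Backend::AesCtr;
        }
    }
    Backend::ChaCha20
}
```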

@tarcieri

I would definitely recommend having a hardware-accelerated AES-based RNG on platforms with hardware AES, gated on runtime feature detection. These can be implemented using a simple abstraction: the AES encryption round function alone, e.g. aesenc on Intel-family CPUs (no need for aeskeygenassist), or AESE on ARMv8+.
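In Rust that round-function primitive is exposed directly as an intrinsic; a minimal x86_64 sketch (this is only the single-round building block, not a generator or a secure construction by itself):

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn aes_round(
    state: std::arch::x86_64::__m128i,
    round_key: std::arch::x86_64::__m128i,
) -> std::arch::x86_64::__m128i {
    // One AES encryption round; a real CSPRNG chains several rounds over a
    // counter and rekeys periodically, as described below.
    std::arch::x86_64::_mm_aesenc_si128(state, round_key)
}
```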

There are any number of options for a cipher-based CSPRNG to choose from which are all fully parallelizable/vectorizable. A general theme among these is AES-CTR with a periodic rekeying mechanism (see also RDRAND). Here's a specific, recent example of such an RNG:

https://blog.cr.yp.to/20170723-random.html

Regarding RDRAND specifically, I think it's perfectly fine for the case of e.g. seeding SipHash to mitigate hashDoS, but probably not the best option for some sort of general-purpose CryptoRng.

I would definitely recommend ChaCha as a pure software option which should work everywhere, with ChaCha20 as a paranoid default; it could arguably be reduced to ChaCha12 (the best known attack on ChaCha only works up to 7 rounds, and 20 rounds is a paranoid safety margin). ChaCha can be trivially accelerated using e.g. AVX2-style instructions or other vector unit primitives which are ubiquitously available (add, rotate, xor) and has a small code size.

I'm certainly not wild about things like HC-128 or ISAAC. The former saw some analysis via ESTREAM, but ChaCha20 (and its predecessor Salsa20) have seen considerably more as ChaCha20 is an MTI cipher for TLS. It definitely sounds like there's a bit of a cipher zoo going on, and I'd strongly suggest reducing the number of ciphers unless there are very strong technical arguments for doing otherwise.

@vks
Collaborator

vks commented Jul 12, 2018

FWIW, I ported a C implementation of DJB's suggestion to Rust: https://github.com/vks/aesrng

@dhardy
Member Author

dhardy commented Jul 12, 2018

ChaCha20 (and even ChaCha8) is a lot slower than HC-128, which is why we went with that option. We considered HC-128 acceptable due to the ESTREAM recommendation and decided to remove ISAAC from Rand proper (hasn't happened yet but will).

Making use of hardware AES in thread_rng where available sounds reasonable, though it is only really useful if either it's significantly faster than HC-128 or there are security concerns regarding HC-128. Preferring better-reviewed algorithms where they are fast is also reasonable.

I don't plan to work on this myself but PRs are welcome.

@tarcieri

ChaCha20 (and even ChaCha8) is a lot slower than HC-128

That's a bit surprising to hear, considering chacha8 and chacha12 are consistently faster than hc128 across multiple architectures on SUPERCOP (with chacha20 often beating it out as well):

https://bench.cr.yp.to/results-stream.html

(that said, ChaCha8 has essentially no safety margin and I wouldn't recommend it; ChaCha12 is a happy medium between that and ChaCha20's paranoia)

@vks
Collaborator

vks commented Jul 12, 2018

That's a bit surprising to hear, considering chacha8 and chacha12 are consistently faster than hc128 across multiple architectures on SUPERCOP (with chacha20 often beating it out as well)

Maybe our implementation needs to be optimized/vectorized.

@dhardy
Member Author

dhardy commented Jul 12, 2018

If ChaCha can be optimised to compete that would be great. It uses a lot less memory and takes less initialisation time (I guess this is why those benches show HC-128 as terrible on short sequences).

@pitdicker
Contributor

pitdicker commented Jul 12, 2018

That's a bit surprising to hear, considering chacha8 and chacha12 are consistently faster than hc128 across multiple architectures on SUPERCOP (with chacha20 often beating it out as well)

I have wondered about those benchmarks before. Some of those benchmarks of HC-128 are even off by two orders of magnitude! Maybe it is also counting initialization time, combined with a terrible implementation? And using an implementation of ChaCha that is much faster than anything there is in Rust at the moment?

@vks
Collaborator

vks commented Jul 12, 2018

I also tried the chacha crate, and it seems to be faster than our implementation, but still 3 times slower than HC-128.

@tarcieri

There isn't a particularly good implementation of ChaCha in Rust right now that I know of (which is why I plan on writing one soon).

@TheIronBorn
Collaborator

TheIronBorn commented Jul 12, 2018

I wrote an explicitly vectorized ChaCha4 implementation for my simd_prngs repo but it's still very slow.

@tarcieri

I guess I missed the previous discussion of Randen, but it looks like a very nice option for platforms where hardware AES is available.

@dhardy
Member Author

dhardy commented Jul 30, 2018

Yes [you did]: #462

Edit to clarify: Randen looks like a good RNG, but I'd want to see third-party cryptographic review before promoting it here.

@dhardy dhardy added P-medium D-changes Do: changes requested and removed E-question Participation: opinions wanted labels Aug 23, 2018
@dhardy dhardy added this to the 0.6 release milestone Aug 23, 2018
@dhardy dhardy modified the milestones: 0.6 release, 0.7 release Mar 1, 2019
@vks
Collaborator

vks commented Jun 5, 2019

@dhardy I think this can be closed, now that we use a SIMD-optimized implementation of ChaCha?

@dhardy
Member Author

dhardy commented Jun 5, 2019

I guess. We still haven't examined Randen or forward secrecy, but these are separate topics.
