Add next_bool method to RngCore and counter levels to BlockRng #1031
Hm, unfortunately I see a measurable performance regression... PR branch:
Master branch:
I will try to look into this, but preliminary results are not great.
@@ -258,8 +254,7 @@ macro_rules! chacha_impl {

 impl PartialEq<$ChaChaXRng> for $ChaChaXRng {
     fn eq(&self, rhs: &$ChaChaXRng) -> bool {
-        self.rng.core.state.stream64_eq(&rhs.rng.core.state)
-            && self.get_word_pos() == rhs.get_word_pos()
+        self.rng.eq(&rhs.rng)
I'm not sure this is correct.
Good catch @vks.
@newpavlov There is a reason BlockRng doesn't derive PartialEq. (This probably deserves a comment in the source, because it's subtle.) Two BlockRngs can be logically equivalent without being bitwise equivalent if: their buffers correspond to different positions in the stream (equivalently: their underlying RNGs have different block counters), and the indexes into their buffers are at different positions, and these differences offset each other. This situation happens with Seekable RNGs like ChaCha.
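To make the point above concrete, here is a minimal sketch in Rust. `ToyBlockRng`, its fields, and its `seek`, `word_pos`, and `logical_eq` methods are hypothetical stand-ins invented for illustration; they are not the `rand_core` `BlockRng` API, and the "generator" simply outputs consecutive integers. The sketch only shows how two states can sit at the same stream position while differing field by field.

```rust
/// Number of words per block in this toy generator (an assumption).
const BLOCK_WORDS: usize = 4;

/// Hypothetical block RNG: block `n` holds the words 4n, 4n+1, 4n+2, 4n+3.
/// The derived `PartialEq` is the field-by-field ("bitwise") comparison.
#[derive(Debug, Clone, PartialEq)]
struct ToyBlockRng {
    /// Counter of the next block the core will generate.
    next_block: u64,
    /// Output of the most recently generated block.
    buf: [u32; BLOCK_WORDS],
    /// Next unread word in `buf`; `BLOCK_WORDS` means the buffer is used up.
    index: usize,
}

impl ToyBlockRng {
    /// Fresh generator at stream position 0 with an empty buffer.
    fn new() -> Self {
        ToyBlockRng { next_block: 0, buf: [0; BLOCK_WORDS], index: BLOCK_WORDS }
    }

    /// Jump to an absolute word position by generating the containing block
    /// right away and pointing the index into it.
    fn seek(word_pos: u64) -> Self {
        let mut rng = ToyBlockRng {
            next_block: word_pos / BLOCK_WORDS as u64,
            buf: [0; BLOCK_WORDS],
            index: 0,
        };
        rng.refill();
        rng.index = (word_pos % BLOCK_WORDS as u64) as usize;
        rng
    }

    fn refill(&mut self) {
        for (i, w) in self.buf.iter_mut().enumerate() {
            *w = (self.next_block as u32) * BLOCK_WORDS as u32 + i as u32;
        }
        self.next_block += 1;
        self.index = 0;
    }

    fn next_u32(&mut self) -> u32 {
        if self.index >= BLOCK_WORDS {
            self.refill();
        }
        let v = self.buf[self.index];
        self.index += 1;
        v
    }

    /// Absolute stream position of the next word to be returned.
    fn word_pos(&self) -> u64 {
        self.next_block * BLOCK_WORDS as u64 - (BLOCK_WORDS - self.index) as u64
    }

    /// Logical equality: same position in the same stream.
    fn logical_eq(&self, other: &Self) -> bool {
        self.word_pos() == other.word_pos()
    }
}
```

Reading four words from `ToyBlockRng::new()` leaves the block counter at 1 with an exhausted buffer, while `ToyBlockRng::seek(4)` leaves the counter at 2 with the index at 0. Both produce the same next word, yet the derived `==` reports them as different, which is the failure mode a derived `PartialEq` would introduce.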
Another option might be to have a bit counter for |
Then these bits may be consumed out-of-order with other random values: perhaps acceptable, but it makes the rules around reproducibility more difficult to document. There is some potential advantage here, but I'm not convinced it's a good trade against the added complexity.
I guess we could preemptively advance the index if at least one bit was used. We will trade-off additional work in
I think this approach significantly simplifies code, especially the |
The index-level approach demonstrates better performance, but there is a degradation for HC128:
ChaCha8:
ChaCha20:
Performance can be improved a bit further by using the unstable
UPD: One bench iteration includes 1000 calls to |
We already have a I'm not sure what to say about this approach. IIUC it doesn't affect non-block PRNGs much at all, so for example someone using |
Yes, it only changes behavior of types built on top of BTW why do we define a newtype wrapper for ChaCha and HC-128 instead of using a simple type alias? |
IIRC it was simply to avoid exposing the implementation and thus avoiding breaking changes if we changed it. Any idea why |
But
No idea at the moment. After re-running benchmarks I get ~1.4 ns overhead for Emulating Could be quirks of branch prediction?
I can't reproduce this result. Are you sure you have removed it correctly?
If we have a uniform API for it — but we don't (beyond the three ChaCha generators).
It was a hack, leaving |
Every stream cipher (which are often seekable) can be used as an RNG and we have the
Hm, I still can't reproduce it. Maybe you forgot to remove some One hypothesis why we see such difference between If the current overhead is acceptable to you, I will fix the remaining issues ( |
This is IMO quite a big perf hit ( |
An alternative would be to use the bit counter for Another alternative is to provide a second |
To keep generated bits in-order (rather than a separate "bit buffer") requires checking the bit counter anyway, and I don't like the idea of losing this constraint.
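As a rough illustration of why in-order bit consumption forces a check in the word path, consider the sketch below. `BitCountingRng` and all of its fields and methods are hypothetical names made up for this example, not anything from `rand_core` or this PR; the buffer is a fixed `Vec` with no refill logic, so reading past the end panics.

```rust
/// Hypothetical word source with an in-order bit counter.
struct BitCountingRng {
    /// Stand-in for a block buffer (fixed; this toy never refills it).
    words: Vec<u32>,
    /// Index of the current word.
    index: usize,
    /// Bits already consumed from `words[index]`; 0 means untouched.
    bit: u32,
}

impl BitCountingRng {
    fn new(words: Vec<u32>) -> Self {
        BitCountingRng { words, index: 0, bit: 0 }
    }

    /// Take the next unused bit of the current word, lowest bit first.
    fn next_bool(&mut self) -> bool {
        let b = (self.words[self.index] >> self.bit) & 1 == 1;
        self.bit += 1;
        if self.bit == 32 {
            // Word fully consumed: move on to the next one.
            self.bit = 0;
            self.index += 1;
        }
        b
    }

    /// Return a whole word. To keep output in order, a partially consumed
    /// word must be skipped first; this is the extra branch on the hot path.
    fn next_u32(&mut self) -> u32 {
        if self.bit != 0 {
            self.bit = 0;
            self.index += 1;
        }
        let w = self.words[self.index];
        self.index += 1;
        w
    }
}
```

With a separate bit buffer, `next_u32` could drop the `self.bit != 0` branch, but the bits would then come from a word generated earlier than values returned in between, so output order would no longer match generation order.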
Too much complexity IMO. (I mean if users really want their own optimal block RNG for their use-case, there's no reason they can't do that either way. But that doesn't belong in |
Wow, this PR is now a year old! It also conflicts with several other recent changes, and we never did solve that perf. regression. @newpavlov do you still think there is significant merit in pursuing this or shall we abandon it?
I think we can close this PR. I still think this approach is worth exploring, but I guess it will be better to start fresh than to update this PR.
This approach significantly simplifies the code, adds efficient bit generation (see #1014), and should have only a minor impact on performance (although I haven't measured it yet).