-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement JitterRng
, based on jitterentropy-library
.
#28
Conversation
Nice work. Think I'll leave the review for tomorrow. I was going to suggest you make a PR directly against upstream |
I have had email contact with the author of |
This is a pretty direct translation from C to Rust. Some notes per function: ### `random_loop_cnt` (`jent_loop_shuffle`) Because the `min` argument was always `0`, I removed that argument. The C code did not seem to fold the time optimally. When `bits` is set to `7`, as `mem_access` does, it will fold `64 / 7 = 9` times, leaving 1 bit unused. It is minor, but we use `folds = (64 + n_bits - 1) / n_bits;`, so all is used. We do not add 1 to the resulting loop count, this should be done in the calling code. `memaccess` already adds 128 anyway. For `lfsr_time` we use the loop count on a `throw_away` value, and then run the real calculation once. ### `lfsr_time` (`jent_lfsr_time`) We do not allow overriding `loop_cnt`, and also do not return the actual `loop_cnt` used. This only had some use in testing the C code, but was 'not needed during runtime'. Only the last round of the outer loop (in C) effect `self.data`. In Rust the inner loop is part of the `lfsr` helper function. All but the last round operate on a `throw_away` function, that does not get reset between the loop rounds. ### `memaccess` (`jent_memaccess`) We do not allow overriding `loop_cnt`, and also do not return the actual `loop_cnt` used. This only had some use in testing the C code, but was 'not needed during runtime'. We do not do NULL pointer checks, and running `JitterRng` without the Memory Access noise source is (currently) not supported. We (currently) do not support changing `memblocksize` and `memblocks` except by changing the constants `MEMORY_BLOCKS` and `MEMORY_BLOCKSIZE`. So instead of recalculating `wrap`, we can just re-use MEMORY_SIZE. Instead of a `memlocation` pointer we use indexing. The `index` is calculated before accessing `self.mem[index] instead of after. This convinces the compiler indexing is safe, and eliminates bounds checks. ### `stuck` (`jent_stuck`) For all delta's we use an `i64`, instead of an `u64` for the first delta in the C implementation. This handles a clock that may not be entirely monotonic (for example due to NTP) slightly better. Also, we return a `bool` instead of an `u64`. ### `measure_jitter` (`jent_measure_jitter`) It seemed clearer here to not return an `u64` or `bool`, but `Option<()>`. `Some` and `None` indicate clearly whether we have been able to add some entropy. For `current_delta` we use an `i64` instead of an `u64`. It is cast back to an `u64` for `lfsr_time`, which only cares about bits. ### `stir_pool` (`jent_stir_pool`) The C code does something difficult with initializing an `u64` with two `u32`'s in an `union`. The numbers it uses for initialization are from SHA-1, and the order does not really matter. The Rust code just sets the `u64`'s directly, and uses the constants in the order they appear in FIPS 180-4 section 5.3.1. The function tries to be constant time to prevent leaking timing information about the generated random number. Using a `trow_away` value like it does is optimised out, and I don't trust branches to be constant time. I used a bit mask trick instead, and verified the assembly does not include anything conditional. Not sure it matters anything, we just went through a lot of effort to have as much timing variance as possible to generate the random number. ### `gen_entropy` (`jent_gen_entropy`) We do not support oversampling, so no need to repeat the loop more times. `self.memaccess()` in `measure_jitter` is easily optimised out, because LLVM recognises we never read the results. Therefore we do a single read from `self.mem`, hidden to the optimizer with `black_box()`. We return `self.data` instead of requiring the calling code to read from it. ### (not included) `jent_read_entropy` Here we have the convienent `fill_bytes_via_u64` for. The C code calls `jent_gen_entropy` one last time after filling the buffer, to make sure that something reading the processes memory can not read the last generated random number. 'This call reduces the speed of the RNG by up to half`. It seems to me setting it to 0 is just as good. I did not bother with this. As an alternative a user caring very much about this can just call `next_u64` after receiving a result. ### `entropy_init` (`jent_entropy_init`) Wrap `lfsr_time` in the loop in a `black_box` to make sure it is not optimised out. For delta we use an `i64` instead of an `u64`. Instead of `lowdelta` we just use `delta`. It seems a hack to compile on some 32-bit architectures.
> One question about the license. All libraries in the Rust ecosystem try > to be dual-licensed with the permissive Apache 2.0 license and the MIT > license. I am not sure it is ok for me to do so, because your code used > the 3-clause BSD. The difference seems to be in the third clause: "The > name of the author may not be used to endorse or promote products > derived from this software without specific prior written permission." > Can you maybe give permission to license this Rust translation under the > MIT license? Granted. I am fine with the mentioned license as long as there is a reference to my code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's a complex beast, right?
I guess I sort of understand why all the bits are there but it still seems really complex for what we need.
The other question is if both OsRng
and JitterRng
fail, what do we do? Some users would be happy with insecure seeding, some not.
|
||
/// Error kind which can be matched over. | ||
#[derive(PartialEq, Eq, Debug, Copy, Clone)] | ||
pub enum ErrorKind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Kind" is used for fine classification here, not coarse (like rand_core::ErrorKind
), and there's no point matching over the kind because in all cases the result is the same: the RNG is unavailable.
I think you should just rename ErrorKind
→ Error
(or JitterError
) and get rid of the current Error
wrapper. Or even get rid of both error types and just use a static string for each error. A lot of extra code just to prettify error-cases doesn't seem worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I am a bit tired of the long discussion around error handling. Not to say that finding a optimal solution isn't good though...
The custom error kind may make a bit more sense now that we expose a test_timer
function. But I will be happy to make changes here once the changes to our error handling have landed. As you say it is all not that critical.
src/jitter_rng.rs
Outdated
ErrorKind::TinyVariantions => | ||
write!(f, "Variations of deltas of time too small."), | ||
ErrorKind::ToManyStuck => | ||
write!(f, "To many stuck results (indicating no added entropy)."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much point writing out all the details twice (description
and fmt
). Also, this means your no_std
version has a shorter description.
Also: "too" not "to"
src/jitter_rng.rs
Outdated
// The correct way to calculate the current time is | ||
// `dur.as_secs() * 1_000_000_000 + dur.subsec_nanos() as u64` | ||
// But this is faster, and the difference in terms of entropy is negligible. | ||
dur.as_secs() << 32 | dur.subsec_nanos() as u64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to shift, why not shift by 30? You could probably even forget the seconds part altogether (the wrapping would cause occasional large negative values in current_delta
, which means the high bits in the input to lsfr_time
will be set, but I don't think that matters). But then I don't see why the NotMonotonic
error case exists either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was not sure whether to use 30 or 32. It does not really mater anything. Jitterentropy
uses 32, but 30 seems neater to me.
I had some trouble with large negative values in current_delta
, they could cause an integer overflow in the stuck
test, and give unreliable results. Thanks to that I found out new_with_timer
was not setting the initial values right by calling gen_entropy()
once.
But then I don't see why the NotMonotonic error case exists either?
We have code to handle the clock going back occasionally, so it seems a bit odd to me too. But these tests have proven themselves to weed out bad timers, or functions that are not really timers at all, but more like counters. So I'd like to keep following the reference implementation here.
src/jitter_rng.rs
Outdated
// Get time stamp and calculate time delta to previous | ||
// invocation to measure the timing variations | ||
let time = (self.timer)(); | ||
let current_delta = time.wrapping_sub(self.prev_time) as i64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding a note that wrapping arithmetic on unsigned integers followed by cast to i64 generates correct deltas in most cases (including where both times have the highest bit set and where one has wrapped). [The exception is where the actual delta is not representable as an i64
, but this shouldn't be the case.]
src/jitter_rng.rs
Outdated
// we rely on it to not recognise the opportunity. | ||
for i in 0..64 { | ||
let apply = (self.data >> i) & 1; | ||
let mask = !((apply as i64) - 1) as u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just !apply.wrapping_sub(1)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, thank you!
src/jitter_rng.rs
Outdated
|
||
// Variations of deltas of time must on average be larger than 1 to | ||
// ensure the entropy estimation implied with 1 is preserved | ||
if delta_sum <= 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On average? So should the RHS be TESTLOOPCOUNT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch.
src/jitter_rng.rs
Outdated
// Note: in the reference implementation only the last round effects | ||
// `self.data`, all the other results are ignored. To make sure the | ||
// other rounds are not optimised out, we first run all but the last | ||
// round on a trow-away value instead of the real `self.data`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw-away
Thanks for the good comments. I have been working on it some more, and some of the changes are the same as you commented on 👍. Not had much time last week to join the discussions, but I'll try to push something more here soon. |
Add the function `timer_stats` (`jent_lfsr_var_stat` in C). Like in other the other functions we use a `int64` to represent a time delta instead of a `u64`. Instead of the `min` variable to indicate how many times the noice source loops should run at least, we use a `var_rounds` boolean. Setting `min` seems like an historic leftover, and does not fit `memaccess`. A simple bool covers everything needed for testing. This effects `timer_stats`, `lfsr_time` and `memaccess`. It is useful to be able to run the statistics function, even when initializing `JitterRng` might fail because of the `test_timer` function. Therefore `new_with_timer` does not automatically test the timer jitter, and expects the user code to do so. Now that `new_with_timer` calls `gen_entropy`, it was possible to move `black_box(ec.mem[0])` here instead of `measure_jitter`, so it is executed even less. # Conflicts: # src/jitter_rng.rs
It is both complex and not all that much so. It comes down to:
Compared to
Do we have many options left? Actually I think you need to run on a pretty misconfigured system with exotic hardware to see both fail. At some point we have to say it is good enough. It seems to me a panic is ok then. Maybe fun anecdote: in school they learned us about all the safety factors to apply on for example the construction of houses. "We don't design for an airplane engine falling on the roof". But we do design for the worst storm that has 5% chance of occurring in the next 50 years, and then make it 50% stronger. |
One idea we could try is to only run And I have not yet exposed a way to manually set the number of entropy collection rounds that should be run. Not sure what would be the best place for it yet. |
src/jitter_rng.rs
Outdated
@@ -361,6 +361,9 @@ impl JitterRng { | |||
// Get time stamp and calculate time delta to previous | |||
// invocation to measure the timing variations | |||
let time = (self.timer)(); | |||
// Note: wrapping_sub combined with a cast to `i64` generates a correct | |||
// delta, even in the unlikely case this is a timer than is not strictly | |||
// monotonic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a timer that is not strictly ... ?
Oh, for some reason I missed your recent comments. You can use |
Just like the code in |
Policy? No. I guess it's not a big deal to depend on an extra small, commonly used crate like Regarding your other point: |
I personally would expect crates like |
http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html#toc-Appendix-F has a table with the results for 250 processors. Failures:
The very old Celeron is the one we could worry about in combination with Windows. I think these results do not include the Memory Access noise source, maybe they do work now for us.
It was just a quote for fun 😄. But it is good to ask what the chances are for both |
Your code claims the memory access only reliably uses the L2 cache IIRC so it still needs a high-resolution timer. But hopefully this is good enough. |
benches/generators.rs
Outdated
macro_rules! init_gen { | ||
($fnn:ident, $gen:ident) => { | ||
#[bench] | ||
fn $fnn(b: &mut Bencher) { | ||
let mut rng = XorShiftRng::new().unwrap(); | ||
let mut rng = OsRng::new().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this uses XorShiftRng
because we want to benchmark from_rng
, not the OsRng
itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first intention with this benchmark. Somewhere along the road it seemed to make sense to do the complete measurement. Ah, yes, to compare JitterRng against. I don't think it matters all that much though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense specifically for JitterRng
but not for the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can reverse it, together with fixes for the CI failures
7f3c9b0
to
14136e7
Compare
After some debugging, I find that FACTOR within But, is there a good reason not to use floating-point arithmetic here? This is exactly the kind of problem FP works well on (good precision without having to worry about overflow). I don't think speed matters so I think the only issue is whether any platforms don't have hardware or software FP? I guess software FP may need to be avoided if code size is an issue. So really this comes down to: are we targetting <32-bit hardware here? BTW @pitdicker please pull from my branch for updates / merge |
@pitdicker I thought this was meant to be slow? Is something VERY wrong here? test gen_u64_chacha ... bench: 15,494 ns/iter (+/- 362) = 516 MB/s
test gen_u64_isaac ... bench: 7,625 ns/iter (+/- 74) = 1049 MB/s
test gen_u64_isaac64 ... bench: 4,461 ns/iter (+/- 30) = 1793 MB/s
test gen_u64_jitter ... bench: 1,888 ns/iter (+/- 37) = 4237 MB/s
test gen_u64_os ... bench: 318,028 ns/iter (+/- 2,905) = 25 MB/s
test gen_u64_std ... bench: 4,510 ns/iter (+/- 120) = 1773 MB/s
test gen_u64_xorshift ... bench: 2,803 ns/iter (+/- 73) = 2854 MB/s |
I made a |
That sound very wrong... could you try 49e9bb0? |
I thought it useful for |
79e9bb0 is slower and passes the first few rounds of PractRand — will investigate |
Ok, the bug was my fault — I refactored some division code carelessly, resulting in 0 rounds. Now JitterRng uses 8 or 12 rounds to generate a Edit: JitterRng outperforms OsRng on test gen_bytes_chacha ... bench: 447,086 ns/iter (+/- 11,865) = 572 MB/s
test gen_bytes_isaac ... bench: 189,877 ns/iter (+/- 4,032) = 1348 MB/s
test gen_bytes_isaac64 ... bench: 105,983 ns/iter (+/- 1,137) = 2415 MB/s
test gen_bytes_jitter ... bench: 749,747,356 ns/iter (+/- 6,627,870)
test gen_bytes_os ... bench: 1,336,217 ns/iter (+/- 51,917) = 191 MB/s
test gen_bytes_std ... bench: 106,992 ns/iter (+/- 15,684) = 2392 MB/s
test gen_bytes_xorshift ... bench: 151,903 ns/iter (+/- 7,025) = 1685 MB/s |
Yes, What should I do to make this landable? Use your commit, and fix the overflow? |
I "fixed" the overflow by reducing FACTOR to 3 in the end. "Works for me" at least (4 also did most of the time). My machine had delta-average of around 30-40'000 which is on the limit of overflow when raised to the fourth power, but with power 3 over 20 times that average is representable, so I don't think we need to go lower. Didn't look too much at the precision but I don't think it's a problem (even with δ of 10'000, the log is about 40; not too small). I made a few other changes in my branch; if you're happy we can merge. |
Your branch looks good, feel free to merge! With division bug you mean |
Sorry for not finishing it myself, I always wait with 'difficult' things until I have a couple of continues hours to work on it... |
I can understand that; no problem. I'm not so satisfied with progress on the whole rand thing myself, but nevermind. No, I agree that Oh, maybe you mean the other bug — that was to do with the same division (the multiplication by 64 must be part of the same statement). |
I tried 4 ways of improving the precision of that part, and finally went with the easiest: a power and division. But yes, messed it up. Thank you :-) |
Uh sorry, I thought it was your code; you only messed up the FACTOR. Anyway, my branch has some other stuff in it (clock_rng and cat_rng); I'll clean up and merge soon. |
This is a pretty direct translation from C to Rust. It should be possible to read
jitterentropy-base.c
andjitter_rng
side-by-side to review.Some notes per function:
random_loop_cnt
(jent_loop_shuffle
)Because the
min
argument was always0
, I removed that argument.The C code did not seem to fold the time optimally. When
bits
is set to7
, asmem_access
does, it will fold64 / 7 = 9
times, leaving 1 bit unused. It is minor, but we usefolds = (64 + n_bits - 1) / n_bits;
, so all is used.We do not add 1 to the resulting loop count, this should be done in the calling code.
memaccess
already adds 128 anyway. Forlfsr_time
we use the loop count on athrow_away
value, and then run the real calculation once.lfsr_time
(jent_lfsr_time
)We do not allow overriding
loop_cnt
, and also do not return the actualloop_cnt
used. This only had some use in testing the C code, but was 'not needed during runtime'.Only the last round of the outer loop (in C) effect
self.data
. In Rust the inner loop is part of thelfsr
helper function. All but the last round operate on athrow_away
function, that does not get reset between the loop rounds.memaccess
(jent_memaccess
)We do not allow overriding
loop_cnt
, and also do not return the actualloop_cnt
used. This only had some use in testing the C code, but was 'not needed during runtime'.We do not do NULL pointer checks, and running
JitterRng
without the Memory Access noise source is (currently) not supported.We (currently) do not support changing
memblocksize
andmemblocks
except by changing the constantsMEMORY_BLOCKS
andMEMORY_BLOCKSIZE
. So instead of recalculatingwrap
, we can just re-use MEMORY_SIZE.Instead of a
memlocation
pointer we use indexing. Theindex
is calculated before accessing `self.mem[index] instead of after. This convinces the compiler indexing is safe, and eliminates bounds checks.stuck
(jent_stuck
)For all delta's we use an
i64
, instead of anu64
for the first delta in the C implementation. This handles a clock that may not be entirely monotonic (for example due to NTP) slightly better.Also, we return a
bool
instead of anu64
.measure_jitter
(jent_measure_jitter
)It seemed clearer here to not return an
u64
orbool
, butOption<()>
.Some
andNone
indicate clearly whether we have been able to add some entropy.For
current_delta
we use ani64
instead of anu64
. It is cast back to anu64
forlfsr_time
, which only cares about bits.stir_pool
(jent_stir_pool
)The C code does something difficult with initializing an
u64
with twou32
's in anunion
. The numbers it uses for initialization are from SHA-1, and the order does not really matter. The Rust code just sets theu64
's directly, and uses the constants in the order they appear in FIPS 180-4 section 5.3.1.The function tries to be constant time to prevent leaking timing information about the generated random number. Using a
trow_away
value like it does is optimised out, and I don't trust branches to be constant time. I used a bit mask trick instead, and verified the assembly does not include anything conditional. Not sure it matters anything, we just went through a lot of effort to have as much timing variance as possible to generate the random number.gen_entropy
(jent_gen_entropy
)We do not support oversampling, so no need to repeat the loop more times.
self.memaccess()
inmeasure_jitter
is easily optimised out, because LLVM recognises we never read the results. Therefore we do a single read fromself.mem
, hidden to the optimizer withblack_box()
.We return
self.data
instead of requiring the calling code to read from it.(not included)
jent_read_entropy
Here we have the convienent
fill_bytes_via_u64
for.The C code calls
jent_gen_entropy
one last time after filling the buffer, to make sure that something reading the processes memory can not read the last generated random number. 'This call reduces the speed of the RNG by up to half`. It seems to me setting it to 0 is just as good.I did not bother with this. As an alternative a user caring very much about this can just call
next_u64
after receiving a result.entropy_init
(jent_entropy_init
)Wrap
lfsr_time
in the loop in ablack_box
to make sure it is not optimised out.For delta we use an
i64
instead of anu64
.Instead of
lowdelta
we just usedelta
. It seems a hack to compile on some 32-bit architectures.