Alternative way to detect AMD bug #48

josephlr · 2019-06-29T01:54:36Z

This is an alternative way to do the AMD bug detection "fixed" in #43.

@newpavlov feel free to immediately close this if you think this way is overly complex.

Previously, we checked the return value of RDRAND to detect the bug. However, that check will occasionally produce a false positive (with probability (1/2)^63 per invocation), so we now use the existing retry loop to try again. I also added comments explaining:

- That the issue here is AMD not setting the CF flag properly (as RDRAND is allowed to fail).
- Why we perform this check on all targets.
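
The retry behaviour described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `rdrand_with_retry`, `RETRY_LIMIT`, and the `raw_step` closure (standing in for the hardware instruction and its CF flag) are hypothetical names, chosen so that the sketch runs on any target. Rejecting both 0 and !0 is also an assumption based on the values mentioned elsewhere in this thread.

```rust
/// Hypothetical retry budget; the real limit is an assumption here.
const RETRY_LIMIT: usize = 10;

/// `raw_step` stands in for one RDRAND invocation: `Some(v)` models CF = 1
/// (the CPU claims success), `None` models CF = 0 (the CPU reports failure).
fn rdrand_with_retry(mut raw_step: impl FnMut() -> Option<u64>) -> Option<u64> {
    for _ in 0..RETRY_LIMIT {
        if let Some(value) = raw_step() {
            // Affected CPUs may claim success while returning all-zero or
            // all-one bits, so treat those two values as suspicious.
            if value != 0 && value != !0 {
                return Some(value);
            }
            // Otherwise fall through and retry: this may have been a
            // genuine (very rare) random result rather than the bug.
        }
    }
    // Every attempt failed or looked suspicious.
    None
}
```

Passing a closure instead of calling the intrinsic directly keeps the sketch testable without the `rdrand` target feature; a real implementation would call the intrinsic inside the loop.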

dhardy · 2019-06-29T10:52:18Z

So in practical terms, this change means that anyone with an affected CPU cannot use the RDRAND implementation at all. Is the presence of a sporadic bug (on resume from standby) sufficient justification to not trust RDRAND on those CPUs at all, while still trusting it on other CPUs? @tarcieri thoughts? I haven't read up on the issue, but this seems unnecessary.

Question: is there a reason we can't discard "obviously bad" values and try again? There are three aspects to this question:

1. Can the CPU "recover" and return other values? Edit: no
2. In this case, could the occurrence of the bug reduce the security of the result?
3. What is the expected number of false positives from the guard removed by this PR? (2^-63 times the number of uses of RDRAND via this lib, so unless servers actually make millions of requests the number of false positives is likely to be 0 or very close to it.)
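
To put the false-positive estimate in numbers: assuming a 64-bit result and a guard that rejects exactly two values (0 and !0) out of 2^64, the per-call rate is the 2^-63 quoted above. A small sketch, with illustrative function names that are not part of this library:

```rust
/// Probability that one genuine 64-bit RDRAND result trips the guard,
/// assuming the guard rejects exactly two values (0 and !0) out of 2^64.
fn false_positive_rate() -> f64 {
    2.0 / 2f64.powi(64)
}

/// Expected number of false positives over `calls` independent invocations.
fn expected_false_positives(calls: f64) -> f64 {
    calls * false_positive_rate()
}
```

For a billion calls, `expected_false_positives(1e9)` is on the order of 10^-10, so in practice a false positive should essentially never be observed.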

There is also the question: is it likely other (possibly future) CPUs might exhibit a similar bug?

My view is that the most robust solution would be to trap the !0 value, but retry a couple of times (we're already in a loop, so continue is sufficient).

newpavlov

I think disabling RDRAND completely on affected CPUs is too extreme of a measure. Especially considering that failure can be reliably detected.

Personally, I like the approach of detecting the CPU family and doing the value check only for affected CPUs. I think we can write it like this (note the changed rdrand signature). But I understand the complexity concern, so @dhardy's proposal is a viable alternative.
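
A family-gated check along these lines might look like the sketch below. It is not the linked code: `display_family` and `needs_value_check` are hypothetical helpers, and the affected range (AMD families 14h to 16h) is an assumption taken from public reports of the bug, not from this thread. The CPUID leaf 1 EAX value is passed in as a plain integer so that the sketch runs on any target.

```rust
/// Decode the display family from CPUID leaf 1 EAX: the extended family
/// (bits 27:20) is added only when the base family (bits 11:8) is 0xF.
fn display_family(eax: u32) -> u32 {
    let base = (eax >> 8) & 0xF;
    let extended = (eax >> 20) & 0xFF;
    if base == 0xF { base + extended } else { base }
}

/// Assumption: only AMD families 14h..=16h need the extra value check.
fn needs_value_check(vendor: &[u8; 12], eax: u32) -> bool {
    vendor == b"AuthenticAMD" && (0x14..=0x16).contains(&display_family(eax))
}
```

The trade-off raised in this thread still applies: gating on the family avoids the check elsewhere, but it will not catch a future CPU with a similar bug.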

BTW maybe we should use a dedicated error code for this problem instead of Error::UNKNOWN?

src/rdrand.rs

josephlr · 2019-06-29T22:16:11Z

I updated this PR and the description to incorporate the above suggestions

> I think disabling RDRAND completely on affected CPUs is too extreme of a measure. Especially considering that failure can be reliably detected.

I agree. Checking the output value and trying again seems to avoid both false positives and being overly aggressive.

> Personally I like approach of detecting CPU family and doing value check only for affected CPUs. I think we can write it like this (note the changed rdrand signature). But I understand the complexity concern, so @dhardy's proposal is a viable alternative.

The current version has the check in place for all targets. A similar bug could show up on future CPUs, and performing the check is very cheap (just a lea and cmp for both checks).
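
One way to see why both checks can be this cheap: the two comparisons collapse into a single add-and-compare, which a compiler can plausibly lower to the lea and cmp mentioned above. The function names are hypothetical, and the exact instructions emitted are an assumption that depends on the compiler and target.

```rust
/// The guard as written: reject all-zero and all-one results.
fn is_suspicious(value: u64) -> bool {
    value == 0 || value == !0
}

/// Equivalent single-comparison form: adding 1 (with wraparound) maps
/// !0 to 0 and 0 to 1, so exactly the two suspicious values land in 0..=1.
fn is_suspicious_single_cmp(value: u64) -> bool {
    value.wrapping_add(1) <= 1
}
```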

> There is also the question: is it likely other (possibly future) CPUs might exhibit a similar bug?

This is a good point. Intel's own documentation says that 0x00000000 is returned when RDRAND fails. Now they set the CF flag correctly. However, failing to set a CF flag is a relatively easy failure mode, so I think keeping this check for all implementations is reasonable.

josephlr mentioned this pull request Jun 29, 2019

Prepare release v0.1.5 #47

Merged

newpavlov reviewed Jun 29, 2019

Alternative way to detect AMD bug

152c773

josephlr force-pushed the amd branch from abc85fb to 152c773 Compare June 29, 2019 21:58

newpavlov reviewed Jun 29, 2019

Typo

7e5a3da

newpavlov approved these changes Jun 29, 2019

newpavlov merged commit cbc44ee into rust-random:master Jun 30, 2019

josephlr deleted the amd branch July 1, 2019 07:32

josephlr mentioned this pull request Jul 8, 2019

Improve Error handling #54

Merged

dhardy mentioned this pull request Jul 26, 2019

Check for buggy AMD hardware nagisa/rust_rdrand#12

Closed

briansmith mentioned this pull request Sep 30, 2021

RDRAND-based output is (too) biased #228

Closed
