-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative way to detect AMD bug #48
Conversation
So in practical terms, this change means that anyone with an affected CPU cannot use the RDRAND implementation at all. Is the presence of a sporadic bug (on resume from standby) sufficient justification to not trust RDRAND on those CPUs at all, while still trusting it on other CPUs? @tarcieri thoughts? I haven't read up on the issue, but this seems unnecessary. Question: is there a reason we can't discard "obviously bad" values and try again? There are three aspects to this question:
There is also the question: is it likely other (possibly future) CPUs might exhibit a similar bug? My view is that the most robust solution would be to trap the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think disabling RDRAND completely on affected CPUs is too extreme of a measure. Especially considering that failure can be reliably detected.
Personally I like approach of detecting CPU family and doing value check only for affected CPUs. I think we can write it like this (note the changed rdrand
signature). But I understand the complexity concern, so @dhardy's proposal is a viable alternative.
BTW maybe we should use a dedicated error code for this problem instead of Error::UNKNOWN
?
I updated this PR and the description to incorporate the above suggestions
I agree. Checking the output value and trying again seems avoid false positives and being overly aggressive.
The currently version has the check in place for all targets. This bug could be present in the future and performing the check is very cheep (just a
This is a good point. Intel's own documentation says that 0x00000000 is returned when RDRAND fails. Now they set the CF flag correctly. However, failing to set a CF flag is a relatively easy failure mode, so I think keeping this check for all implementations is reasonable. |
This is an alternative way to do the AMD bug detection "fixed" in #43.
@newpavlov feel free to immediately close this if you think this way is overly complex.
Before we checked the return value of RDRAND to detect the bug. However, this will occasionally return a false positive (every (1/2)^63 invocations), so we use the existing retry loop to try again. I also added comments explaining that: