-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Firefox crashes on Windows because BCryptGenRandom fails #94098
Comments
Are you sure that IIUC One potential solution on the Firefox side is to try to call Also note that we can not target only Win7 without affecting other Windows versions using only |
Is there not some manual way we could check inside of the cfg block? |
Hi there! Sorry about the radio silence for a while on this issue -- The OP isn't able to continue working on this, and I've been asked to help resolve it. I have added PR #96917 that implements the idea of re-introducing
We are very certain of this, as we don't see equivalent crashes in Firefox in our own use of RtlGenRandom, and I don't see any issue reported for Chrome/Edge in their code We are not convinced that it is an "OS problem", persay... We suspect that a virus scanner or other security software may be interfering with certain machines' ability to load the BCrypt library. We don't even know for certain that this issue only affects Firefox; it could just be that we're the only ones with beefy enough crash reporting to catch it. I agree with you in principle that it's not great to have to introduce workarounds for things like "a virus scanner is breaking Rust code" into the standard library, but if we assume that Firefox is not the only piece of software that has this issue then where else should the change be made? Even if it is only Firefox, we're left with the options of either "seeing through" the abstraction that The PR I have attached should make no difference in the 99.999%+ machines where this is not a problem, but in that small fraction of machines where BCrypt is broken, this only improves the robustness of StdLib.
As it turns out, the name of this issue is perhaps a bit wrong. This issue is seen on multiple versions of Windows (example crash on Windows 10). My PR is agnostic to this anyway -- if |
Thank you for following up on this issue. As you say, we are reliant on Firefox's data here. I've updated the title to better reflect that not just x86 Windows 7 is affected. |
Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails With PR rust-lang#84096, Rust `std::collections::hash_map::RandomState` changed from using `RtlGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/ntsecapi/nf-ntsecapi-rtlgenrandom)) to `BCryptGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom)) as its source of secure randomness after much discussion ([here](rust-random/getrandom#65 (comment)), among other places). Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the `env_logger` crate. ([docs for env_logger](https://docs.rs/env_logger/latest/env_logger/)) The root issue is that on some machines, `BCryptGenRandom()` will fail with an `Access is denied. (os error 5)` error message. ([Bugzilla issue 1754490](https://bugzilla.mozilla.org/show_bug.cgi?id=1754490)) (Discussion in issue rust-lang#94098) Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues ([like this](https://bugzilla.mozilla.org/show_bug.cgi?id=1746254)) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work. However, in this case we are in a regular unsandboxed process just trying to initialize `env_logger` and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally. It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it. I have read some of the historical discussion around using `BCryptGenRandom()` in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to `RtlGenRandom()`. Instead, I'd like to suggest that perhaps we use `RtlGenRandom()` as a "fallback RNG" in the case that BCrypt doesn't work. This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks: 1. Slight added overhead: It should be quite minimal though. The first call to `sys::rand::hashmap_random_keys()` will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers 2. `RtlGenRandom()` is deprecated by Microsoft: Technically true, but as mentioned in [this comment on GoLang](golang/go#33542 (comment)), this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in [our C++ code](https://searchfox.org/mozilla-central/rev/5f88c1d6977e03e22d3420d0cdf8ad0113c2eb31/mfbt/RandomNum.cpp#25), and [Chromium uses it in their code as well](https://source.chromium.org/chromium/chromium/src/+/main:base/rand_util_win.cc) (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here
Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails With PR rust-lang#84096, Rust `std::collections::hash_map::RandomState` changed from using `RtlGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/ntsecapi/nf-ntsecapi-rtlgenrandom)) to `BCryptGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom)) as its source of secure randomness after much discussion ([here](rust-random/getrandom#65 (comment)), among other places). Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the `env_logger` crate. ([docs for env_logger](https://docs.rs/env_logger/latest/env_logger/)) The root issue is that on some machines, `BCryptGenRandom()` will fail with an `Access is denied. (os error 5)` error message. ([Bugzilla issue 1754490](https://bugzilla.mozilla.org/show_bug.cgi?id=1754490)) (Discussion in issue rust-lang#94098) Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues ([like this](https://bugzilla.mozilla.org/show_bug.cgi?id=1746254)) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work. However, in this case we are in a regular unsandboxed process just trying to initialize `env_logger` and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally. It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it. I have read some of the historical discussion around using `BCryptGenRandom()` in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to `RtlGenRandom()`. Instead, I'd like to suggest that perhaps we use `RtlGenRandom()` as a "fallback RNG" in the case that BCrypt doesn't work. This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks: 1. Slight added overhead: It should be quite minimal though. The first call to `sys::rand::hashmap_random_keys()` will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers 2. `RtlGenRandom()` is deprecated by Microsoft: Technically true, but as mentioned in [this comment on GoLang](golang/go#33542 (comment)), this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in [our C++ code](https://searchfox.org/mozilla-central/rev/5f88c1d6977e03e22d3420d0cdf8ad0113c2eb31/mfbt/RandomNum.cpp#25), and [Chromium uses it in their code as well](https://source.chromium.org/chromium/chromium/src/+/main:base/rand_util_win.cc) (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here
If it is indeed caused by essential snake oils, then I’d argue that Firefox as an application has just large enough of a profile to get the vendors to fix their software. You all aren’t introducing workarounds to enable Firefox’s operation on systems with random memory bit-flips caused by unstable memory, are you? I would argue a similar line of thinking could apply here to at least some extent. |
FWIW I agree, and we do generally try to get vendors to fix their software: we've worked with many companies like Symantec, AVG, McAfee, etc to fix their products, and I completely agree that this is preferable because it improves the software ecosystem for everyone. In this case, we were unable to link the BCrypt issue to a specific source; it's still in our queue to investigate it, and once we do I could foresee a future where we push another change to Rust Std to undo the thing I did. In the meantime, someone in the chain has to unfortunately workaround this issue. Given that It would be nice, of course, if all vendors just followed the APIs they were given properly, but I doubt you'll find any popular language's standard library that doesn't contain a pile of workarounds for this kind of stuff |
Hi, I'm an engineer working for Mozilla on Firefox. We recently noticed we'd been getting crash reports indicating Firefox crashes during
env_logger
initialization. Upon debugging, we found out it was caused by a failure ofBCryptGenRandom
, that was introduced by #84096.Almost all reports are from Windows 7 32-bit. We don't know the reason why
BCryptGenRandom
failed. Possibly the cryptography settings in the registry was corrupted and the process couldn't find the module.To mitigate this crash, can you add a fallback logic, using
RtlGenRandom
ifBCryptGenRandom
fails for whatever reason. Probably we can limit it to Win7.Here's the link to the tracking bug in Bugzilla.
https://bugzilla.mozilla.org/show_bug.cgi?id=1754490
The text was updated successfully, but these errors were encountered: