Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compile-time-rng on a platform with getrandom #94

Closed
TheBlueMatt opened this issue Jun 11, 2021 · 9 comments
Closed

compile-time-rng on a platform with getrandom #94

TheBlueMatt opened this issue Jun 11, 2021 · 9 comments

Comments

@TheBlueMatt
Copy link

As a feature request, it would be quite nice to be able to use compile-time-rng (or, better, some dummy-unsafe-no-rng cfg) when building binaries with hashbrown maps while fuzzing.

The https://github.com/rust-fuzz crates seem to mostly always set --cfg=fuzzing, so its possible to also check #[cfg(fuzzing)] without ever actually exposing a potentially-unsafe feature. Happy to work to implement if this would be acceptable.

@tkaitchuck
Copy link
Owner

tkaitchuck commented Jun 13, 2021

the build.rs can easily detect a fuzzing flag. (As can the code.)
What do you want the behavior to be when fuzzing? (I don't understand the motivation enough to infer this)
Should the code be fully deterministic so that every hashmap uses the same keys and hashes the same way?
(IE: Just turning off the random seeding, or should the pointer math, and counters be disabled also?)

Additionally, should it force the fallback algorithm?

@TheBlueMatt
Copy link
Author

A straight constant hash value would likely be ideal.

The goal here ultimately is full determinism in the sense that if you run the same code in the same order the keys should be identical (even if you run the same code twice in a row in the same process/static context). Pointer math likely wouldn't work as, while fuzzers should be turning off ASLR/etc, you may run multiple test cases in the same memory space and get different pointers. Just using compile time RNG values would work fairly well, but would result in fuzzers finding duplicate test cases across runs where you may re-build and get different hash values. Ultimately a straight constant is probably the easiest to implement and the best for fuzzing.

@tkaitchuck
Copy link
Owner

Ok. So even counters should be disabled. Because while it would be identical if the process is re-run, if the same code is called twice in a row in a loop it will produce different results.

I can add a flag to do this, but it still won't make the process fully deterministic if there is a non-ahash hashmap used anywhere. Additionally, there is the possibility it introduces quadratic runtime into functions that otherwise take linear time.

What sort of variable/flag is best to avoid this being 'on' when not intended?

@TheBlueMatt
Copy link
Author

I can add a flag to do this, but it still won't make the process fully deterministic if there is a non-ahash hashmap used anywhere.

Sure, for my use-case I can just swap everything out for hashbrown so in my case it's easy :)

Additionally, there is the possibility it introduces quadratic runtime into functions that otherwise take linear time.

Right, maybe a reason to not always enable it when fuzzing? I assume in most fuzzing cases there aren't enough hash map entries for it to be a concern, however.

What sort of variable/flag is best to avoid this being 'on' when not intended?

The canonical one in Rust is #[cfg(fuzzing)], which is enabled by default by the rust-fuzz projects when building. Given it's not a feature and only an explicit --cfg it's somewhat hard to accidentally turn on (ie has to be a compile flag, cannot be set via Cargo.toml). I'd generally leave it at that, but if there's concern about the behavior being surprising we could swap for some other cfg like ahash-not-random.

@tkaitchuck
Copy link
Owner

tkaitchuck commented Jun 20, 2021

Yes, #[cfg(fuzzing)] is definitely better than a feature flag, because it can't creep in accidently. IE: Some random dependency specifies it, therefor it gets set on the whole binary accidently.

The only thing I don't like about fuzzing is the connection is not clear. Nothing is being specified on this package so it might be surprising that its behavior changed because fuzing is enabled. The second thing that is a minor negative, is the fuzzer won't actually be able to affect the iteration order of maps. (And some bugs could depend on this).

If you are in-control of the main() I could add a setter method for a supplier for the random sequence used by the RandomStates. Effectively refactor the current implementation to be defined via a trait, and allow alternative impls to be supplied via a static setter. (impls would need to be send + sync) Then you could build a tool where these random values are supplied by the fuzzer (possibly indirectly).

@TheBlueMatt
Copy link
Author

Oops, I'm sorry this somehow fell off my stack.

The only thing I don't like about fuzzing is the connection is not clear. Nothing is being specified on this package so it might be surprising that its behavior changed because fuzing is enabled.

Yea, its true that its a bit strange, but aside from your next point it seems like what a fuzz user would almost certainly want, so its a little less annoying. Specifically, the fact that fuzz output will randomly change and fail any time you use a hashmap in rust is actually quite annoying, and requires a rather significant amount of manual effort to resolve.

The second thing that is a minor negative, is the fuzzer won't actually be able to affect the iteration order of maps. (And some bugs could depend on this).

Indeed, I'm not entirely sure what to do about that. I suppose hashbrown could also detect fuzzing and perturb the iteration order itself, but that seems excessive.

If you are in-control of the main() I could add a setter method for a supplier for the random sequence used by the RandomStates. Effectively refactor the current implementation to be defined via a trait, and allow alternative impls to be supplied via a static setter. (impls would need to be send + sync) Then you could build a tool where these random values are supplied by the fuzzer (possibly indirectly).

At least for my use-case that's more than fine, because I'm aware of the issue and work with fuzzers in Rust regularly. For those who are less accustomed to fuzzing, non-deterministic behavior can be somewhat surprising, and some fuzz tools light up large warnings about non-determinism which isn't clear the source of. I suppose could always do both - default to deterministic behavior with cfg(fuzzing) and also provide a static setter which can use fuzz input as random source, or at least allow allow for compile-time randomness.

@tkaitchuck
Copy link
Owner

I have sortof a prototype of this. (Though annoyingly it does not work on ARM)

The basic idea is that we could have a feature/flag to check which disables runtime randomess, and a main can call this function to supply randomness (presumably from the fuzzer).

There is still a bunch of work to do. In terms of interface I don't really like the fact that the fixed and dynamic randomness are on the same trait, but I am not really sure what would be best for an application.

@TheBlueMatt
Copy link
Author

I have sortof a prototype of this. (Though annoyingly it does not work on ARM)

Cool! Thanks.

The basic idea is that we could have a feature/flag to check which disables runtime randomess, and a main can call this function to supply randomness (presumably from the fuzzer).

Right, makes sense. I'm not sure what the right answer is, but, on balance, an inconsistent fuzzer is probably better than a fuzzer that will miss a bug that depends on hashmap iteration order, so probably best to use something like cfg(ahash_no_random) or whatever. Of course useful to document in README as some things (I believe driller is one) will give you completely inscrutable errors as a result of randomness being loaded at runtime.

There is still a bunch of work to do. In terms of interface I don't really like the fact that the fixed and dynamic randomness are on the same trait, but I am not really sure what would be best for an application.

Hmm, right, I assume they're hashed together in one way or another? I guess there's not a ton of reason to a fuzzer should care about both, as long as it can get hashes to output anything from just the dynamic part maybe fixed can stay fixed and dynamic entropy can come from the interface that fuzz main's can override? This would also make it a tiny bit harder to foot-gun with the API.

xu-cheng added a commit to xu-cheng/aHash that referenced this issue Aug 28, 2021
This is especially useful for no_std development, where we don't want to
introduce the getrandom dependency.

After this change, one can choose to disable random-state feature and
implement their own version of RandomState using whatever random source
they choose.

This also somewhat addresses tkaitchuck#94 and avoids the need of using
compile-time-rng in restricted platforms.
xu-cheng added a commit to xu-cheng/aHash that referenced this issue Aug 28, 2021
This is especially useful for no_std development, where we don't want to
introduce the getrandom dependency.

After this change, one can choose to disable random-state feature and
implement their own version of RandomState using whatever random source
they choose.

This also somewhat addresses tkaitchuck#94 and avoids the need of using
compile-time-rng in restricted platforms.
xu-cheng added a commit to xu-cheng/aHash that referenced this issue Aug 28, 2021
This is especially useful for no_std development, where we don't want to
introduce the getrandom dependency.

After this change, one can choose to disable random-state feature and
implement their own version of RandomState using whatever random source
they choose.

This also somewhat addresses tkaitchuck#94 and avoids the need of using
compile-time-rng in restricted platforms.
xu-cheng added a commit to xu-cheng/aHash that referenced this issue Aug 28, 2021
This is especially useful for no_std development, where we don't want to
introduce the getrandom dependency.

After this change, one can choose to disable random-state feature and
implement their own version of RandomState using whatever random source
they choose.

This also somewhat addresses tkaitchuck#94 and avoids the need of using
compile-time-rng in restricted platforms.
@tkaitchuck
Copy link
Owner

This is now available in v0.8.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants