Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please reconsider the strategy in RandomState for getting a hashing key #48

Closed
cbeck88 opened this issue Sep 14, 2020 · 16 comments
Closed

Comments

@cbeck88
Copy link
Contributor

cbeck88 commented Sep 14, 2020

In order for aHash to achieve DOS resistance, the key used for the hash must be a secret not known to the attacker.
However, a problem (not only for rust but for all low-level programming that needs randomness) is that there are not usually completely portable APIs for getting randomness. In rust stdlib the APIs for randomness are in std and not core.

This is a big pain for hashmaps and AFAIK it is the only reason that the stdlib hashmap is not in alloc crate like all the other collections. That is very annoying for people trying to port their code that uses hashmaps to an environment with no operating system. Portability is super important for rust and not enough libraries in the ecosystem pay attention to this. So I appreciate your focus on this and the efforts you went to try to make something that will work.

Here's the strategy that I see based on review of the current code. (https://github.com/tkaitchuck/aHash/blob/master/src/random_state.rs)

(1) The const-random feature gets randomness at compile time and bakes it into the binary as a constant.
(2) The RandomState default implementation also mixes in the addresses of some stack and global variables, with a code comment explaining that ASLR will randomize these addresses, so this is like a source of randomness.

However, there are a few big problems with this:
(1) Turning on const-random feature basically means that I am baking my secret keys into the binary. It is NOT normally okay to assume that the attacker does not have the binary. People make releases on github of their binaries all the time. Even if their project is not open source, all kinds of engineers and contractors are likely to have access to a build that runs on the servers. If anyone who has the binary can extract the key and then DOS the server, that is terrible and way outside the threat model for most projects. This basically runs up against Kerckhoff's law: https://en.wikipedia.org/wiki/Kerckhoffs's_principle
It is not enough for the key to be chosen randomly "at some point in time", and rolling the random dice in the build.rs doesn't really fix anything. The point of choosing the key randomly at all is to make it a secret from the attacker, who is assumed not to have access to the specific machine where the process is running.

(2) Turning on the const-random feature throws repeatable builds out the window. Most of the time when people move security-critical code around, they will do things like take hashes of the binaries to confirm a correct download or a correct build. No one expects that a process is going to intentionally bake random bytes into the binary. In some cases, like SGX, not having a repeatable build destroys the guarantees of SGX -- the point is that someone else could build your software from scratch and get the same hash as the remote SGX hardware is telling them. A year ago, I had to spend several days tracking down why our SGX enclave build is not repeatable, by diffing the intermediate build artifacts repeatedly until I could isolate the problem, and the problem was aHash const-random feature. I now basically have to screen every third party lib we add to the project to check if there is an aHash somewhere in the tree without const-random feature disabled. This is a major footgun that will have to be disabled in serious projects, so I would argue that it should just be removed.

(3) If const-random is off, then the only source of entropy in our keys is ASLR. Here's the thing: ASLR is an OS feature. If you don't have an OS, you probably don't have ASLR either. You generally don't have ASLR in embedded devices, and you don't have ASLR in SGX enclaves. In fact, in any case where ASLR would be present, I expect that you can simply ask the OS for randomness instead of using pointer tricks based on ASLR assumption. The advantage of using a standard API for getting randomness from the OS, instead of trying to extract randomness indirectly via ASLR, is that ASLR is not actually an interface for getting randomness, and may not actually give you a secret random value like you would get from a normal interface. ASLR is a defense-in-depth technique to try to defend against ROP. But the address offsets can leak depending on the structure of the rest of your program. If you simply ask the OS for randomness instead of trying to rely on ASLR, then the OS will give you a value that can't be leaked or inferred in this way.

If there is no ASLR present on the system, then the secret key derivation in aHash likely fails in a manner similar to the debian Random Number Generator Bug from some years ago. Assuming that const-random is off, or that the adversary has a copy of the binary, there isn't really any other entropy present, so, game over, no DOS resistance.

So unfortunately, although we did all this work to try to make the library more portable and help the people working on no_std environments, embedded devices and such, we likely just created bigger problems for them, because we didn't use a standard API for obtaining randomness for secret keys.


I want to suggest an alternate approach: In the last year or so, the getrandom crate has matured and offers no_std support. This now appears to me to be the best and most portable way to obtain randomness. It is also the basis of OsRng now in the rand crate.

What I would suggest is:
(1) RandomState should get entropy from the getrandom crate, which becomes an optional dependency of AHash.
(2) When getrandom crate is not available, don't offer a default-initialized RandomState -- force the user to use the with_keys API and provide secret keys on their own, or tell them to patch the getrandom crate so that it will work for their target. (And ideally submit that patch upstream)
(3) All the const-random and ASLR-based key derivation should just go away.

This way, in an environment where ASLR is not actually present, instead of silently building insecure stuff, we fail at build time.

By using standard APIs and investing our maintenance energies in them, we can avoid one-off tricks and strengthen the ecosystem as a whole.

@Amanieu
Copy link
Contributor

Amanieu commented Sep 16, 2020

hashbrown uses AHash as its default hash function not because it is DoS resistant but because it is a pretty fast hash function that is on par with FxHash but avoids FxHash's poor bit mixing. hashbrown makes no guarantees on DoS resistance with its default hasher so we would be OK with just using with_keys(0, 0) or some other method of obtaining default keys without relying on OS randomness.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Sep 16, 2020

Sure -- hashbrown should configure aHash in whatever way will meet the goals of hashbrown default config.

But ahash main page says this

AHash is the fastest DOS resistant hash for use in HashMaps available in the Rust language. Failing in any of these criteria will be treated as a bug.

And other projects are using it, like this one:

https://github.com/xacrimon/dashmap

This one uses ahash not via hashbrown, and without default-features=false, so they all get the const-random build breakage stuff.

This one uses it, presumably for portability:

https://docs.rs/lasso/0.3.1/src/lasso/lib.rs.html#359

mod hasher {
    compile! {
        if #[feature = "ahasher"] {
            pub use ahash::RandomState;
        } else {
            pub use std::collections::hash_map::RandomState;
        }
    }
}

They are probably unaware that the ahash with default config does all this terrible const random stuff, otherwise I assume they would turn it off.

They are probably aware that ahash builds even when there is no standard library, but they probably are not aware that if there is actually no OS (and hence no ASLR) then it is probably broken and provides none of the DOS resistance. The documentation says nothing about this. So if DOS resistance is the goal of ahash, then this is arguably a bug in ahash.

@tkaitchuck
Copy link
Owner

I agree doing getrandom() every time is much more secure. Unfortunately it is also MUCH more expensive.
In the standard library this is avoided by getting a random value once per thread and then simply incrementing it by one each time.

I don't consider incrementing by one to be very secure, because it means that any (even partial) information learned about the key can be used over and over and perhaps accumulated over time, by looking at the way different maps hash things. I would consider the current approach of scrambling using the memory address strictly superior to this. It is strictly less predictable than a fixed constant, and has the same cost.

In that light the real question is should getrandom() + lazystatic or threadlocal replace const-random. This is reasonable. It's trading some runtime cost for not having to be concerned about an attacker having access to the binary.

In terms of implementation, I think the most efficient route might be a static atomic, check if it is 0 and if so add getrandom() otherwise increment by locally available information.

The real problem with randomness from my point of view is atomics. I consider this more serious than the above concerns about const-random. Right now it's working with usize because this is one of the few atomics that are expected to be available cross platform. However a usize just isn't very large. We could assume it's small and have 4 of them, but then we have to do 4 atomic reads per hashmap creation which would depending on architecture, be a significant overhead.

Do either of you have thoughts on ways around this?

The obvious solution if to use thread-locals. However this isn't a great option in embedded and in my benchmarks on X86, it is quite slow.

We don't actually need deterministic behaviour, but I can't just use unsafe to read/write concurrently because that would be UB. If there was a way to do the equivalent of a read/write to the same memory location that lead to non-deterministic behavior but not undefined behavior (Similar to what exists in the Java memory model) I would take it, but I am not aware of a way to do this. Another approach might be to do conditional compilation based on architecture to select the largest size atomic available, but I don't know a good way to do this. At the very least we would need a table of which architecture supports which sizes.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Sep 18, 2020

Ok - I think I understand what you are saying about getrandom being slow. And some applications might be creating a lot of hashmaps, so you don't want to actually read from /dev/random or whatever every time that happens. and if you want to be portable and use stable rust, then there's no thread-local sadly.

Here's what I would suggest if we want to make a nicer interface than getrandom for this:

  • Start with a lazy static key of like 16 or 32 bytes, that gets initialized once from getrandom, so it's actually pure entropy
  • Also, have an atomic counter that counts how many times someone has pulled from the interface, they would use like fetch_add to get the value
  • Put these two values into a cryptographic PRF. It's going to have a fixed size output. Use this to implement the RngCore interface.

Ideally, we end up with something very similar to std::ThreadRng, but without any thread-local state and with no dependency on std. We could put this in a new crate and call it ThreadRngCore or something.

I'm not sure what exactly the PRF should be, but it's possible it could just be like Aes128 block cipher. You would use the pure entropy value as the key to Aes, and the counter value would be the nonce. Then you encrypt an all-zeroes buffer to get the output of the PRF.

DJB says stuff like this about AES for example in page 8 of this paper about ed25519 signatures: http://ed25519.cr.yp.to/ed25519-20110926.pdf

It would of course also be safe to generate r with a cipher such as AES, combined with standard PRF-stretching mechanisms to support a long input; but we prefer to reuse H to save area in hardware implementations.

Ultimately, what makes that okay is that good ciphers are supposed to be semantically secure under chosen plaintext attacks, and that property turns out to be not much different from PRFs.

In a machine with AESNI instructions that's probably super fast. In another machine you could probably use some other block cipher.

I want to make sure I understand what perf expectations are here:

We could assume it's small and have 4 of them, but then we have to do 4 atomic reads per hashmap creation which would depending on architecture, be a significant overhead.

If they are all in the same cacheline in memory then what I would expect is that 4 atomic reads is not much worse than 1 atomic read, maybe a few cycles extra. When you create a hashmap, you are also going to have to call malloc, right? So that should be way more costly than whatever we will do here, like 100 cycles should be very conservative.

IDK, it would be researchy, and I would want some RustCrypto person to take a look at it to build confidence. But I think the basic idea is sound at least.

But like fundamentally, that's what I would say should happen here - it should not really be in-scope for AHash crate to contain what is essentially a home-brew entropy pool, using the const-random and ASLR pointers as entropy source. You should really only want to have the hasher implementation here, and put the thing that gets seeds (which takes no seed itself) in a different crate, because that is a pretty complex problem.

I'm going to try to write up a draft of this sometime soon, I think it will be interesting :)

@tkaitchuck
Copy link
Owner

Having lazy_static use getrandom() initialize 32 bytes of entropy once per process, is sufficiently cheap and only involves one atomic operation per hashmap creation. That works well. I would like to avoid a second and third, which should be possible.

I don't think we need to maintain a global counter. That would add two more atomic calls per map. The only point is to make different maps use slightly different keys. This, I think is where address can be used. It's not a good source of randomness, but neither is incrementing by a fixed constant. If we combine this with the high-quality random seed in a way that cannot be reversed, we should get good keys.

The final step of generating keys from the seeds + unique value, I think can be done using the hashing algorithm itself. (After all it is intended to be non-reversible) We can just do some refactoring to allow the code to be invoked explicitly with keys and a value. (This has the advantage of using the best hardware instructions without extra effort). With this approach, the overhead of creating a new map is no worse that the cost of hashing 4 fixed size keys plus one atomic load. (Which is probably acceptably low)

The only complication, is that we need a way to have some compile time switch to allow platforms which getrandom does not support to compile if they explicitly pass keys.

@Amanieu
Copy link
Contributor

Amanieu commented Sep 19, 2020

Note that lazy_static requires std so you might as well just use thread_local! under a std feature flag which can be disabled for people (i.e. hashbrown) who need no_std support and will provide keys manually.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Sep 23, 2020

Having lazy_static use getrandom() initialize 32 bytes of entropy once per process, is sufficiently cheap and only involves one atomic operation per hashmap creation. That works well. I would like to avoid a second and third, which should be possible.

+1 fwiw

I don't think we need to maintain a global counter. That would add two more atomic calls per map. The only point is to make different maps use slightly different keys. This, I think is where address can be used. It's not a good source of randomness, but neither is incrementing by a fixed constant. If we combine this with the high-quality random seed in a way that cannot be reversed, we should get good keys.

Yeah I agree -- as long as there is some actual high quality entropy entering the system, then if the hash function is actually resistant, it should even be fine if every hashmap in the program uses the same keys. Making them use different keys is a defense-in-depth that would make a larger program with many interfaces harder to attack. So there I agree that using the pointer to this or whatever as the personalization for the hasher function is reasonable and not much different from a counter, and it adds value.
That is a pretty creative trick :)

The final step of generating keys from the seeds + unique value, I think can be done using the hashing algorithm itself.

+1, if you assume the hash algo is good, even only in the very weak sense that it does not destroy entropy, then this makes sense

The only complication, is that we need a way to have some compile time switch to allow platforms which getrandom does not support to compile if they explicitly pass keys.

Yeah I'd suggest just making the Default implementation gated on #![cfg(feature = "getrandom")] or something like this, but it's up to you

Note that lazy_static requires std so you might as well just use thread_local! under a std feature flag which can be disabled for people (i.e. hashbrown) who need no_std support and will provide keys manually.

We've been using the spin_no_std feature on lazy_static to make it no_std compatible, but I don't have a good sense of how acceptable that is in widely used portable libraries. I could imagine that in a short-lived program that is supposed to run and complete its task very fast, spinning for lazy_static wouldn't be okay. But I've never actually heard someone working on such an application say so.

One nice thing about lazy_static is that the end consumer can flip on the spin_no_std and get no_std support via cargo feature unification, without you the library having to create a configuration path for that

But yeah you could make the whole lazy_static an optional dependency implied by the getrandom config, and then as long as the user flips on spin_no_std then they will get a no_std library. Or you could set spin_no_std yourself if you think that's alright. Or you could have a spin_no_std feature yourself I guess.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Sep 23, 2020

If you know a better way than spin locks to do lazy statics in no_std rust right now I would love to know.
In my understanding if you don't have some sort of thread::yield() function in core, then spinning is your only option if you want to be portable

@tkaitchuck
Copy link
Owner

tkaitchuck commented Sep 29, 2020

What if we did something like:

const UNINITIALIZED: (u128, u128) = (0, 0);
static VALUE: AtomicPtr<(u128, u128)> = AtomicPtr::new(&UNINITIALIZED as *const (u128, u128) as *mut (u128, u128));

Then we can reduce the call to a load, and compare the pointer with &UNINITIALIZED and if it is the same, generating a random value and doing store of the generated value.

This avoids any dependencies (aside from what is needed for the random) and is no_std. Of course to actually allocate the (u128, u128) it will need to depend on alloc but, for hashmaps that isn't a problem, because they need to allocate anyway.

@Amanieu
Copy link
Contributor

Amanieu commented Sep 29, 2020

I would recommend keeping things simple and just:

  • Require no_std users to provide keys themselves.
  • Use std when randomness is required so that thread_local is available.

@tkaitchuck
Copy link
Owner

tkaitchuck commented Sep 29, 2020

Part of the motivation for the way things are now, is that users with no_std would have a hard time generating unique values every time without a performance hit, but there is no reason that the values actually need to be totally unique. So I think there are actually three options that make sense:

  1. Have user supplied keys in the form of two u128s.
  2. User supplies nothing. (Available only with std)
  3. User supplies two &'static u128s and we use the pointer/scrambling to make each map different.

With this 3rd approach, we can point to const_random as a way to supply such values, but it eliminates the direct dependency from aHash. So it would not be a default or likely anyone's first choice.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 6, 2020

Part of the motivation for the way things are now, is that users with no_std would have a hard time generating unique values every time without a performance hit, but there is no reason that the values actually need to be totally unique.
With this 3rd approach, we can point to const_random as a way to supply such values, but it eliminates the direct dependency from aHash. So it would not be a default or likely anyone's first choice.

I would argue the following: the const_random stuff doesn't help you get a "unique value", the value it gives you is the same each time you run the code. It only changes when you rebuild. So from point of view of consumer of your software, you are choosing a random value that changes only each time you make a release of your software.

That doesn't meet a lot of people's requirements. But if it meets someone's requirements, then a simpler solution is, go to random.org and paste some random bytes into the source code, and use that where you were going to use const_random. And as part of your release process, you can paste new random bytes if you feel like it. You could make a script that just regenerates the file that declares the constants, so that you can commit the new version. This accomplishes everything that const_random does, and has the advantage that it preserves build determinism, which is important for a lot of reasons. So, that's what I would recommend to anyone who is thinking about using const_random.

@tkaitchuck
Copy link
Owner

@garbageslam Yes, you are right. When I started implementing it, I realized this. So here is where I ended up:
980159d

Basically now there are three alternitives:

  • Running with std in which case you getrandom is used via lazy_static to generate strong random values once, which is combined with address/counter to get the specific keys.
  • Running without std and with const_random which is now off by default. This will use const random values and combine with the address/counter in the same way. This is clearly not as good, but you have to explicitly opt-into it, and if BOTH std and const_random are enabled then const_random does nothing and you still get deterministic builds.
  • If std is off and const_random is not enabled, then new() does not exist, and keys must be manually provided. Also the Default trait is not implemented for RandomState so constructing a hashmap is necessarily more manual.

The reason I left const_random in at all, was for transitive dependencies. If an app which is no-std depends on a library which uses HashBrown in the generic way, not manually providing keys, that library can still be no-std with no special modifications. However if it happens to end up in a std app, then the dependency can be added directly or indirectly and cargo will resolve this by enabling the feature, in which case it ends up with the std code (and deterministic builds) without having to do anything.

@garbageslam @Amanieu Could you please take a look at that CL, before I push it out and make sure you like how it looks?

@Amanieu
Copy link
Contributor

Amanieu commented Oct 7, 2020

LGTM

@tkaitchuck
Copy link
Owner

These changes have been encorperated into the 0.5 release.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 12, 2020

thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants