-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Make the consumer crate no-std #471
Conversation
I'm no longer sure that using B-trees for the temporary collections is the right choice. Aside from the fact that Rust's hash tables use a relatively complex hasher with random inputs by default, I feel that the B-tree collections are overly complicated for what we're doing with them. In particular, B-trees require a larger number of allocations, at least for small collections, than a hash table without collisions. So, I now think it would be better to directly use hashbrown for no-std, and possibly use the relatively simple rustc-hash hasher to minimize binary size and avoid randomness. Adding rustc-hash would be an extra dependency, but a small one, and compromising actual code efficiency for a lower dependency count doesn't make me happy. As for hashbrown, unconditionally depending on it could lead to some code duplication in typical builds that use std, though I don't know how much, because of generic monomorphization. So maybe, in the consumer crate, we want to conditionally enable no-std and hashbrown using a feature. |
ff679ae
to
2d4aabc
Compare
I decided to go back to using hash tables for the reasons I discussed in the previous comment. I decided to just use hashbrown unconditionally, for simplicity, and to use hashbrown's default hasher, which is now foldhash. foldhash's README presents reasons why foldhash provides a better balance of quality and performance than rustc-hash (also called fxhash). As for DoS resistance, when used in no-std mode, foldhash provides some basic protection, using the stack pointer at the time of hasher construction as a source of non-determinism. I think that's a good balanced approach, providing some basic protection while not bringing in a dependency on a cryptographic RNG. With accesskit_consumer now using hashbrown and foldhash while accesskit_windows still uses the std HashMap with its default hasher, a fully size-optimized build of the Windows hello_world example is about 1 KB larger than it was before this PR, using the std hash map across the board. If I modify accesskit_windows to also use hashbrown with foldhash, then the size of that build goes down by about 2 KB from where it started. So I can modify the platform adapters to use hashbrown with foldhash in a separate PR, with the fix tag so as not to cause a semver-breaking change in those crates. |
@DataTriny Can you please investigate why cargo deny failed in the latest CI run? I don't think it could have been caused by anything I changed. |
@mwcampbell Failure is due to foldhash's license: it uses Zlib which we currently don't allow. I think this license is fine though so you can add it in the list of allowed licenses in |
Very happy to see this change! Congrats, y'all! |
I think the only potentially controversial part of this is the decision to switch from hash tables to B-trees for the temporary data structures that are built while processing an update. Here's my reasoning.
HashMap
andHashSet
aren't inalloc
because the default hasher requires entropy. We could add astd
feature toaccesskit_consumer
and then bring inhashbrown
as a dependency only whenstd
is disabled. But for simplicity, I'd rather not add an extra feature and different code paths if we can avoid it.I don't like depending on a random number generator anyway, even when using
std
. Predictability is a valuable attribute in software, and I think it's worth maximizing it at as many levels of the stack as possible. Also, at least on Windows, using the random number generator adds an extra library dependency that non-Rust users have to use when statically linking AccessKit.As for performance, according to the Rust collections documentation, the performance of B-tree operations is O(log(n)), where as the expected performance for hash tables is O(1). Note, though, that the latter is expected performance, meaning that it's not guaranteed because hashing is probabilistic. In my opinion, log(n) never gets too large, and having that predictable upper bound is better.
I'm aware that the page I linked above also says to use hash tables unless one has a specific reason to do otherwise. I think I now disagree with that advice. I think it overvalues best-case performance compared to predictability.
Switching to B-trees does increase total binary size. I mitigated this somewhat by going to the trouble to avoid removing entries from the temporary
pending_nodes
andpending_children
maps. This should be good for speed as well, since as I mentioned, the maps are temporary; they go away as soon as update processing is finished. With this mitigation, the added binary size, for a size-optimized build withpanic = "abort"
on x86-64, is just under 10 KB. When I switch the platform adapters to also use B-trees for their own structures, that should drop a few KB.