🛑[stdlib] Flesh out _Hasher API #15939
@swift-ci please test compiler performance
3849c29 to 4cfaa59
@swift-ci please benchmark
Build comment file:
Optimized (O): Regression (21), Improvement (52), No Changes (349)
Unoptimized (Onone): Regression (52), Improvement (20), No Changes (350)
Looks pretty good! Except for those UTF-16 hashing benchmarks, of course; oops.
Hm; ASCII String hashing is about 5.5x faster now, while Unicode hashing costs exactly the same. I'm not sure what's going on with
4cfaa59 to 0836f83
0836f83 to e3c9c27
- String hashing is not inlinable, so it can use _Hasher._core operations directly.
- Remove custom buffering.
- Make sure we feed the normalized UTF-8 encoding to the hasher; don’t insert extra bytes at the first ASCII->non-ASCII transition.
- Add an explicit terminator byte (0xFF).
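To illustrate why a terminator byte matters when strings are hashed as components of a larger value, here is a minimal sketch using only the public `Hasher` API from SE-0206. The helper `hashComponents(_:terminated:)` is hypothetical and is not how the stdlib implements String hashing; it only assumes that `Hasher` treats its input as a single byte stream.

```swift
// Hypothetical helper: feeds each string's UTF-8 bytes into one hasher,
// optionally followed by a 0xFF terminator. 0xFF never occurs in valid
// UTF-8, so it cannot be mistaken for string content.
func hashComponents(_ parts: [String], terminated: Bool) -> Int {
    var hasher = Hasher()
    for part in parts {
        Array(part.utf8).withUnsafeBytes { hasher.combine(bytes: $0) }
        if terminated {
            hasher.combine(0xFF as UInt8)
        }
    }
    return hasher.finalize()
}

let a = ["ab", "c"]
let b = ["a", "bc"]

// Without a terminator both inputs produce the byte stream "abc".
let collidesUnterminated =
    hashComponents(a, terminated: false) == hashComponents(b, terminated: false)

// With a terminator the streams are "ab\u{FF}c\u{FF}" and "a\u{FF}bc\u{FF}".
let collidesTerminated =
    hashComponents(a, terminated: true) == hashComponents(b, terminated: true)

print(collidesUnterminated, collidesTerminated)
```

The unterminated variant should collide for any seed, because the hasher sees identical bytes; the terminated variant should collide only by chance.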
The new _unsafeHashValue(seed:) requirement allows stdlib types to specialize their hashing when they’re hashed on their own (i.e., not as a component of some composite type). This makes it possible to get rid of discriminator/terminator values and to eliminate most of _Hasher’s resiliency overhead, leading to considerable speedups.
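The shape of such a requirement can be sketched with public API only. Everything below is hypothetical: `OneShotHashable`, `oneShotHashValue(seed:)`, and `ByteString` are illustrative names, and the seed is emulated by combining it into a `Hasher` first, since the public `Hasher` cannot be seeded directly. The real `_unsafeHashValue(seed:)` is an internal stdlib detail and may differ.

```swift
// Hypothetical customization point modeled on the idea described above.
protocol OneShotHashable: Hashable {
    func oneShotHashValue(seed: Int) -> Int
}

extension OneShotHashable {
    // Default: fall back to the general-purpose hash(into:) path.
    func oneShotHashValue(seed: Int) -> Int {
        var hasher = Hasher()
        hasher.combine(seed)
        hash(into: &hasher)
        return hasher.finalize()
    }
}

struct ByteString: OneShotHashable {
    var bytes: [UInt8]

    // As a component of a composite value, a terminator delimits the bytes.
    func hash(into hasher: inout Hasher) {
        bytes.withUnsafeBytes { hasher.combine(bytes: $0) }
        hasher.combine(0xFF as UInt8)
    }

    // Hashed on its own, the value is the entire input, so the
    // terminator can be skipped.
    func oneShotHashValue(seed: Int) -> Int {
        var hasher = Hasher()
        hasher.combine(seed)
        bytes.withUnsafeBytes { hasher.combine(bytes: $0) }
        return hasher.finalize()
    }
}

let first = ByteString(bytes: [1, 2, 3])
let second = ByteString(bytes: [1, 2, 3])
print(first.oneShotHashValue(seed: 7) == second.oneShotHashValue(seed: 7))
```

Equal values must still produce equal one-shot hashes for a given seed; only the amount of work per value changes.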
…issue
When Set/Dictionary is nested in another Set, the boundaries of the nested collections weren’t correctly delineated in commutative hashing. For example, these Sets all hashed the same:
[[1, 2], [3, 4]]
[[1, 3], [2, 4]]
[[1, 4], [2, 3]]
Hash collisions could thus be systematically generated. To fix this, remove collection-level support for one-shot hashing and revert to the previous method of generating hash values. (Set is still able to support one-shot hashing for its members, though.)
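The collision can be reproduced with a toy model. The sketch below assumes the commutative combination is a plain XOR of per-element hashes under a shared seed, and uses a SplitMix64-style mixer as a stand-in for the real seeded hash (it is not SipHash). The function names are hypothetical.

```swift
// Toy seeded mixer (SplitMix64 finalizer); a stand-in, not SipHash.
func mix(_ x: UInt64, seed: UInt64) -> UInt64 {
    var z = x &+ seed &+ 0x9E3779B97F4A7C15
    z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
    z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
    return z ^ (z >> 31)
}

// Commutative one-shot hash of a flat set: XOR of element hashes.
func oneShot(_ set: Set<UInt64>, seed: UInt64) -> UInt64 {
    set.reduce(0) { $0 ^ mix($1, seed: seed) }
}

// Flawed nesting: inner sets are hashed one-shot with the same seed, so
// the outer XOR flattens everything into one XOR over all leaf elements.
func flawedNested(_ sets: [Set<UInt64>], seed: UInt64) -> UInt64 {
    sets.reduce(0) { $0 ^ oneShot($1, seed: seed) }
}

// Delineated nesting: each inner hash is mixed again before the outer
// XOR, so the grouping of elements affects the result.
func delineatedNested(_ sets: [Set<UInt64>], seed: UInt64) -> UInt64 {
    sets.reduce(0) { $0 ^ mix(oneShot($1, seed: seed), seed: seed) }
}

let s1: [Set<UInt64>] = [[1, 2], [3, 4]]
let s2: [Set<UInt64>] = [[1, 3], [2, 4]]
let s3: [Set<UInt64>] = [[1, 4], [2, 3]]

let flawed = [s1, s2, s3].map { flawedNested($0, seed: 42) }
let fixed = [s1, s2, s3].map { delineatedNested($0, seed: 42) }
print(flawed)
print(fixed)
```

Because XOR is associative and commutative, the three flawed values are identical for every seed; the delineated values should differ except by chance.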
This is safe to do with hash(into:), because random hash collisions can be eliminated with awesome certainty by trying a number of different hash seeds. (Unless there is a weakness in SipHash.) In some cases, we intentionally want hashing to produce looser equivalence classes than equality; to let those cases keep working, add an optional hashEqualityOracle parameter. Review usages of checkHashable and add hash oracles as needed.
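A rough sketch of the idea, under stated assumptions: `checkHashes` and `seededHash` are hypothetical helpers, not the test suite's actual `checkHashable`, and the seed is emulated by combining it into a public `Hasher` first. Instances the oracle marks as hash-equal must agree under every seed; all other pairs must differ under at least one seed.

```swift
// Hypothetical: emulate a seeded hash with the public Hasher API.
func seededHash<T: Hashable>(_ value: T, seed: Int) -> Int {
    var hasher = Hasher()
    hasher.combine(seed)
    value.hash(into: &hasher)
    return hasher.finalize()
}

// Hypothetical checker: the oracle takes two indices into `instances`
// and says whether those instances are expected to hash the same.
func checkHashes<T: Hashable>(
    _ instances: [T],
    seeds: Range<Int> = 0..<8,
    hashEqualityOracle: (Int, Int) -> Bool
) -> Bool {
    for i in instances.indices {
        for j in instances.indices {
            let alwaysEqual = seeds.allSatisfy {
                seededHash(instances[i], seed: $0) == seededHash(instances[j], seed: $0)
            }
            if alwaysEqual != hashEqualityOracle(i, j) { return false }
        }
    }
    return true
}

// A type whose hashing is intentionally looser than its equality.
struct Tagged: Hashable {
    var key: Int
    var note: String
    func hash(into hasher: inout Hasher) { hasher.combine(key) }
}

let items = [
    Tagged(key: 1, note: "a"),
    Tagged(key: 1, note: "b"),
    Tagged(key: 2, note: "a"),
]

// Using equality as the hash oracle rejects the looser hashing...
let strict = checkHashes(items) { items[$0] == items[$1] }
// ...while a dedicated hash oracle accepts it.
let loose = checkHashes(items) { items[$0].key == items[$1].key }
print(strict, loose)
```

Trying several seeds is what makes the "must differ" check practical: a single accidental collision no longer causes a spurious failure.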
This makes it easier to understand failure traces in test logs.
Add FIXMEs for potential correctness issues with the existing hashValue definition.
This prototype is not fully implemented, and it relies on specific hash values to not trigger unhandled cases. To keep its test working, define and use a custom hashing interface that emulates hashValue behavior prior to SE-0206.
e3c9c27 to 7af6776
@swift-ci benchmark
I fixed the benchmark regression; ASCII hashing is still 5.5x faster, while Unicode hashing is largely unchanged (with a minimal improvement). I expect the benchmarks will be much better now. This PR is getting too large; I think I'll split it into smaller chunks that can be reviewed separately.
@swift-ci please test
Build failed
Build comment file:
Optimized (O): Regression (15), Improvement (49), No Changes (358)
Unoptimized (Onone): Regression (13), Improvement (64), No Changes (345)
With #17396, all pieces of this PR have landed. Closing.
Update _Hasher API names to match those proposed in SE-0206, and implement missing functionality.
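For orientation, this is how the SE-0206 names look from user code once the API is public as `Hasher` (the underscore-prefixed `_Hasher` in this PR is the internal precursor). `GridPoint` is a hypothetical example type.

```swift
// Hypothetical example type adopting the SE-0206 hashing interface.
struct GridPoint: Hashable {
    var x: Int
    var y: Int

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        lhs.x == rhs.x && lhs.y == rhs.y
    }

    // Feed the same components that == compares into the hasher.
    func hash(into hasher: inout Hasher) {
        hasher.combine(x)
        hasher.combine(y)
    }
}

// Driving a Hasher by hand: combine values, then finalize once.
var hasher = Hasher()
hasher.combine(GridPoint(x: 1, y: 2))
let manualHash = hasher.finalize()

let visited: Set<GridPoint> = [GridPoint(x: 1, y: 2), GridPoint(x: 3, y: 4)]
print(visited.contains(GridPoint(x: 1, y: 2)))
```

Hash values are seeded per process, so they should be compared only within a single run and never persisted.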