fix Hash implementation #66
Conversation
Thanks so much for catching this! I could have sworn I had read the Hasher
documentation carefully, but apparently not.
I would definitely like to get this merged, but I'd prefer a different implementation that doesn't make any assumptions about other parts of Ropey. I've requested changes in-line.
src/rope.rs
Outdated
```rust
// we can not simply call state.write(chunk) for each chunk
// because Hasher.write only produces the same input if the
// calls match exactly. See documentation of hash and hast_02 test below
// so we have to perform some tricks here to ensure that the same
// calls to write are produce no matter the chunk layout
```
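The constraint this comment refers to can be demonstrated with a toy hasher. This is an illustration only, not Ropey code: the hasher below deliberately mixes the length of each `write` call into its state, which the `Hasher` contract permits and which hashers like ahash actually do.

```rust
use std::hash::Hasher;

/// Toy hasher (illustration only): folds the length of every `write`
/// call into the state, so the call boundaries affect the result.
struct CallSensitive(u64);

impl Hasher for CallSensitive {
    fn write(&mut self, bytes: &[u8]) {
        // Mix in the per-call length, then the bytes themselves.
        self.0 = self.0.wrapping_mul(31).wrapping_add(bytes.len() as u64);
        for &b in bytes {
            self.0 = self.0.wrapping_mul(31).wrapping_add(b as u64);
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

fn main() {
    let mut a = CallSensitive(0);
    a.write(b"abcd"); // one chunk

    let mut b = CallSensitive(0);
    b.write(b"ab"); // same bytes, different chunk boundaries
    b.write(b"cd");

    // Same content, different hashes: per-chunk `write` is not enough.
    assert_ne!(a.finish(), b.finish());
}
```

So hashing each chunk with one `write` call would make two equal ropes with different chunk layouts hash differently under such a hasher.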
Style nit: comments in Ropey should be complete sentences with proper capitalization and punctuation.
(To be fair, I'm pretty sure I've written comments in Ropey that don't follow that, because Ropey has been around for a while and my views on comment style have changed during that time. But new comments should follow that.)
I have updated the comments, hope it's ok now.
```toml
[[bin]]
name = "hash"
path = "fuzz_targets/hash.rs"
test = false
doc = false
```
These should be property tests, not fuzz tests. In general, I'm trying to keep Ropey's fuzz testing aimed more at things like memory safety and unintended panics. (I also have some general tree-validity checks in there, but that's just because they were convenient to throw in.)
Randomized testing for logical correctness goes in the property tests. See tests/proptest_tests.rs.
I see that these tests make more sense as proptests. I implemented them as fuzz tests because property tests often don't run long enough / can't use debug information to exhaust all possible code paths (and apart from nicer reduction logic, there is really no practical difference between them). I didn't change this yet because you said you would reconsider below. Just let me know if I should turn this into a proptest and I will do so.
src/rope.rs
Outdated
```rust
// this bounds check always succedes because all chunks are larger then MIN_BYTES
// the only exception is the root leaf in a small file but that would be `RSEnum::Light`
// invariant for all cunks: MIN_BYTES <= chunk.len() <= MAX_BYTES < 3* MIN_BYTES
```
I would strongly prefer that the `Hash` implementation not make any assumptions about chunk size. For code that handles the node tree, of course, that's fine. But it feels odd here, and my gut tells me it could be a foot gun in the future.
The comment is actually wrong here. The bounds check always succeeds here because we perform this exact bounds check in the `if` above. I can turn the manually unrolled loop below into an actual loop to avoid the dependency on MAX_BYTES.
I have updated the implementation and the comments; it does not rely on any internal invariants of the rope anymore.
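The general shape of such an invariant-free approach might look like this sketch. The names (`hash_chunks`, `BLOCK_SIZE`) are hypothetical and the actual PR code differs; the point is that the `write` call sequence depends only on the byte content, never on the chunk layout.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical block size; the review discussion is about tuning this.
const BLOCK_SIZE: usize = 256;

/// Feed `chunks` to `state` as full BLOCK_SIZE-sized writes followed
/// by one final partial write, regardless of how the bytes are split
/// across chunks.
fn hash_chunks<'a, H, I>(state: &mut H, chunks: I)
where
    H: Hasher,
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut buf = [0u8; BLOCK_SIZE];
    let mut len = 0; // number of bytes currently buffered
    for mut chunk in chunks {
        while !chunk.is_empty() {
            let n = (BLOCK_SIZE - len).min(chunk.len());
            buf[len..len + n].copy_from_slice(&chunk[..n]);
            len += n;
            chunk = &chunk[n..];
            if len == BLOCK_SIZE {
                state.write(&buf); // always a full, fixed-size write
                len = 0;
            }
        }
    }
    if len > 0 {
        state.write(&buf[..len]); // trailing partial block
    }
}

fn main() {
    // The same bytes split differently must hash identically.
    let mut a = DefaultHasher::new();
    hash_chunks(&mut a, [b"hello ".as_slice(), b"world".as_slice()]);

    let mut b = DefaultHasher::new();
    hash_chunks(&mut b, [b"hel".as_slice(), b"lo wor".as_slice(), b"ld".as_slice()]);

    assert_eq!(a.finish(), b.finish());
}
```

Nothing here depends on MIN_BYTES, MAX_BYTES, or any other tree invariant; only the buffer size is a tunable constant.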
src/rope.rs
Outdated
```rust
// calls match exactly. See documentation of hash and hast_02 test below
// so we have to perform some tricks here to ensure that the same
// calls to write are produce no matter the chunk layout
let mut accumulator = [0u8; MIN_BYTES];
```
I think it would make more sense for the accumulator buffer to just have its own fixed size, rather than being based on MIN_BYTES. I'm not sure what size would be best, but we can play around with it to see what performs best.
I will set up some benchmarks. The problem is that the performance highly depends on the exact hasher implementation. I think we generally want a multiple of 16 here (which MIN_BYTES is) that is smaller than most chunks, so that the `chunk.len() < MIN_BYTES - accumulator_pos` branch can get good branch prediction. So MIN_BYTES might just be a sweet spot, but I guess I will look into benchmarking. Another reason MIN_BYTES is attractive here is that it is tied to the chunk size, so if that ever changes we won't have to re-determine the ideal buffer size.
I have added a separate constant for BUFFER_SIZE. From your comment below it sounds like you will change the implementation anyway, so I didn't invest the time to benchmark it and just set it to BYTES_MIN for now.
Thanks for the review. I won't have time to get into this too much today, but I will try to address your comments and perform some benchmarks later this week.
Yeah, no rush! I've actually come down with Covid (I tried to be pretty careful at the conference, masking up and everything, but nothing's 100%). So my brain isn't at its best at the moment anyway, and I'll likely be getting to things a bit slowly for the next week-ish as I recover.
Thinking about it more, I actually have a somewhat specific implementation in mind for this, because I've implemented code like this many, many times before (ironically, internally to hashers, to avoid exactly this issue). But it's not reasonable to expect you to hit a narrow target like that via progressive comments in a code review. So I think what probably makes the most sense is to accept this PR, and then I can do my own implementation afterwards if I have the energy.
So, I think what I'm going to do (if it's okay with you) is actually just redo this PR myself, but nabbing some of your unit tests. I was initially hesitant to do that because I didn't want to steal credit away from your work (which is very appreciated!). But I'll just make sure to credit you in the commit message(s). Moreover, I definitely plan to merge your other PRs once they pass review, of which #70 in particular is quite substantial. Anyway, my rationale behind just re-doing it is:
So I'll just be taking care of that all at once. That okay with you @pascalkuthe?
Sure, go ahead. It would be nice if you put some kind of credit into a commit, but I mostly just care that the bug gets fixed in a performant manner. Please also test with either ahash or fxhash, as the performance differences are much more visible there than with siphash.
If you put a
Done in fef5be9. I also added benchmarks with the default hasher, fnv, and fxhash at various text sizes. Unsurprisingly (to me, at least), fnv is the slowest. People think of fnv as a "fast" hash, but it works per-byte, which actually makes it pretty slow unless you're using extremely small input sizes. To test overhead, I also implemented a bogus version that just passes the first chunk to the hasher and then stops. For the benches with very small inputs, this ends up being (incidentally) correct, but with as low overhead as possible. Compared to that, the impl I've ended up with has about 15% performance overhead on tiny inputs, for all the tested hashes, which doesn't seem too bad to me. That overhead should be notably less on larger inputs due to amortization, though I haven't tested yet to confirm. In any case, it seems at least reasonably efficient.
Awesome, thanks for implementing a fix so quickly! Yeah, fnv is not a great hasher. Fxhash is generally faster than siphash, though (ahash is more robust against hash collisions at the cost of being ever so slightly slower than fxhash when not compiling with the AES CPU feature, so I generally use that). The performance numbers sound good. How did you end up with a 256 buffer size? I would have expected larger buffers to be faster, or did that not make a difference when put to the test?
Just trial-and-error, testing power-of-two sizes. I actually want to revisit that when I can bench on my main machine. I'm currently isolating in my bedroom, so I just have my tiny, slow laptop to use. And that means there's also a lot of CPU frequency scaling going on, which makes benchmarking a bit tricky (which is also why I've only tested powers of two so far--more fine-grained changes were too noisy on this machine). So once I can benchmark on my desktop machine, I might tweak things more based on those results.
Of the sizes I tested, 256 was the fastest. 512 was about on par, but slightly slower. 1024 and larger sharply fell off in performance, and 128 and lower fell off somewhat less sharply. The approach I implemented can skip data copying (just passing slices directly from the chunks) under certain conditions, one of which is that the chunk has to be larger-than-or-equal-to the block size. So making the block size smaller actually makes the zero-copy code path more likely. My suspicion is that it's that vs amortizing the loop overhead that led to this result. 256 is large enough to amortize the loop-and-test overhead fairly well, but small enough to frequently take the zero-copy code path. That's my guess, at least. More benchmarking/testing is still needed to narrow in on the best performance, I think. That does mean that there's likely some relationship between the block size and average chunk size with respect to performance. But it's not clear if it's a linear relationship, so leaving it independent still seems best to me.
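The zero-copy condition described above can be sketched roughly like this (a hypothetical `feed` helper, not the actual commit's code): when nothing is buffered and the chunk spans at least one block, full blocks are passed straight to the hasher without copying.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const BLOCK_SIZE: usize = 256; // hypothetical; matches the size discussed above

/// Sketch of the fast path: with an empty buffer and a chunk of at
/// least BLOCK_SIZE bytes, full blocks are written directly from the
/// chunk instead of being copied into `buf` first.
fn feed<H: Hasher>(state: &mut H, buf: &mut [u8; BLOCK_SIZE], len: &mut usize, mut chunk: &[u8]) {
    if *len == 0 {
        while chunk.len() >= BLOCK_SIZE {
            state.write(&chunk[..BLOCK_SIZE]); // zero-copy write
            chunk = &chunk[BLOCK_SIZE..];
        }
    }
    // Remainder (or a non-empty buffer) goes through the copy path.
    while !chunk.is_empty() {
        let n = (BLOCK_SIZE - *len).min(chunk.len());
        buf[*len..*len + n].copy_from_slice(&chunk[..n]);
        *len += n;
        chunk = &chunk[n..];
        if *len == BLOCK_SIZE {
            state.write(&buf[..]);
            *len = 0;
        }
    }
}

fn main() {
    let data = [7u8; 1000];

    // One big chunk vs. two smaller chunks: identical write sequence.
    let (mut h1, mut buf1, mut len1) = (DefaultHasher::new(), [0u8; BLOCK_SIZE], 0);
    feed(&mut h1, &mut buf1, &mut len1, &data);
    h1.write(&buf1[..len1]); // flush the trailing partial block

    let (mut h2, mut buf2, mut len2) = (DefaultHasher::new(), [0u8; BLOCK_SIZE], 0);
    feed(&mut h2, &mut buf2, &mut len2, &data[..300]);
    feed(&mut h2, &mut buf2, &mut len2, &data[300..]);
    h2.write(&buf2[..len2]);

    assert_eq!(h1.finish(), h2.finish());
}
```

Either path emits only BLOCK_SIZE-sized writes (plus one trailing partial write), so the hasher sees the same call sequence regardless of which path was taken; a smaller block size just makes the zero-copy branch fire more often.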
The current `Hash` implementation of Ropey violates the std API contract and therefore breaks with most non-default `Hasher`s.

Ropey's current `Hash` implementation simply hashes all chunks in a loop. However, this means that the chunk boundaries are encoded into the `Hash`. From the documentation of `Hasher`, this means that no matter what the chunk boundaries are, the same `write` calls must happen.

The tests did not catch this because the default `Hasher` does not exploit this property. However, most other `Hasher`s with less strong guarantees about DoS resistance (`ahash`, `fxhash`, `fnvhash`) do exploit this property. The default `Hasher` may also start to exploit it at any point in the future (and it would not be considered a breaking change by std).

When using such a `Hasher` with Ropey, the property `hash(rope1) != hash(rope2) <=> rope1 != rope2` does not hold anymore. This leads to `HashMap`s with the same key present multiple times (or failed lookups when the key does exist).

The simplest fix for this is simply to call `write_u8` for each byte of the chunk. However, this is extremely slow, because `Hasher`s usually heavily optimize batch operations. For example, for `ahash` this is at least 16 times slower, because it always operates on 16 bytes at once.

Instead, I have created a hash implementation that uses a temporary buffer to divide the bytes of the rope into slices of length `BYTES_MIN` (except the last slice, which may be smaller) and passes those to the `Hasher`. This causes a bit of overhead, as some bytes need to be copied at the chunk boundaries. However, the overhead should be much smaller than calling `write_u8` repeatedly and should only degrade performance slightly.
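The claim that the default hasher masks the bug is easy to check directly. Note this relies on the current behavior of std's SipHash-based `DefaultHasher`, which buffers the byte stream across `write` calls; std explicitly does not guarantee this stays true.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

fn main() {
    let mut a = DefaultHasher::new();
    a.write(b"abcd");

    let mut b = DefaultHasher::new();
    b.write(b"ab"); // same bytes, split across two calls
    b.write(b"cd");

    // The default hasher currently treats split writes the same as a
    // single write, which is why the unit tests missed the contract
    // violation with chunk-wise hashing.
    assert_eq!(a.finish(), b.finish());
}
```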