
fix Hash implementation #66

Closed

Conversation

@pascalkuthe pascalkuthe commented Oct 27, 2022

The current Hash implementation of Ropey violates the
std API contract and therefore breaks with most non-default Hashers.

Ropey's current Hash implementation simply hashes all chunks in a loop.
However, this means that the chunk boundaries are encoded into the hash.
From the documentation of Hasher:

This trait provides no guarantees about how the various write_* methods are
defined and implementations of [Hash] should not assume that they work one
way or another. You cannot assume, for example, that a [write_u32] call is
equivalent to four calls of [write_u8]. Nor can you assume that adjacent
write calls are merged, so it's possible, for example, that

# fn foo(hasher: &mut impl std::hash::Hasher) {
hasher.write(&[1, 2]);
hasher.write(&[3, 4, 5, 6]);
# }

and

# fn foo(hasher: &mut impl std::hash::Hasher) {
hasher.write(&[1, 2, 3, 4]);
hasher.write(&[5, 6]);
# }

end up producing different hashes.

Thus to produce the same hash value, [Hash] implementations must ensure
for equivalent items that exactly the same sequence of calls is made -- the
same methods with the same parameters in the same order.

This means that no matter where the chunk boundaries fall, the same sequence of write calls must be made.
The tests did not catch this because the default Hasher does not exploit this freedom.
However, most other Hashers with weaker DoS-resistance guarantees (ahash, fxhash, fnv)
do exploit it, and the default Hasher may also start to exploit it at any point in the future (std would not consider that a breaking change).
When using such a Hasher with Ropey, the required property rope1 == rope2 => hash(rope1) == hash(rope2)
no longer holds. This leads to HashMaps with the same key present multiple times (or failed lookups when the key does exist).
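To make the failure mode concrete, here is a minimal, self-contained sketch (a toy hasher invented for illustration, not Ropey's or any real hasher's code). Like fxhash-style hashers, it processes each individual write call in fixed-size blocks, so the same bytes split across different write calls can produce different hashes:

```rust
use std::hash::Hasher;

/// Toy hasher that pads every `write` call to a multiple of 8 bytes
/// before mixing. This makes the result depend on *where* the write
/// calls are split, which the Hasher contract explicitly permits.
struct BlockHasher {
    state: u64,
}

impl Hasher for BlockHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Process the input of this single call in 8-byte blocks,
        // zero-padding the final partial block.
        for block in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..block.len()].copy_from_slice(block);
            self.state = (self.state ^ u64::from_le_bytes(buf)).wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hash the same bytes, split into chunks at the given boundaries,
/// the way the broken impl did: one `write` call per chunk.
fn hash_chunked(chunks: &[&[u8]]) -> u64 {
    let mut h = BlockHasher { state: 0 };
    for chunk in chunks {
        h.write(chunk);
    }
    h.finish()
}

fn main() {
    // Same logical byte sequence, different chunk boundaries:
    let a = hash_chunked(&[[1u8, 2].as_slice(), [3, 4, 5, 6].as_slice()]);
    let b = hash_chunked(&[[1u8, 2, 3, 4].as_slice(), [5, 6].as_slice()]);
    // With this hasher (and real block-based ones) the results differ.
    assert_ne!(a, b);
}
```

With such a hasher, two equal ropes that happen to have different internal chunk layouts hash differently, which is exactly the HashMap breakage described above.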

The simplest fix is to call write_u8 for each byte of each chunk.
However, this is extremely slow, because Hashers usually heavily optimize batch operations.
For ahash, for example, it is at least 16 times slower, because ahash always operates on 16 bytes at once.

Instead, I have created a Hash implementation that uses a temporary buffer to divide
the bytes of the rope into slices of length MIN_BYTES (except the last slice, which may be shorter) and passes those to the Hasher.
This causes a bit of overhead, as some bytes need to be copied at the chunk boundaries.
However, the overhead should be much smaller than calling write_u8 repeatedly and should only degrade performance slightly.
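The buffering idea can be sketched as follows (an illustrative sketch, not the PR's actual code; BLOCK_SIZE stands in for the MIN_BYTES-sized buffer, and hash_chunks/hash_of are hypothetical helper names). Bytes from arbitrarily-sized chunks are regrouped into fixed-size blocks before being fed to the Hasher, so the write-call sequence no longer depends on chunk boundaries:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const BLOCK_SIZE: usize = 64; // stand-in for MIN_BYTES; any fixed size works

/// Feed `chunks` to `state` as a boundary-independent sequence of
/// fixed-size `write` calls, buffering across chunk boundaries.
fn hash_chunks<H: Hasher>(state: &mut H, chunks: &[&[u8]]) {
    let mut buf = [0u8; BLOCK_SIZE];
    let mut filled = 0;
    for chunk in chunks {
        let mut rest = *chunk;
        while !rest.is_empty() {
            let n = (BLOCK_SIZE - filled).min(rest.len());
            buf[filled..filled + n].copy_from_slice(&rest[..n]);
            filled += n;
            rest = &rest[n..];
            if filled == BLOCK_SIZE {
                state.write(&buf); // always a full block
                filled = 0;
            }
        }
    }
    if filled > 0 {
        state.write(&buf[..filled]); // final partial block
    }
}

fn hash_of(chunks: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    hash_chunks(&mut h, chunks);
    h.finish()
}

fn main() {
    // The same text split at different chunk boundaries hashes identically,
    // because the hasher sees an identical sequence of write calls.
    let a = hash_of(&[b"hello ".as_slice(), b"world".as_slice()]);
    let b = hash_of(&[b"hel".as_slice(), b"lo wor".as_slice(), b"ld".as_slice()]);
    assert_eq!(a, b);
}
```

Only the bytes that straddle a block boundary are copied; everything else is a plain memcpy into the buffer followed by one batched write, which is what keeps the overhead small compared to per-byte writes.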

src/slice.rs Outdated
src/slice.rs Outdated
tests/hash.rs Outdated
@pascalkuthe pascalkuthe force-pushed the fix_hash branch 3 times, most recently from f8000f2 to 74aade1 Compare October 27, 2022 15:59
@cessen cessen left a comment


Thanks so much for catching this! I could have sworn I had read the Hasher documentation carefully, but apparently not.

I would definitely like to get this merged, but I'd prefer a different implementation that doesn't make any assumptions about other parts of Ropey. I've requested changes in-line.

src/rope.rs Outdated
Comment on lines 2055 to 2059
// we can not simply call state.write(chunk) for each chunk
// because Hasher.write only produces the same input if the
// calls match exactly. See documentation of hash and hast_02 test below
// so we have to perform some tricks here to ensure that the same
// calls to write are produce no matter the chunk layout
Owner

Style nit: comments in Ropey should be complete sentences with proper capitalization and punctuation.

(To be fair, I'm pretty sure I've written comments in Ropey that don't follow that, because Ropey has been around for a while and my views on comment style have changed during that time. But new comments should follow that.)

Collaborator Author

I have updated the comments, hope it's OK now.

Comment on lines +28 to +32
[[bin]]
name = "hash"
path = "fuzz_targets/hash.rs"
test = false
doc = false
Owner

These should be property tests, not fuzz tests. In general, I'm trying to keep Ropey's fuzz testing aimed more at things like memory safety and unintended panics. (I also have some general tree-validity checks in there, but that's just because they were convenient to throw in.)

Randomized testing for logical correctness goes in the property tests. See tests/proptest_tests.rs.

Collaborator Author

@pascalkuthe pascalkuthe Nov 4, 2022

I see that this test makes more sense as a proptest. I implemented it as a fuzz test because property tests often don't run long enough/can't use coverage feedback to exhaust all possible code paths (and apart from nicer reduction logic there is really no practical difference between them). I didn't change this yet because you said you would reconsider below. Just let me know if I should turn this into a proptest and I will do so.

src/rope.rs Outdated
Comment on lines 2071 to 2073
// this bounds check always succedes because all chunks are larger then MIN_BYTES
// the only exception is the root leaf in a small file but that would be `RSEnum::Light`
// invariant for all cunks: MIN_BYTES <= chunk.len() <= MAX_BYTES < 3* MIN_BYTES
Owner

I would strongly prefer that the Hash implementation not make any assumptions about chunk size. For things handling the node tree, of course. But it feels odd here, and my gut tells me it could be a foot gun in the future.

Collaborator Author

The comment is actually wrong here. The bounds check always succeeds here because we perform this exact bounds check in the if above. I can turn the manually unrolled loop below into an actual loop to avoid the dependency on MAX_BYTES.

Collaborator Author

I have updated the implementation and the comments; it no longer relies on any internal invariants of the rope.

src/rope.rs Outdated
// calls match exactly. See documentation of hash and hast_02 test below
// so we have to perform some tricks here to ensure that the same
// calls to write are produce no matter the chunk layout
let mut accumulator = [0u8; MIN_BYTES];
Owner

I think it would make more sense for the accumulator buffer to just have its own fixed size, rather than being based on MIN_BYTES. I'm not sure what size would be best, but we can play around with it to see what performs best.

Collaborator Author

I will set up some benchmarks. The problem is that the performance highly depends on the exact Hasher implementation. I think we generally want a multiple of 16 here (which MIN_BYTES is) that is smaller than most chunks, so that the chunk.len() < MIN_BYTES - accumulator_pos branch can get good branch prediction. So MIN_BYTES might just be a sweet spot, but I will look into benchmarking. Another reason MIN_BYTES is attractive here is that it is tied to the chunk size, so if that ever changes we won't have to re-determine the ideal buffer size.

Collaborator Author

@pascalkuthe pascalkuthe Nov 4, 2022

I have added a separate constant for BUFFER_SIZE. From your comment below it sounds like you will change the implementation anyway, so I didn't invest the time to benchmark it and just set it to MIN_BYTES for now.

@pascalkuthe

pascalkuthe commented Nov 1, 2022

Thanks for the review. I won't have time to get into this too much today, but I will try to address your comments and perform some benchmarks later this week.

@cessen

cessen commented Nov 1, 2022

Yeah, no rush! I've actually come down with Covid (I tried to be pretty careful at the conference, masking up and everything, but nothing's 100%). So my brain isn't at its best at the moment anyway, and I'll likely be getting to things a bit slowly for the next week-ish as I recover.

@cessen

cessen commented Nov 1, 2022

Thinking about it more, I actually have a somewhat specific implementation in mind for this, because I've implemented code like this many, many times before (ironically, internally to hashers, to avoid exactly this issue). But it's not reasonable to expect you to hit a narrow target like that via progressive comments in a code review.

So I think what probably makes the most sense is to accept this PR, and then I can do my own implementation afterwards if I have the energy.

However, I would still like the fuzz tests to be moved to property tests, since the issue this addresses is a lot closer to a logic error than a safety issue. (I'll revisit this when my head is on straight again.)

@pascalkuthe pascalkuthe force-pushed the fix_hash branch 2 times, most recently from 87eb745 to 1dbb7c6 Compare November 4, 2022 18:42
@cessen

cessen commented Nov 6, 2022

So, I think what I'm going to do (if it's okay with you) is actually just redo this PR myself, but nabbing some of your unit tests. I was initially hesitant to do that because I didn't want to steal credit away from your work (which is very appreciated!). But I'll just make sure to credit you in the commit message(s). Moreover, I definitely plan to merge your other PRs once they pass review, of which #70 in particular is quite substantial.

Anyway, my rationale behind just re-doing it is:

  • I'll basically be reimplementing the Hash impl anyway.
  • I don't think we actually need any fuzzing/property-esque tests for this particular bug. Just normal unit tests are fine, to ensure we don't regress in the future.
  • Those tests can be improved by adding some private/doc-hidden APIs to Ropey to allow building a Rope with specific chunks, which I think will be useful for testing other things as well. That way we can directly construct the test cases we're looking for, rather than trying to indirectly coax Ropey into hopefully constructing things with the chunk boundaries we want, which isn't reliable into the future anyway.

So I'll just be taking care of that all at once. That okay with you @pascalkuthe?

@pascalkuthe

> So, I think what I'm going to do (if it's okay with you) is actually just redo this PR myself, but nabbing some of your unit tests. […] So I'll just be taking care of that all at once. That okay with you @pascalkuthe?

Sure, go ahead. It would be nice if you put some kind of credit into a commit, but I mostly just care that the bug gets fixed in a performant manner. Please also test with either ahash or fxhash, as the performance differences are much more visible there than with SipHash.

@archseer

archseer commented Nov 6, 2022

> But I'll just make sure to credit you in the commit message(s).

If you put a Co-authored-by: Name Here <email@org> line in the commit message, GitHub will correctly attribute the commit to both users :)

@cessen

cessen commented Nov 7, 2022

Done in fef5be9.

I also added benchmarks with the default hasher, fnv, and fxhash at various text sizes. Unsurprisingly (to me, at least), fnv is the slowest. People think of fnv as a "fast" hash, but it works per-byte which actually makes it pretty slow unless you're using extremely small input sizes.

To test overhead, I also implemented a bogus version that just passes the first chunk to the hasher and then stops. For the benches with very small inputs, this ends up being (incidentally) correct, but with as low overhead as possible. Compared to that, the impl I've ended up with has about 15% performance overhead on tiny inputs, for all the tested hashes, which doesn't seem too bad to me. That overhead should be notably less on larger inputs due to amortization, though I haven't tested yet to confirm.

In any case, it seems at least reasonably efficient.

@cessen cessen closed this Nov 7, 2022
@pascalkuthe

> Done in fef5be9. I also added benchmarks with the default hasher, fnv, and fxhash at various text sizes. […]

Awesome, thanks for implementing a fix so quickly! Yeah, fnv is not a great hasher. Fxhash is generally faster than SipHash, though (ahash is more robust against hash collisions at the cost of being ever so slightly slower than fxhash when not compiling with the AES CPU feature, so I generally use that). The performance numbers sound good. How did you end up with a 256-byte buffer size? I would have expected larger buffers to be faster, or did that not make a difference when put to the test?

@cessen

cessen commented Nov 7, 2022

> how did you end up with 256 buffer size?

Just trial and error, testing power-of-two sizes. I actually want to revisit that when I can bench on my main machine. I'm currently isolating in my bedroom, so I just have my tiny, slow laptop to use. That means there's also a lot of CPU frequency scaling going on, which makes benchmarking a bit tricky (it's also why I've only tested powers of two so far; more fine-grained changes were too noisy on this machine). So once I can benchmark on my desktop machine, I might tweak things more based on those results.

> I would have expected larger buffers to be fast or did that not make a difference when put to the test?

Of the sizes I tested, 256 was the fastest. 512 was about on par, but slightly slower. 1024 and larger sharply fell off in performance, and 128 and lower fell off somewhat less sharply.

The approach I implemented can skip data copying (just passing slices directly from the chunks) under certain conditions, one of which is that the chunk has to be larger-than-or-equal-to the block size. So making the block size smaller actually makes the zero-copy code path more likely. My suspicion is that it's that vs amortizing the loop overhead that led to this result. 256 is large enough to amortize the loop-and-test overhead fairly well, but small enough to frequently take the zero-copy code path. That's my guess, at least. More benchmarking/testing is still needed to narrow in on the best performance, I think.

That does mean that there's likely some relationship between the block size and average chunk size with respect to performance. But it's not clear if it's a linear relationship, so leaving it independent still seems best to me.
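The zero-copy fast path described above might look roughly like this (an illustrative sketch, not the actual code from fef5be9; BLOCK_SIZE, hash_chunks, and hash_of are hypothetical names). Whenever the internal buffer is empty and the remaining part of a chunk holds at least one full block, slices are passed to the hasher directly instead of being copied:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const BLOCK_SIZE: usize = 256;

/// Feed chunks to the hasher in BLOCK_SIZE-sized writes, copying only
/// at chunk boundaries. Chunks at least as large as BLOCK_SIZE take
/// the zero-copy fast path.
fn hash_chunks<H: Hasher>(state: &mut H, chunks: &[&[u8]]) {
    let mut buf = [0u8; BLOCK_SIZE];
    let mut filled = 0;
    for chunk in chunks {
        let mut rest = *chunk;
        // Top up a partially filled buffer first.
        if filled > 0 {
            let n = (BLOCK_SIZE - filled).min(rest.len());
            buf[filled..filled + n].copy_from_slice(&rest[..n]);
            filled += n;
            rest = &rest[n..];
            if filled == BLOCK_SIZE {
                state.write(&buf);
                filled = 0;
            }
        }
        // Fast path: buffer is empty, so full blocks can be passed
        // straight from the chunk without copying.
        while filled == 0 && rest.len() >= BLOCK_SIZE {
            state.write(&rest[..BLOCK_SIZE]);
            rest = &rest[BLOCK_SIZE..];
        }
        // Stash the remainder for the next chunk.
        buf[filled..filled + rest.len()].copy_from_slice(rest);
        filled += rest.len();
    }
    if filled > 0 {
        state.write(&buf[..filled]);
    }
}

fn hash_of(chunks: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    hash_chunks(&mut h, chunks);
    h.finish()
}

fn main() {
    // One big chunk vs. the same bytes split at uneven boundaries
    // produce identical write sequences, hence identical hashes.
    let big: Vec<u8> = (0..1000).map(|i| i as u8).collect();
    let split: Vec<&[u8]> = big.chunks(300).collect();
    assert_eq!(hash_of(&[big.as_slice()]), hash_of(&split));
}
```

This also illustrates the trade-off mentioned above: a smaller BLOCK_SIZE makes the zero-copy while loop fire more often, while a larger one amortizes the per-write loop overhead better.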
