Fix misaligned reference and logic error in crc32 #1906
Previously, this code tried to turn a &[u8] into a &[u32] without checking alignment. This means it could and did create misaligned references, which is UB. This can be detected by running the tests with -Zbuild-std --target=x86_64-unknown-linux-gnu (or whatever your host is). This change adopts the approach from the murmurhash implementation. The previous implementation also ignored the tail bytes. The loop at the end treats num_bytes as if it were the full length of the slice, but it isn't: num_bytes is the number of bytes after the last 4-byte group. This can be observed, for example, by changing "hello" to just "hell" in the tests. Under the old implementation, the test still passes. With this change, the value that comes out changes, and "hello" and "hell" hash to different values.
Codecov Report
@@ Coverage Diff @@
## master #1906 +/- ##
==========================================
- Coverage 83.42% 83.42% -0.01%
==========================================
Files 214 214
Lines 57025 57018 -7
==========================================
- Hits 47574 47567 -7
Misses 9451 9451
This looks good to me; I am a bit surprised at how incorrect this method was. Is it actually used anywhere? I am mainly wondering if we have a gap in our test coverage somewhere...
while offset < num_bytes {
    hash = _mm_crc32_u8(hash, bytes[offset]);
    offset += 1;
let remainder = bytes.len() % 4;
Can drop this after rebase on commit ded6316 "Fix misaligned reference and logic error in crc32 (apache#1906)", first released in 17.0.0
Which issue does this PR close?
Closes #.
Rationale for this change
Previously, this code tried to turn a &[u8] into a &[u32] without
checking alignment. This means it could and did create misaligned
references, which is UB. This can be detected by running the tests with
-Zbuild-std --target=x86_64-unknown-linux-gnu (or whatever your host
is). This change adopts the approach from the murmurhash implementation.
The previous implementation also ignored the tail bytes. The loop at the
end treats num_bytes as if it were the full length of the slice, but it
isn't: num_bytes is the number of bytes after the last 4-byte group.
This can be observed, for example, by changing "hello" to just "hell" in
the tests. Under the old implementation, the test still passes. With
this change, the value that comes out changes, and "hello" and "hell"
hash to different values.
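As a rough illustration of the two points above, here is a minimal sketch, not the code in this PR. It assumes a software CRC32C step (crc32c_step, a hypothetical stand-in for _mm_crc32_u32 and _mm_crc32_u8) so that it runs without SSE4.2. The names crc32_hash_sketch and crc32_bytewise are also hypothetical. Words are built by copying bytes with u32::from_le_bytes, so no &[u32] reference is ever created, and the remainder after the last 4-byte group is fed in byte by byte.

```rust
/// Hypothetical software stand-in for the CRC32C intrinsics: folds the low
/// `bits` bits of `data` into `crc` using the reflected Castagnoli polynomial.
fn crc32c_step(mut crc: u32, data: u32, bits: u32) -> u32 {
    crc ^= data;
    for _ in 0..bits {
        crc = if crc & 1 != 0 {
            (crc >> 1) ^ 0x82F6_3B78
        } else {
            crc >> 1
        };
    }
    crc
}

/// Sketch of an alignment-safe hash: whole 4-byte groups first, then the tail.
fn crc32_hash_sketch(bytes: &[u8], seed: u32) -> u32 {
    let mut hash = seed;
    let mut chunks = bytes.chunks_exact(4);
    for chunk in &mut chunks {
        // Copy the four bytes into a u32 value; this never reinterprets the
        // slice as &[u32], so the input's alignment does not matter.
        let word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        hash = crc32c_step(hash, word, 32);
    }
    // The 0..=3 bytes after the last 4-byte group must be hashed as well.
    for &byte in chunks.remainder() {
        hash = crc32c_step(hash, u32::from(byte), 8);
    }
    hash
}

/// Reference version that feeds every byte individually, for comparison.
fn crc32_bytewise(bytes: &[u8], seed: u32) -> u32 {
    bytes
        .iter()
        .fold(seed, |hash, &byte| crc32c_step(hash, u32::from(byte), 8))
}
```

With this shape, crc32_hash_sketch(b"hello", 0) and crc32_hash_sketch(b"hell", 0) differ because the trailing "o" reaches the tail loop, and the word-at-a-time result agrees with crc32_bytewise on the same input.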
What changes are included in this PR?
A soundness and correctness fix for crc32_hash
Are there any user-facing changes?
I don't know. Can users observe the values that come out of crc32_hash?