Reduce RowHash's tag space size by x2 #3543
Conversation
Generally looks good!
I did a quick benchmark and see just about neutral performance, ± a few percent depending on the compiler. Also a small ratio loss, but it seems reasonable. I'll let you do the full benchmarks; I just wanted to verify that the speed looked good on my machine too.
lib/compress/zstd_lazy.c
Outdated
@@ -804,9 +803,10 @@ U16 ZSTD_rotateRight_U16(U16 const value, U32 count) {
 * value to reflect the update. Essentially cycles backwards from [0, {entries per row})
You should update this comment
I thought I did, thank you for noticing!
lib/compress/zstd_lazy.c
Outdated
@@ -888,7 +888,7 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_update_internalImpl(ZSTD_matchState_t* ms,
                                                         U32 const rowMask, U32 const useCache)
 {
     U32* const hashTable = ms->hashTable;
-    U16* const tagTable = ms->tagTable;
+    U8* const tagTable = ms->tagTable;
You should use BYTE here and elsewhere.
lib/compress/zstd_lazy.c
Outdated
@@ -903,7 +903,7 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_update_internalImpl(ZSTD_matchState_t* ms,
             U32 const pos = ZSTD_row_nextIndex(tagRow, rowMask);
             assert(hash == ZSTD_hashPtr(base + updateStartIdx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls));
-            ((BYTE*)tagRow)[pos + ZSTD_ROW_HASH_TAG_OFFSET] = hash & ZSTD_ROW_HASH_TAG_MASK;
+            ((BYTE*)tagRow)[pos] = hash & ZSTD_ROW_HASH_TAG_MASK;
Can get rid of the BYTE* cast now.
@terrelln - did you benchmark on a mac or on an x86 server?
Force-pushed from cc96945 to faf5f15
Fixed CR comments and added a spreadsheet for benchmarks. I've tried reducing this regression, but I don't have a good solution at the moment, and unless it's critical I won't continue.
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index). In turn, we slash the memory usage for slightly worse compression ratio or better ratio if we use the same memory size with a higher hashLog.
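The row layout this commit describes can be sketched roughly as follows. This is a simplified illustration, not the actual zstd code: the names TagRow and ROW_ENTRIES are assumptions, and in the real implementation the head index simply occupies byte 0 of each row in the flat tag table.

```c
#include <stdint.h>

#define ROW_ENTRIES 16  /* entries per row; illustrative value */

/* Before this change, each slot was a U16 and all slots held tags.
 * After it, each slot is one byte, and the first byte of each row
 * stores the head (next insertion position) instead of a tag, so
 * one tag slot is lost per row. */
typedef struct {
    uint8_t head;                   /* next position to overwrite; cycles within the row */
    uint8_t tags[ROW_ENTRIES - 1];  /* 8-bit tags for the remaining candidate positions */
} TagRow;
```

This halves the tag space (1 byte per entry instead of 2) at the cost of one candidate per row, which is the ratio/memory trade-off discussed in this PR.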
Force-pushed from faf5f15 to 69be424
x86
Awesome!
…ys invalid facebook#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag. Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.
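The fix described above can be sketched as a one-line mask filter. This is a simplified illustration under assumed names (filter_match_mask is not a zstd identifier): since byte 0 of each row now holds the head index rather than a tag, a row-wide tag comparison can spuriously report a match at position 0, so bit 0 of the match mask must be cleared before the match loop runs.

```c
#include <stdint.h>

/* Clear bit 0 of a row's match bitmask: position 0 stores the head
 * index, not a tag, so it can never be a valid match. */
static uint32_t filter_match_mask(uint32_t matches)
{
    return matches & ~1u;
}
```

Without this filter, a false match at position 0 could make the loop over candidates terminate earlier than it should, which is the bug the commit fixes.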
Hi, can this be reflected in the comment at https://github.com/facebook/zstd/blame/25822342be59d831bad65426ae51f5cc22157b09/lib/compress/zstd_lazy.c#L1116-L1118? Thanks!
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (it needs to be used for the next position index).
The result is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
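The ~16% figure follows directly from the per-entry sizes quoted above. A minimal sketch of the arithmetic, assuming no padding and using a hypothetical helper name (tag_shrink_saving_pct is not zstd code):

```c
#include <stdint.h>

/* Each hash-table entry pairs a 4-byte U32 index with a tag.
 * Shrinking the tag from U16 to U8 drops the entry from 6 to
 * 5 bytes, a saving of 1/6 ~= 16.7%. */
static double tag_shrink_saving_pct(void)
{
    const double before = sizeof(uint32_t) + sizeof(uint16_t); /* 4 + 2 = 6 */
    const double after  = sizeof(uint32_t) + sizeof(uint8_t);  /* 4 + 1 = 5 */
    return 100.0 * (before - after) / before;
}
```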
Update:
CR comments fixed.
Benchmark in spreadsheet.
Update 2:
Another interesting benchmark is fullbench -b41 -l9 enwik8.1k, which repeatedly uses ZSTD_compressStream to recompress 1024 bytes out of enwik8. Since #3528 hasn't been deployed yet, we can observe that this PR makes the benchmark twice as fast for small data sizes, as the majority of time is spent in the tag space initialization.