zstd: extend executeSimple with history support #542
The "regressions" seem to be within the margin of error. Good job, I will go through it.
Impact of commit 3a35124 (compared with the previous commit on this branch): I can't tell whether it is significantly better or worse, except in a few cases.
A tricky point is that we cannot overread from history, since we may run past the end of the current page. So we need a "precise" memory copier. I have one for s2 which can be adapted. There it is used for sizes 1 -> 64, but it could handle 0 -> 16, so that it can be called whenever there are fewer than 16 bytes left. Untested, but it should be something like this:
// func genMemMoveShort
// src and dst may not overlap.
// No registers are updated.
// Length must be 0 -> 16 bytes
func genMemMoveShort(name string, dst, src, length reg.GPVirtual, end LabelRef) {
	Comment("genMemMoveShort")
	AX, CX := GP64(), GP64()
	name += "_memmove_"
	// Only enable if length can be 0.
	if true {
		TESTQ(length, length)
		JEQ(end)
	}
	CMPQ(length, U8(3))
	JB(LabelRef(name + "move_1or2"))
	JE(LabelRef(name + "move_3"))
	CMPQ(length, U8(8))
	JB(LabelRef(name + "move_4through7"))
	//Label(name + "move_8through16")
	MOVQ(Mem{Base: src}, AX)
	MOVQ(Mem{Base: src, Disp: -8, Index: length, Scale: 1}, CX)
	MOVQ(AX, Mem{Base: dst})
	MOVQ(CX, Mem{Base: dst, Disp: -8, Index: length, Scale: 1})
	JMP(end)
	Label(name + "move_1or2")
	MOVB(Mem{Base: src}, AX.As8())
	MOVB(Mem{Base: src, Disp: -1, Index: length, Scale: 1}, CX.As8())
	MOVB(AX.As8(), Mem{Base: dst})
	MOVB(CX.As8(), Mem{Base: dst, Disp: -1, Index: length, Scale: 1})
	JMP(end)
	Label(name + "move_3")
	MOVW(Mem{Base: src}, AX.As16())
	MOVB(Mem{Base: src, Disp: 2}, CX.As8())
	MOVW(AX.As16(), Mem{Base: dst})
	MOVB(CX.As8(), Mem{Base: dst, Disp: 2})
	JMP(end)
	Label(name + "move_4through7")
	MOVL(Mem{Base: src}, AX.As32())
	MOVL(Mem{Base: src, Disp: -4, Index: length, Scale: 1}, CX.As32())
	MOVL(AX.As32(), Mem{Base: dst})
	MOVL(CX.As32(), Mem{Base: dst, Disp: -4, Index: length, Scale: 1})
	JMP(end)
}
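For readers less familiar with the generator DSL, the same idea can be written in plain Go. This is only an illustrative sketch, not code from the PR: the name `moveShort` is hypothetical, and it assumes `0 <= n <= 16` and non-overlapping slices of at least `n` bytes. Each size class loads the first and the last chunk of the range (the two chunks may overlap each other), so no byte outside `src[:n]` is ever read and no byte outside `dst[:n]` is ever written.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// moveShort copies n bytes (0 <= n <= 16) from src to dst.
// It never reads outside src[:n] or writes outside dst[:n].
// src and dst must not overlap. Hypothetical helper for illustration.
func moveShort(dst, src []byte, n int) {
	switch {
	case n == 0:
		return
	case n < 3: // 1 or 2 bytes: first and last byte (the same byte when n == 1)
		a, b := src[0], src[n-1]
		dst[0], dst[n-1] = a, b
	case n == 3: // one 2-byte load plus one 1-byte load
		a := binary.LittleEndian.Uint16(src)
		b := src[2]
		binary.LittleEndian.PutUint16(dst, a)
		dst[2] = b
	case n < 8: // 4..7 bytes: two (possibly overlapping) 4-byte chunks
		a := binary.LittleEndian.Uint32(src)
		b := binary.LittleEndian.Uint32(src[n-4:])
		binary.LittleEndian.PutUint32(dst, a)
		binary.LittleEndian.PutUint32(dst[n-4:], b)
	default: // 8..16 bytes: two (possibly overlapping) 8-byte chunks
		a := binary.LittleEndian.Uint64(src)
		b := binary.LittleEndian.Uint64(src[n-8:])
		binary.LittleEndian.PutUint64(dst, a)
		binary.LittleEndian.PutUint64(dst[n-8:], b)
	}
}

func main() {
	src := []byte("0123456789abcdef")
	for n := 0; n <= 16; n++ {
		dst := make([]byte, n)
		moveShort(dst, src[:n], n)
		fmt.Printf("%2d %q\n", n, dst)
	}
}
```

Both loads are done before either store in every branch, mirroring the order in the generator code above.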
Right, I will change it.
3a35124 to 83ad8f9
OK, I used another memory copy routine for history. Updated the main issue with the current timings. There are still nice improvements for the cases with history.
Let's take this out. Then I don't see any problems with merging.
Co-authored-by: Klaus Post <klauspost@gmail.com>
Add history support in the asm implementation. Part of task #515.
As usual, marking as a draft because of a few failing tests. [fixed]
A performance comparison between the current master and this branch on an Ice Lake machine with the hacked decodeSync is below. There are some nice speedups, but there are also regressions for almost all cases without history. To overcome that, we can have two specialisations: executeWithoutDictionary and executeWithoutDictionatyAndHistory - what do you think?
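To make the trade-off concrete, here is a rough plain-Go sketch of what "history support" means when executing sequences, and how a history-free specialisation can skip the extra branch. It is not the asm from this PR: the `seq` type and the functions `executeWithHistory`, `executeNoHistory`, and `execute` are hypothetical names chosen for illustration, and the model is simplified (one literal run plus one match per sequence, offsets counted back from the current end of the output).

```go
package main

import (
	"errors"
	"fmt"
)

// seq is a simplified sequence: copy litLen literals, then copy matchLen
// bytes starting matchOff bytes before the current end of the output.
type seq struct{ litLen, matchOff, matchLen int }

var errCorrupt = errors.New("corrupt sequence")

// executeWithHistory allows a match to start in hist, which is treated as
// the bytes immediately preceding out.
func executeWithHistory(out, hist, lits []byte, seqs []seq) ([]byte, error) {
	for _, s := range seqs {
		if s.litLen > len(lits) {
			return nil, errCorrupt
		}
		out = append(out, lits[:s.litLen]...)
		lits = lits[s.litLen:]
		ml := s.matchLen
		if ml == 0 {
			continue
		}
		if s.matchOff <= 0 || s.matchOff > len(out)+len(hist) {
			return nil, errCorrupt
		}
		if s.matchOff > len(out) {
			// The match starts in history: copy only what history holds.
			start := len(hist) - (s.matchOff - len(out))
			n := len(hist) - start
			if n > ml {
				n = ml
			}
			out = append(out, hist[start:start+n]...)
			ml -= n
		}
		// The rest comes from the output itself; byte-wise because the
		// source and destination ranges may overlap.
		for i := 0; i < ml; i++ {
			out = append(out, out[len(out)-s.matchOff])
		}
	}
	return out, nil
}

// executeNoHistory is the specialisation: any offset past the start of
// the output is an error, so the history branch disappears entirely.
func executeNoHistory(out, lits []byte, seqs []seq) ([]byte, error) {
	for _, s := range seqs {
		if s.litLen > len(lits) {
			return nil, errCorrupt
		}
		out = append(out, lits[:s.litLen]...)
		lits = lits[s.litLen:]
		if s.matchLen == 0 {
			continue
		}
		if s.matchOff <= 0 || s.matchOff > len(out) {
			return nil, errCorrupt
		}
		for i := 0; i < s.matchLen; i++ {
			out = append(out, out[len(out)-s.matchOff])
		}
	}
	return out, nil
}

// execute picks the specialisation once per block instead of per sequence.
func execute(out, hist, lits []byte, seqs []seq) ([]byte, error) {
	if len(hist) == 0 {
		return executeNoHistory(out, lits, seqs)
	}
	return executeWithHistory(out, hist, lits, seqs)
}

func main() {
	out, err := execute(nil, []byte("abcdef"), []byte("XY"), []seq{{2, 4, 6}})
	fmt.Printf("%q %v\n", out, err)
}
```

Choosing the variant once per block keeps the per-sequence loop of the history-free path as tight as it was before history support was added, which is the motivation for the proposed specialisations.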