-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assembly implementation of 4X1 & 4X2 Huffman #2722
Conversation
Great investigation @terrelln ! |
In the latest work:
I updated the perf numbers in the PR, and here is the delta from the first version: GCC Delta:
I still have a bunch of cleanup to do. |
574b552
to
160398d
Compare
11b05fe
to
2304a5d
Compare
I've extracted out the Makefile updates into PR #2783. They're still present in this PR, but will disappear once the other PR is merged. |
a7ae571
to
aef8fe4
Compare
@@ -43,6 +43,8 @@ | |||
#define ZSTD_MULTITHREAD | |||
#endif | |||
#define ZSTD_TRACE 0 | |||
/* TODO: Can't amalgamate ASM function */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not "critical", but still an important follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure what to do about that. We may be able to get the amalgamation process to re-write the .S
file as inline assembly.
I'm going to be opening an issue for follow ups, and I'll make sure to mention this.
I could check the performance claims. I believe my only concern about this PR is the additional build complexity. While it's correctly dealt with for the scope of this PR, I believe it introduces some questions :
|
e7acd71
to
0ee7703
Compare
Yeah, assembly should now be correctly handled everywhere in the build system (where assembly is supported e.g. not Visual Studio). The Makefile only currently searches for
I've opened a follow up issue #2789 that tracks this. Basically we should support assembly in Visual Studios eventually, and enable assembly for the amalgamated build. |
Note : I note that, for some reason, AppveyorCI tests seems to have become significantly longer to run after this PR (22-23mn => 31-33mn) |
Thats super odd, I wouldn't expect that. Maybe we can put up a PR that backs out this commit, and see if appveyor gets faster? To rule out coincidence. |
Good idea ! |
@terrelln FYI, this asm code can be used with yasm on VS. First of all, yasm needs some fixes (I made these PRs: yasm#271, yasm#272). Then, yasm has to be built with All this can be done with my fork of VSYASM, it has all the changes required and uses my builds that have cl.exe enabled. In my zstd fork I added VS2022 solution, then, two more PRs required: |
Assembly in
huf_decompress_x86_bmi2.S
.proba32.dat
is 1MB of random symbols in[0, 32)
. Zstd compresses as all literals and uses 4X2 to decompress.proba128.dat
is 1MB of random symbols in[0, 128)
. Zstd compresses as all literals and uses 4X1 to decompress.The assembly is wrapped by a C function that handles the initialization & ending conditions. Once the streams get close to the end, the C takes over and uses
HUF_decompress1X*()
to finish, similarly to the normal 4X1 and 4X2 loops.