Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable SSE2 compression path to work on MSVC #2653

Merged
merged 11 commits into from
May 17, 2021
Merged

Conversation

TrianglesPCT
Copy link
Contributor

@TrianglesPCT TrianglesPCT commented May 14, 2021

The compile time detection uses the __SSE2__ predefined macro, but this does not exist on MSVC--resulting it that compiler always using scalar path.

On MSVC you can detect SSE support by checking for _M_AMD64, as when targeting x64 it requires SSE2 as a baseline, this patch adds this check.

@facebook-github-bot
Copy link

Hi @TrianglesPCT!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

Copy link
Contributor Author

@TrianglesPCT TrianglesPCT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads
@facebook-github-bot
Copy link

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Copy link
Contributor Author

@TrianglesPCT TrianglesPCT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to unaligned load

@Cyan4973
Copy link
Contributor

Thanks @TrianglesPCT ,

did you attempt to measure performance differences provided by your PR ?

lib/compress/zstd_lazy.c Outdated Show resolved Hide resolved
lib/compress/zstd_lazy.c Outdated Show resolved Hide resolved
lib/compress/zstd_lazy.c Outdated Show resolved Hide resolved
Switch to other comment style
Remove the AVX2 part
@TrianglesPCT TrianglesPCT changed the title Enable SSE2 compression path to work on MSVC, and add AVX2 match find Enable SSE2 compression path to work on MSVC May 15, 2021
It put the changes back when I tried to make a separate pull request, i don't understand githubs interface at all.
@TrianglesPCT
Copy link
Contributor Author

Thanks @TrianglesPCT ,

did you attempt to measure performance differences provided by your PR ?

I removed the AVX2 stuff for a later PR

@Cyan4973
Copy link
Contributor

Well, even without AVX2,
this PR is presumed to help performance, because it's supposed to help SSE2 detection ?

@TrianglesPCT
Copy link
Contributor Author

All it is doing is normalizing MSVC to target the same code path as GCC/Clang when SSE2 is set as a requirement for your compile target. Not adding any new code..

The defined(__SSE2__) check doesn't work on MSVC because that define doesn't exist even if SSE2 is enabled.

If you want specific performance comparison I can do that might be few days until I have time for that.

@terrelln
Copy link
Contributor

If you want specific performance comparison I can do that might be few days until I have time for that.

That would be great! But that doesn't need to block merging this PR. You can easily measure performance with the builtin benchmark tool.

zstd -b5e12 silesia.tar

@ghost
Copy link

ghost commented May 15, 2021

Will there be a hot fix for v1.5.0?

@wolfpld
Copy link
Contributor

wolfpld commented May 15, 2021

You can easily measure performance with the builtin benchmark tool.

The difference is rather spectacular. Results for 5950X.

Before:

 5#silesia.tar       : 211957760 ->  63806593 (3.322), 111.0 MB/s , 977.5 MB/s
 6#silesia.tar       : 211957760 ->  62980544 (3.365), 107.9 MB/s ,1001.6 MB/s
 7#silesia.tar       : 211957760 ->  61482185 (3.447),  73.1 MB/s ,1055.9 MB/s
 8#silesia.tar       : 211957760 ->  60914308 (3.480),  57.2 MB/s ,1076.1 MB/s
 9#silesia.tar       : 211957760 ->  59928587 (3.537),  52.1 MB/s ,1100.2 MB/s
10#silesia.tar       : 211957760 ->  59296205 (3.575),  50.3 MB/s ,1103.6 MB/s
11#silesia.tar       : 211957760 ->  59152928 (3.583),  47.7 MB/s ,1106.8 MB/s
12#silesia.tar       : 211957760 ->  58640204 (3.615),  30.6 MB/s ,1123.8 MB/s

After:

 5#silesia.tar       : 211957760 ->  63806593 (3.322), 160.8 MB/s , 980.0 MB/s
 6#silesia.tar       : 211957760 ->  62980544 (3.365), 153.3 MB/s ,1004.7 MB/s
 7#silesia.tar       : 211957760 ->  61482185 (3.447), 111.9 MB/s ,1056.3 MB/s
 8#silesia.tar       : 211957760 ->  60914308 (3.480),  90.9 MB/s ,1079.5 MB/s
 9#silesia.tar       : 211957760 ->  59928587 (3.537),  74.7 MB/s ,1101.9 MB/s
10#silesia.tar       : 211957760 ->  59296205 (3.575),  71.3 MB/s ,1107.0 MB/s
11#silesia.tar       : 211957760 ->  59152928 (3.583),  65.0 MB/s ,1113.2 MB/s
12#silesia.tar       : 211957760 ->  58640204 (3.615),  51.3 MB/s ,1129.6 MB/s

Copy link
Contributor

@senhuang42 senhuang42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great fix, thanks for the contribution!

@Cyan4973 Cyan4973 merged commit 02ece5d into facebook:dev May 17, 2021
wolfpld added a commit to wolfpld/tracy that referenced this pull request May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants