Add support for clang-cl on Windows #633

Merged
merged 5 commits into from
Aug 2, 2024


anthony-linaro
Contributor

This PR adds support for clang-cl (clang, but pretending to be MSVC) to sse2neon on Windows ARM64 platforms. Done as part of some Blender work, as using clang-cl gives a ~20-40% speedup compared to MSVC.
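
For readers unfamiliar with what "pretending to be MSVC" means at the preprocessor level, here is a minimal sketch. It relies on the fact that clang in MSVC-compatible mode defines both `_MSC_VER` and `__clang__`, so a bare `_MSC_VER` check cannot tell it apart from the real MSVC compiler. The helper names `compiler_family` and `is_real_msvc` are hypothetical and are not taken from sse2neon or from this PR's diff.

```c
#include <string.h>

/* Hypothetical helper (not part of sse2neon): reports which compiler family
 * built this translation unit. The order of the checks matters, because
 * clang in MSVC-compatible mode defines both _MSC_VER and __clang__. */
static const char *compiler_family(void)
{
#if defined(_MSC_VER) && defined(__clang__)
    return "clang-msvc"; /* e.g. clang-cl */
#elif defined(_MSC_VER)
    return "msvc";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}

/* Hypothetical helper: true only for the real MSVC compiler, which is the
 * distinction a guard such as `defined(_MSC_VER) && !defined(__clang__)`
 * is meant to express. */
static int is_real_msvc(void)
{
    return strcmp(compiler_family(), "msvc") == 0;
}
```

Code paths written only for MSVC (for example, ones that avoid GNU-style inline assembly) would presumably be guarded by the second form, so that clang-cl can take the clang/GCC path instead.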

This is a WIP, as several tests are still failing in Release (Debug seems to be fine). If anyone has any ideas, please let me know; otherwise I'll keep looking at them through a debugger.

Compiled with the command line (via a VS2022 Native ARM64 Tools CMD window):

msbuild sse2neon.vcxproj /p:Configuration=Release /p:CLToolExe=clang-cl.exe /p:CLToolPath=C:\Program Files\LLVM\bin\

Failing+skipped tests:

Test mm_set_flush_zero_mode         skipped
Test mm_set_rounding_mode           failed
Test mm_setcsr                      failed
Test mm_storeu_si16                 failed
Test mm_storeu_si64                 failed
Test mm_cvtpd_epi32                 failed
Test mm_cvtpd_pi32                  failed
Test mm_cvttpd_epi32                failed
Test mm_cvttpd_pi32                 failed
Test mm_cvttsd_si64x                skipped
Test mm_storeu_si32                 failed
Test mm_alignr_epi8                 skipped
Test mm_alignr_pi8                  skipped
Test mm_set_denormals_zero_mode     failed
Test rdtsc                          failed
SSE2NEONTest Complete!
Passed:  508
Failed:  11
Ignored: 4
Coverage rate: 97.14%

I can provide access to a machine via VPN (wireguard) for the purposes of this PR, if a maintainer without a relevant machine wishes to have a go.

@anthony-linaro
Contributor Author

Taking a deeper look, it seems that anything that touches FPCR, or uses the msr and mrs asm instructions, is failing (plus a few other, non-asm-related tests).
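
As background on why tests such as mm_set_rounding_mode and mm_set_flush_zero_mode depend on FPCR, the sketch below models the bit manipulation in portable C. It assumes the AArch64 FPCR layout (RMode in bits [23:22], FZ in bit 24) and the x86 MXCSR rounding-control values. It deliberately omits the actual register read and write, which on hardware would go through `mrs`/`msr` or a compiler intrinsic. The macro and function names are hypothetical and are not sse2neon's own.

```c
#include <stdint.h>

/* x86 MXCSR rounding-control values, as passed to _MM_SET_ROUNDING_MODE. */
#define MXCSR_ROUND_NEAREST     0x0000u
#define MXCSR_ROUND_DOWN        0x2000u
#define MXCSR_ROUND_UP          0x4000u
#define MXCSR_ROUND_TOWARD_ZERO 0x6000u

/* AArch64 FPCR fields: RMode in bits [23:22], FZ (flush-to-zero) in bit 24. */
#define FPCR_RMODE_SHIFT 22
#define FPCR_RMODE_MASK  (UINT64_C(3) << FPCR_RMODE_SHIFT)
#define FPCR_FZ_BIT      (UINT64_C(1) << 24)

/* Hypothetical helper: returns `fpcr` with its RMode field replaced by the
 * encoding that matches the given MXCSR rounding mode. Unrecognized values
 * fall back to round-to-nearest. All other FPCR bits are preserved. */
static uint64_t fpcr_with_rounding(uint64_t fpcr, unsigned mxcsr_round)
{
    uint64_t rmode;
    switch (mxcsr_round) {
    case MXCSR_ROUND_DOWN:        rmode = 2; break; /* RM: toward -infinity */
    case MXCSR_ROUND_UP:          rmode = 1; break; /* RP: toward +infinity */
    case MXCSR_ROUND_TOWARD_ZERO: rmode = 3; break; /* RZ: toward zero */
    default:                      rmode = 0; break; /* RN: to nearest */
    }
    return (fpcr & ~FPCR_RMODE_MASK) | (rmode << FPCR_RMODE_SHIFT);
}

/* Hypothetical helper: returns `fpcr` with the FZ bit set or cleared. */
static uint64_t fpcr_with_flush_zero(uint64_t fpcr, int enable)
{
    return enable ? (fpcr | FPCR_FZ_BIT) : (fpcr & ~FPCR_FZ_BIT);
}
```

If the register read or write itself is miscompiled or reordered under optimization, every intrinsic built on this read-modify-write pattern would misbehave together, which would be consistent with the cluster of FPCR-related failures seen only in Release.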

@anthony-linaro
Contributor Author

Now only mm_cvttpd_epi32 and mm_cvttpd_pi32 fail, which according to #635, will be fixed in #638

I guess this is ready for review then

@anthony-linaro anthony-linaro changed the title WIP: Add support for clang-cl on Windows Add support for clang-cl on Windows Jul 2, 2024
@jserv
Member

jserv commented Jul 18, 2024

> Now only mm_cvttpd_epi32 and mm_cvttpd_pi32 fail, which according to #635, will be fixed in #638

Pull request #638 has been merged. Could you rebase and confirm?

@anthony-linaro
Contributor Author

anthony-linaro commented Jul 18, 2024

@jserv I am now seeing 2 failures:

Test mm_cvttpd_epi32                failed
Test rdtsc                          failed

@anthony-linaro
Contributor Author

@jserv Are you able to review? Are the test failures still down to issues with the test harness itself, rather than sse2neon?

I'd like to be able to move Blender to using clang-cl for Windows ARM64 platforms, as it gives a nearly 40% perf improvement in places, but I would need this to be merged to go ahead.

@jserv jserv merged commit 29716df into DLTcollab:master Aug 2, 2024
16 checks passed
@jserv
Member

jserv commented Aug 2, 2024

Thanks, @anthony-linaro, for contributing!
