
Refactor avx512f #1597

Merged
merged 9 commits into from
Jun 30, 2024
TDecking
Contributor

TDecking commented Jun 25, 2024

  • Fused multiply-add functions have been reworked and can now be used by Miri.
  • Square root functions have been reworked and can now be used by Miri.
  • The definitions of some functions with explicit rounding have been simplified.
  • Some functions now correctly use _MM_FROUND_CUR_DIRECTION.
  • Some integer functions have been reworked and can now be used by Miri.
  • Some missing intrinsics were added.
  • Masked integer comparisons now properly use the mask registers.
  • Some documentation issues were fixed.
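For readers unfamiliar with the masked forms, here is a portable reference model of the lane-wise semantics that the reworked FMA functions and masked integer comparisons are meant to preserve. It is a sketch, not the stdarch implementation: it assumes 16 f32/i32 lanes (one 512-bit vector), and `mask_fmadd_ps_model` and `mask_cmpeq_epi32_mask_model` are hypothetical stand-ins for `_mm512_mask_fmadd_ps` and `_mm512_mask_cmpeq_epi32_mask`.

```rust
/// Hypothetical reference model of `_mm512_mask_fmadd_ps(a, k, b, c)`:
/// lanes whose mask bit is set get a*b + c with a single rounding,
/// the other lanes keep the value from `a`.
fn mask_fmadd_ps_model(a: [f32; 16], k: u16, b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    let mut out = a;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // f32::mul_add computes a*b + c with one rounding step (fused).
            out[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    out
}

/// Hypothetical reference model of `_mm512_mask_cmpeq_epi32_mask(k1, a, b)`:
/// bit i of the result is set only if bit i of k1 is set and a[i] == b[i].
fn mask_cmpeq_epi32_mask_model(k1: u16, a: [i32; 16], b: [i32; 16]) -> u16 {
    let mut result: u16 = 0;
    for i in 0..16 {
        if a[i] == b[i] {
            result |= 1u16 << i;
        }
    }
    // The input mask clears the bits of lanes that were not selected.
    result & k1
}

fn main() {
    let a = [2.0f32; 16];
    let b = [3.0f32; 16];
    let c = [1.0f32; 16];
    // Only the lower 8 lanes are selected.
    let r = mask_fmadd_ps_model(a, 0x00FF, b, c);
    assert_eq!(r[0], 7.0);
    assert_eq!(r[15], 2.0);

    let x: [i32; 16] = core::array::from_fn(|i| i as i32);
    let y: [i32; 16] = core::array::from_fn(|i| if i % 2 == 0 { i as i32 } else { -1 });
    // Even lanes compare equal; the mask keeps only lanes 0..8.
    assert_eq!(mask_cmpeq_epi32_mask_model(0x00FF, x, y), 0x0055);
}
```

Writing the comparison result directly into a mask, instead of producing a full vector and converting it afterwards, is what "properly use the mask registers" refers to.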

@rustbot
Collaborator

rustbot commented Jun 25, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

TDecking force-pushed the avx512f branch 9 times, most recently from 82fb013 to 51da0ac, June 25, 2024 12:45
@TDecking
Contributor Author

@sayantn Do you mind if I open this? There is some overlap between this PR and yours, but some things in this PR are not present in your work.

@sayantn
Contributor

sayantn commented Jun 25, 2024

So the only overlap is the fma intrinsics and masked loads? I have no problem with you implementing the fma (honestly, I didn't know about the simd_fma intrinsic). But I think right now you should not do the masked loads/stores: simd_masked_load only uses the element's alignment, so it will never generate the aligned load instructions. See rust-lang/rust#126919. Also, in stdarch it is typically preferred to link with LLVM and to use the simd intrinsics with the core::simd types instead of __m128i etc. I will remove the fma enhancements from my PR; it will remain a draft for some time.
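The alignment point follows from the types involved: a load built on `simd_masked_load` only knows the alignment of one element (4 bytes for `i32`), while the aligned 512-bit load instructions need 64 bytes. The sketch below is a portable model, not stdarch code; `Aligned64`, `is_zmm_aligned` and `mask_loadu_epi32_model` are hypothetical names, the last one standing in for the semantics of `_mm512_mask_loadu_epi32`.

```rust
use std::mem::align_of;

/// Hypothetical 64-byte aligned buffer, the alignment an aligned zmm load requires.
#[repr(C, align(64))]
struct Aligned64([i32; 16]);

/// Returns true if `p` satisfies the 64-byte alignment of a 512-bit aligned load.
fn is_zmm_aligned(p: *const i32) -> bool {
    (p as usize) % 64 == 0
}

/// Hypothetical reference model of `_mm512_mask_loadu_epi32(src, k, mem)`:
/// lanes whose mask bit is set are read from `mem`, the others come from `src`.
/// Masked-off lanes are never read, so they may lie past the end of `mem`.
fn mask_loadu_epi32_model(src: [i32; 16], k: u16, mem: &[i32]) -> [i32; 16] {
    let mut out = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            out[i] = mem[i];
        }
    }
    out
}

fn main() {
    // An i32 is only guaranteed 4-byte alignment, so a plain `&[i32]`
    // says nothing about the 64-byte alignment an aligned load needs.
    assert_eq!(align_of::<i32>(), 4);

    let buf = Aligned64(core::array::from_fn(|i| i as i32 * 10));
    assert!(is_zmm_aligned(buf.0.as_ptr()));

    // Load only the first three lanes; the rest keep the fallback value -1.
    let r = mask_loadu_epi32_model([-1; 16], 0b111, &buf.0[..3]);
    assert_eq!(&r[..4], &[0, 10, 20, -1]);
}
```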

@TDecking
Contributor Author

@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.

@sayantn
Contributor

sayantn commented Jun 25, 2024

Yes, I will modify my PR in a while. I will also implement the missing reduce-max etc. intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions, not cvtt).
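The difference between the two instruction families is the rounding step: `cvtt*` truncates toward zero, while `cvt*` rounds according to the current rounding mode (round-to-nearest, ties to even, by default). A portable sketch of the scalar f32-to-i32 case follows; `cvtt_model` and `cvt_model` are hypothetical names, the default rounding mode is assumed, and NaN or out-of-range inputs return `i32::MIN`, modeling the x86 "integer indefinite" value.

```rust
/// Hypothetical model of a truncating conversion (cvtt*): round toward zero.
fn cvtt_model(x: f32) -> i32 {
    to_i32_or_indefinite(x.trunc())
}

/// Hypothetical model of a rounding conversion (cvt*) under the default
/// MXCSR mode: round to nearest, ties to even.
fn cvt_model(x: f32) -> i32 {
    to_i32_or_indefinite(x.round_ties_even())
}

/// Converts an already-integral value, returning the x86 "integer indefinite"
/// value (i32::MIN) for NaN or out-of-range input.
fn to_i32_or_indefinite(r: f32) -> i32 {
    // NaN fails both comparisons and falls through to the indefinite value.
    if r >= -2147483648.0 && r < 2147483648.0 {
        r as i32
    } else {
        i32::MIN
    }
}

fn main() {
    assert_eq!((cvtt_model(2.7), cvt_model(2.7)), (2, 3));
    assert_eq!((cvtt_model(-2.7), cvt_model(-2.7)), (-2, -3));
    // Ties go to the even neighbour when rounding.
    assert_eq!((cvt_model(2.5), cvt_model(3.5)), (2, 4));
    assert_eq!(cvtt_model(f32::NAN), i32::MIN);
}
```

An intrinsic that emits the rounding form where truncation is expected gives different results for inputs like 2.7, which is why the distinction matters.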

@sayantn
Contributor

sayantn commented Jun 26, 2024

Can you also please do the floating-point abs using simd_fabs?
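Floating-point absolute value only has to clear the IEEE 754 sign bit of each lane, which is what a lane-wise fabs does; a bitwise AND with `0x7FFF_FFFF` per f32 lane is the classic way to express it. The sketch below is a portable illustration of that equivalence, not the stdarch code; `abs_ps_model` is a hypothetical name for a 16-lane f32 absolute value.

```rust
/// Hypothetical model of a 16-lane f32 absolute value: clear the sign bit of each lane.
fn abs_ps_model(a: [f32; 16]) -> [f32; 16] {
    a.map(|x| f32::from_bits(x.to_bits() & 0x7FFF_FFFF))
}

fn main() {
    let mut v = [-3.5f32; 16];
    v[1] = -0.0;
    v[2] = f32::NEG_INFINITY;
    let r = abs_ps_model(v);
    assert_eq!(r[0], 3.5);
    // -0.0 becomes +0.0: the sign bit itself is cleared.
    assert_eq!(r[1].to_bits(), 0.0f32.to_bits());
    assert_eq!(r[2], f32::INFINITY);
    // Matches f32::abs lane by lane, bit for bit.
    assert!(v.iter().zip(r.iter()).all(|(x, y)| x.abs().to_bits() == y.to_bits()));
}
```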

@TDecking
Contributor Author

@sayantn done.

@sayantn
Contributor

sayantn commented Jun 26, 2024

Thanks.

TDecking force-pushed the avx512f branch 2 times, most recently from dd20b4f to 9aae346, June 29, 2024 22:05
TDecking force-pushed the avx512f branch 3 times, most recently from 0007890 to 507cef8, June 29, 2024 23:41
@sayantn
Contributor

sayantn commented Jun 30, 2024

I have already done the remaining gather/scatter intrinsics in avx512f. Can you complete avx512bw (the reduce intrinsics and some mask operations)? Then I will start on the remaining IFMA and BF16 intrinsics, and then start implementing the new VEX variants.
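The reduce intrinsics fold all lanes of a vector into one scalar; in the masked forms, lanes whose mask bit is clear are replaced by the operation's identity element before folding. A portable sketch for a max reduction over 32 `i16` lanes (one 512-bit vector, with a 32-bit mask) follows; `mask_reduce_max_i16x32` is a hypothetical name, and this only models the semantics rather than any stdarch implementation.

```rust
/// Hypothetical model of a masked max reduction over 32 i16 lanes.
/// Inactive lanes are replaced by i16::MIN, the identity element for max.
fn mask_reduce_max_i16x32(k: u32, a: [i16; 32]) -> i16 {
    a.iter()
        .enumerate()
        .map(|(i, &x)| if (k >> i) & 1 == 1 { x } else { i16::MIN })
        .fold(i16::MIN, i16::max)
}

fn main() {
    let a: [i16; 32] = core::array::from_fn(|i| i as i16 * 100);
    // All lanes active: the largest lane wins.
    assert_eq!(mask_reduce_max_i16x32(u32::MAX, a), 3100);
    // Only lanes 0..4 active.
    assert_eq!(mask_reduce_max_i16x32(0xF, a), 300);
    // No lanes active: the identity element is returned.
    assert_eq!(mask_reduce_max_i16x32(0, a), i16::MIN);
}
```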

sayantn mentioned this pull request Jun 30, 2024
Amanieu merged commit 5ccd76c into rust-lang:master Jun 30, 2024
30 checks passed
TDecking deleted the avx512f branch July 1, 2024 01:14

4 participants