Refactor avx512f #1597

TDecking · 2024-06-25T10:10:11Z

Fused multiply-add functions have been reworked and can now be used by miri.
Square root functions have been reworked and can now be used by miri.
The definitions of some functions with explicit rounding have been simplified.
Some functions now correctly use _MM_FROUND_CUR_DIRECTION.
Some integer functions have been reworked and can now be used by miri.
Some missing intrinsics were added.
Masked integer comparisons now properly use the mask registers.
Some documentation issues were fixed.
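
As a rough illustration of what a fused multiply-add computes in each lane, the scalar sketch below uses `f32::mul_add`, which rounds once, next to a separate multiply and add, which rounds twice. The helper names `fmadd_lanes` and `unfused_lanes` are hypothetical; this is a model of the arithmetic only, not the stdarch implementation, which works on vector types.

```rust
/// Hypothetical scalar model of a per-lane fused multiply-add:
/// a[i] * b[i] + c[i], computed with a single rounding step.
pub fn fmadd_lanes<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut out = [0.0f32; N];
    for i in 0..N {
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}

/// The same expression with two roundings: the product is rounded to f32
/// before the addition happens.
pub fn unfused_lanes<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut out = [0.0f32; N];
    for i in 0..N {
        out[i] = a[i] * b[i] + c[i];
    }
    out
}
```

The two versions can differ in the last bits: with `a = b = 1.0 + f32::EPSILON` and `c = -(1.0 + 2.0 * f32::EPSILON)`, the fused form keeps the tiny `EPSILON * EPSILON` term that the intermediate rounding of the unfused form discards.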

rustbot · 2024-06-25T10:10:16Z

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

TDecking · 2024-06-25T12:52:37Z

@sayantn Do you mind if I open this? There is a bit of an overlap between this PR and yours, but some things in this PR are not present in your work.

sayantn · 2024-06-25T15:19:41Z

So the only overlap is fma intrinsics and masked loads? I have no problem with you implementing the fma (honestly, I didn't know about the simd_fma intrinsic). But I think right now you should not do the masked loads/stores. simd_masked_load assumes only the element's alignment, so it will never generate the aligned load instructions. See rust-lang/rust#126919. Also, typically in stdarch it is preferred to link with LLVM and use the simd intrinsics with the core::simd types instead of __m128i etc. I will remove the fma enhancements from my PR; it will remain a draft for some time.
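
To make the alignment point concrete, here is a minimal sketch in plain Rust showing that a pointer can satisfy the element alignment of `f32` (4 bytes) without satisfying the 64-byte alignment of a 512-bit vector. The type `Aligned64` and the helper functions are hypothetical names for illustration; nothing here calls `simd_masked_load` itself.

```rust
use std::mem::align_of;

/// 64-byte aligned backing storage, matching the width of a 512-bit vector.
#[repr(C, align(64))]
pub struct Aligned64(pub [f32; 32]);

/// Returns true if the address of `p` is a multiple of `align` bytes.
pub fn is_aligned_to(p: *const f32, align: usize) -> bool {
    (p as usize) % align == 0
}

/// Returns true if `p` meets the alignment of a single `f32` element.
pub fn is_element_aligned(p: *const f32) -> bool {
    is_aligned_to(p, align_of::<f32>())
}

/// Pointer to element `idx` of the buffer (panics if `idx` is out of range).
pub fn element_ptr(buf: &Aligned64, idx: usize) -> *const f32 {
    buf.0[idx..].as_ptr()
}
```

A pointer to element 1 of such a buffer is element-aligned but not 64-byte aligned, which is why a load that is only promised element alignment cannot safely be lowered to an aligned vector load.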

TDecking · 2024-06-25T16:14:25Z

@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.

sayantn · 2024-06-25T16:17:04Z

Yes, I will modify my PR in a while. I will also implement the missing reduce-max etc. intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions, not cvtt).
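
The difference between the two conversion families can be sketched with scalars: the `cvtt` forms truncate toward zero, while the `cvt` forms follow the current rounding mode, which by default is round-to-nearest with ties to even. The function names below are hypothetical, and the model is only meant for in-range inputs, since Rust's `as` cast saturates on overflow where the hardware does not.

```rust
/// Hypothetical scalar model of a truncating conversion (the `cvtt` family):
/// the fractional part is dropped, rounding toward zero.
pub fn cvtt_model(x: f32) -> i32 {
    x as i32
}

/// Hypothetical scalar model of a `cvt` conversion under the default
/// rounding mode: round to nearest, ties to even.
pub fn cvt_model(x: f32) -> i32 {
    x.round_ties_even() as i32
}
```

For an input such as `2.7` the two models disagree (truncation gives 2, rounding gives 3), so emitting the wrong instruction changes results rather than just performance.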

sayantn · 2024-06-26T13:34:39Z

Can you also please do the floating-point abs using simd_fabs?
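
For context, a floating-point absolute value amounts to clearing the sign bit of each lane. The scalar sketch below shows that bit-level view for `f32`; the name `abs_via_mask` is hypothetical, and this is not how `simd_fabs` is implemented, only the result it is expected to agree with.

```rust
/// Hypothetical scalar model: absolute value by clearing the sign bit
/// (bit 31) of the IEEE 754 single-precision representation.
pub fn abs_via_mask(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7fff_ffff)
}
```

This agrees bit for bit with `f32::abs`, including for `-0.0` and infinities.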

TDecking · 2024-06-26T14:09:30Z

@sayantn done.

sayantn · 2024-06-26T14:30:03Z

Thanks.

sayantn · 2024-06-30T03:09:24Z

I have already done the remaining gather-scatter in avx512f. Can you complete avx512bw - the reduce intrinsics and some mask operations? Then I will start on the remaining IFMA and BF16, then start implementing the new VEX variants.

rustbot assigned Amanieu Jun 25, 2024

TDecking force-pushed the avx512f branch 9 times, most recently from 82fb013 to 51da0ac Compare June 25, 2024 12:45

sayantn mentioned this pull request Jun 25, 2024

Various Fixes and enhancements in x86 intrinsics #1594

Merged

TDecking force-pushed the avx512f branch 2 times, most recently from 1e0a0e0 to c8fc6f2 Compare June 25, 2024 16:05

TDecking force-pushed the avx512f branch from fb45b5e to 3ea862f Compare June 25, 2024 21:24

TDecking force-pushed the avx512f branch from 10f65e4 to 141d591 Compare June 26, 2024 14:21

TDecking force-pushed the avx512f branch from 141d591 to 927eb77 Compare June 26, 2024 14:49

TDecking added 2 commits June 29, 2024 20:58

Refactor avx512f: fma

a75b516

Refactor avx512f: rounding fma

b4a3f6f

TDecking force-pushed the avx512f branch 2 times, most recently from dd20b4f to 9aae346 Compare June 29, 2024 22:05

TDecking added 2 commits June 30, 2024 00:16

Refactor avx512f: sqrt + rounding fix

2609c06

Refactor avx512f: integers

11a7765

TDecking added 3 commits June 30, 2024 00:16

Refactor avx512f: integer comparison

5940305

Refactor avx512f: zeroing primitives

3d3e87a

Refactor avx512f: floating point abs

caf4361

TDecking force-pushed the avx512f branch 3 times, most recently from 0007890 to 507cef8 Compare June 29, 2024 23:41

Refactor avx512f: element extraction

b5461f8

TDecking force-pushed the avx512f branch from 507cef8 to 19c0bad Compare June 29, 2024 23:48

Refactor avx512f: mask operations

8627a92

TDecking force-pushed the avx512f branch from 19c0bad to 8627a92 Compare June 30, 2024 00:01

sayantn mentioned this pull request Jun 30, 2024

Implement missing AVX512 intrinsics #1600

Merged

8 tasks

Amanieu merged commit 5ccd76c into rust-lang:master Jun 30, 2024
30 checks passed

TDecking deleted the avx512f branch July 1, 2024 01:14

CatsAreFluffy mentioned this pull request Nov 15, 2024

_mm*_mask_cmp_*_mask::<7> comparisons don't respect the input mask rust-lang/rust#133067

Closed
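
The behaviour that issue title refers to can be modelled with scalars: a masked comparison should set result bit `i` only when bit `i` of the input mask is set and the predicate holds for lane `i`. With predicate 7 (always true), the expected result is therefore the input mask itself, not all ones. The sketch below is a hypothetical model for sixteen `i32` lanes, using the standard predicate encoding 0 to 7 (EQ, LT, LE, FALSE, NE, NLT, NLE, TRUE); it is not the stdarch code.

```rust
/// Evaluates the integer comparison predicate selected by the low three
/// bits of `imm` (0 = EQ, 1 = LT, 2 = LE, 3 = FALSE, 4 = NE, 5 = NLT,
/// 6 = NLE, 7 = TRUE).
fn cmp_predicate(imm: i32, a: i32, b: i32) -> bool {
    match imm & 7 {
        0 => a == b,
        1 => a < b,
        2 => a <= b,
        3 => false,
        4 => a != b,
        5 => a >= b,
        6 => a > b,
        _ => true,
    }
}

/// Hypothetical scalar model of a masked 16-lane signed comparison:
/// bit `i` of the result is set only if bit `i` of `k1` is set and the
/// predicate holds for lane `i`.
pub fn mask_cmp_epi32_mask_model<const IMM: i32>(k1: u16, a: [i32; 16], b: [i32; 16]) -> u16 {
    let mut r = 0u16;
    for i in 0..16 {
        if cmp_predicate(IMM, a[i], b[i]) {
            r |= 1 << i;
        }
    }
    // The input mask is applied regardless of which predicate was chosen.
    r & k1
}
```

Applying `k1` after the loop, rather than special-casing the constant predicates, keeps the TRUE and FALSE cases consistent with the others.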
