Implement x86 AVX intrinsics #3192

eduardosm · 2023-11-26T10:28:01Z

~~Blocked on #3214~~

RalfJung · 2023-12-03T12:14:35Z

Thanks for the PR! Unfortunately I am exceptionally busy right now and honestly I am not sure if I'll be able to review a PR of this size before the Christmas break. Sorry for that.

workingjubilee · 2023-12-07T05:44:06Z

Perhaps if @eduardosm splits out the "mere code movement" parts with rounding and sqrt into a separate PR so that the maskload, etc. stuff can be reviewed separately?

RalfJung · 2023-12-07T05:59:56Z

Splitting up is always helpful, yeah. Reviewing has a strictly superlinear time complexity, I suspect somewhere between O(n * log n) and O(n^2), so two smaller PRs are less work to review than a single PR with twice the size.

workingjubilee · 2023-12-07T07:47:24Z

Yeah. I was also thinking "hm, I could help examine the tricky parts and double-check stuff, but Ralf knows how stuff is/should be organized better than me (obviously), so I have no comment on 'where should `enum FloatUnaryOp` live, and/or does this abstraction need to be changed now that it's seeing more use?'"

eduardosm · 2023-12-07T11:54:27Z

Done, created #3214 with the "just moving things" part.

Move some x86 intrinsics code to helper functions in `shims::x86`, to make them reusable for intrinsics of other x86 features. Split off from #3192.

bors · 2023-12-08T20:21:16Z

☔ The latest upstream changes (presumably #3214) made this pull request unmergeable. Please resolve the merge conflicts.

RalfJung

Thanks! Mostly looks good, I have some nits and questions though.

I didn't look at the tests (yet); I assume they cover all the new intrinsics and in particular the corner cases.

src/shims/x86/avx.rs

tests/pass/intrinsics-x86-avx.rs

src/shims/x86/avx.rs

RalfJung · 2023-12-22T15:11:06Z

@rustbot author

eduardosm · 2024-01-12T17:14:53Z

@rustbot ready

RalfJung

All right, finally went through all the ops and comments. Sorry for the long wait!

src/shims/x86/avx.rs

RalfJung · 2024-01-25T07:02:21Z

src/shims/x86/mod.rs

+///
+/// Each 128-bit chunk is treated independently (i.e., the value for
+/// the i-th 128-bit chunk of `dest` is calculated with the i-th
+/// 128-bit chunks of `left` and `right`).


Doesn't that mean the output should be shorter than the input, since each 128-bit chunk produces one result?

The output has the same size as each input, since each 128-bit chunk of the input produces a 128-bit chunk for the output.

I guess I don't understand what is meant by performing an operation "horizontally".

Given two vectors, I can see how one can do an operation "pointwise" (that's what it would be called in mathematics, anyway): the i-th output element is computed as `op(left[i], right[i])`. Naturally, the output is then the same length as either input. Is that what you mean by "horizontally"? If not, what do you mean? I thought it might be a sort of fold, like summing all elements, but then you get one output element per input chunk, so that doesn't match the comment. I was trying to reverse engineer the code but failed.^^

Okay, I stared at the code some more, and I think it's combining neighboring elements?

concatenate left ++ right to form input

then compute output[i] as op(input[2*i], input[2*i + 1])

Is that correct?

That's correct
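To make the confirmed reading concrete, here is a minimal sketch of a "horizontal" operation on plain slices. It is not Miri's actual helper: the names `horizontal_op_chunk`, `horizontal_op`, and `elems_per_chunk` are hypothetical, and the per-chunk wrapper assumes that "each 128-bit chunk is treated independently" means the pairing below is applied separately to the i-th chunk of `left` and `right`.

```rust
/// Hypothetical model of one chunk: concatenate `left ++ right`, then combine
/// neighboring elements, so `output[i] = op(input[2*i], input[2*i + 1])`.
fn horizontal_op_chunk<T: Copy, F: Fn(T, T) -> T>(left: &[T], right: &[T], op: &F) -> Vec<T> {
    assert_eq!(left.len(), right.len());
    assert!(left.len() % 2 == 0, "a chunk must hold an even number of elements");
    // `input` has twice the length of one operand...
    let input: Vec<T> = left.iter().chain(right.iter()).copied().collect();
    // ...and each output element consumes two inputs, so the output has the
    // same length as `left` (and `right`).
    let output: Vec<T> = (0..left.len())
        .map(|i| op(input[2 * i], input[2 * i + 1]))
        .collect();
    output
}

/// Hypothetical wrapper (assumption): apply the pairing independently to each
/// chunk of `elems_per_chunk` elements, e.g. 4 for `f32` lanes in 128 bits.
fn horizontal_op<T: Copy, F: Fn(T, T) -> T>(
    left: &[T],
    right: &[T],
    elems_per_chunk: usize,
    op: F,
) -> Vec<T> {
    assert_eq!(left.len(), right.len());
    assert!(left.len() % elems_per_chunk == 0);
    left.chunks(elems_per_chunk)
        .zip(right.chunks(elems_per_chunk))
        .flat_map(|(l, r)| horizontal_op_chunk(l, r, &op))
        .collect()
}
```

With four elements per chunk and addition as `op`, `left = [1, 2, 3, 4]` and `right = [10, 20, 30, 40]` give `[3, 7, 30, 70]`: the first half of each output chunk comes from `left`, the second half from `right`, which is why the output is as long as each input.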

src/shims/x86/mod.rs

src/shims/x86/avx.rs

src/shims/x86/mod.rs

RalfJung · 2024-02-14T18:06:29Z

(FYI, I will assume the ball is in your court until you do `@rustbot ready`.)

eduardosm · 2024-02-14T19:42:11Z

@rustbot ready

RalfJung · 2024-02-14T21:34:46Z

src/shims/x86/avx.rs

+                    let control = this.project_index(&control, i)?;
+
+                    // Each 128-bit lane is shuffled independently. Since each lane contains
+                    // two 64-bit elements, only the second bit from `right` is used (yes, the


`right` does not exist any more; this should probably be `control` now?

RalfJung · 2024-02-14T21:35:38Z

src/shims/x86/avx.rs

+                    // second instead of the first, ask Intel). To read the value from the current
+                    // lane, add the destination index truncated to a multiple of 2.
+                    let src_i = ((this.read_scalar(&control)?.to_u64()? >> 1) & 1)
+                        .checked_add(i & !1)


Similar to above, `i & !1` is the `chunk_base`, isn't it?

It is. Since each element is 64 bits and each chunk is 128 bits, a chunk has two elements, so we chop off the lowest bit to get the chunk base.

Okay, then let-bind it like above please. :)
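As an illustration of the requested let-binding, the index computation can be modeled on plain slices as follows. This is only a sketch under stated assumptions: `shuffle_pd_in_chunks` is a hypothetical name, the elements are represented as raw `u64` values, and Miri's real code works on interpreter places via `project_index` and `read_scalar` and uses `checked_add` rather than plain addition.

```rust
/// Hypothetical model of the per-chunk shuffle of 64-bit elements.
/// Each 128-bit chunk holds two elements; bit 1 (not bit 0) of the matching
/// control element selects which element of the *same* chunk is copied.
fn shuffle_pd_in_chunks(data: &[u64], control: &[u64]) -> Vec<u64> {
    assert_eq!(data.len(), control.len());
    assert!(data.len() % 2 == 0, "need whole 128-bit chunks");
    (0..data.len())
        .map(|i| {
            // Clearing the lowest bit rounds `i` down to a multiple of 2,
            // i.e. the index of the first element of the current chunk.
            let chunk_base = i & !1;
            // Only the second bit of the control element is used.
            let offset = ((control[i] >> 1) & 1) as usize;
            data[chunk_base + offset]
        })
        .collect()
}
```

For `data = [10, 11, 20, 21]` and `control = [0b10, 0b00, 0b00, 0b10]` this yields `[11, 10, 20, 21]`: the first chunk is swapped and the second is left in place, and no element ever crosses a chunk boundary.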

RalfJung · 2024-02-14T21:35:56Z

src/shims/x86/avx.rs

+                for i in 0..dest_len {
+                    let control = this.project_index(&control, i)?;
+
+                    // Each 128-bit lane is shuffled independently. Since each lane contains


Should these "lane"s be "chunk"?

Oops, I missed those

RalfJung · 2024-02-14T21:39:15Z

RalfJung

@rustbot author

src/shims/x86/mod.rs

RalfJung · 2024-02-16T20:40:36Z

src/shims/x86/avx.rs

+                    this.copy_op(
+                        &this.project_index(&data, src_i)?,
+                        &this.project_index(&dest, i)?,
+                        /*allow_transmute*/ false,
+                    )?;


The signature of `copy_op` changed, so this needs a rebase.

eduardosm · 2024-02-16T20:48:24Z

@rustbot ready

RalfJung · 2024-02-16T20:52:41Z

@bors r+

bors · 2024-02-16T20:52:43Z

📌 Commit 3cd30a5 has been approved by RalfJung

It is now in the queue for this repository.

bors · 2024-02-16T20:53:52Z

⌛ Testing commit 3cd30a5 with merge 454f054...

bors · 2024-02-16T22:22:57Z

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 454f054 to master...

eduardosm mentioned this pull request Dec 7, 2023

Move some x86 intrinsics code to helper functions in shims::x86 #3214

Merged

eduardosm marked this pull request as draft December 7, 2023 11:53

bors added a commit that referenced this pull request Dec 8, 2023

Auto merge of #3214 - eduardosm:move-x86-code, r=RalfJung

a5b9f54

eduardosm marked this pull request as ready for review December 9, 2023 12:02

RalfJung reviewed Dec 21, 2023

RalfJung reviewed Dec 21, 2023

RalfJung reviewed Dec 22, 2023

rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Dec 22, 2023

NamorNiradnug mentioned this pull request Jan 6, 2024

Basic features: Vec8f and Vec4f NamorNiradnug/vector-rust-library#1

Merged

rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Jan 12, 2024

RalfJung reviewed Jan 25, 2024

RalfJung added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Jan 25, 2024

RalfJung reviewed Feb 5, 2024

rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Feb 14, 2024

RalfJung reviewed Feb 14, 2024

rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Feb 14, 2024

RalfJung reviewed Feb 16, 2024

RalfJung reviewed Feb 16, 2024

Implement x86 AVX intrinsics

3cd30a5

rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Feb 16, 2024

bors merged commit 454f054 into rust-lang:master Feb 16, 2024

eduardosm deleted the x86-avx-intrinsics branch February 18, 2024 00:20
