
Implement core::ops #10


Merged · 16 commits · Oct 2, 2020

Conversation

calebzulawski
Member

Closes #5.

Note that none of this is tested--I'm not actually sure where to start with testing. @workingjubilee any recommendations from packed-simd?

Also note that I didn't implement bitwise ops for floats. We can discuss it, but one solution mirroring f32 and f64 would be to_bits and from_bits fns that use u32 and u64 vectors.
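A minimal sketch of that approach (the array-backed F32x4 type here is a hypothetical stand-in, not the crate's actual type; only f32::to_bits/from_bits are real std APIs):

```rust
use std::ops::BitAnd;

// Hypothetical stand-in for a 4-lane f32 vector; the real crate's type
// and its to_bits/from_bits signatures may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl F32x4 {
    // View the lanes as raw u32 bit patterns, mirroring f32::to_bits.
    fn to_bits(self) -> [u32; 4] {
        self.0.map(f32::to_bits)
    }
    // Reinterpret u32 bit patterns as f32 lanes, mirroring f32::from_bits.
    fn from_bits(bits: [u32; 4]) -> Self {
        F32x4(bits.map(f32::from_bits))
    }
}

impl BitAnd for F32x4 {
    type Output = Self;
    fn bitand(self, rhs: Self) -> Self {
        let (a, b) = (self.to_bits(), rhs.to_bits());
        Self::from_bits([a[0] & b[0], a[1] & b[1], a[2] & b[2], a[3] & b[3]])
    }
}

fn main() {
    // Clearing the sign bit with AND implements abs, a classic use case.
    let x = F32x4([1.0, -1.0, 2.0, -0.5]);
    let mask = F32x4::from_bits([0x7fff_ffff; 4]);
    assert_eq!(x & mask, F32x4([1.0, 1.0, 2.0, 0.5]));
}
```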

@Lokathor
Contributor

Lokathor commented Sep 28, 2020

I think that this is effectively blocked on having CI basics in place to run the tests. Maybe we can have that be a subject at today's meeting.

The tests themselves can be simple math tests in a tests/ folder like any normal integration test suite. Example inputs, expected output, assert that the actual output is equal to (or within a tolerance of) the expected output.

For the specific organization of this sort of "many small code bits" crate, I usually make one tests/ file per src/ file, and then each test name is generally one of impl_[trait]_for_[type], type_[method], or test_[property].
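A sketch of one such test in that shape (file and test names follow the convention above but are illustrative; the check runs over plain lanes rather than a real vector type):

```rust
// Would live as a #[test] fn in e.g. tests/ops.rs; shown here with a main
// so the sketch is runnable standalone.
fn impl_add_for_f32x4() {
    // Example inputs, expected output, tolerance-based comparison.
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [10.0f32, 20.0, 30.0, 40.0];
    let expected = [11.0f32, 22.0, 33.0, 44.0];
    for i in 0..4 {
        // Stands in for lane i of (a + b) on the vector type.
        let actual = a[i] + b[i];
        assert!((actual - expected[i]).abs() <= f32::EPSILON * expected[i]);
    }
}

fn main() {
    impl_add_for_f32x4();
}
```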

I'd be happy to just donate the entire test suite from wide to this crate. Other than possible renames here or there, I expect that stdsimd should be able to pass any test that wide does. It doesn't cover all the stuff that we'd want since it's focused on actual hardware ops, so there are no tests for integer simd division for example, but it's a good start probably.

@workingjubilee workingjubilee added the "I-nominated (We should discuss this at the next weekly meeting)" label Sep 28, 2020
@workingjubilee
Member

workingjubilee commented Sep 28, 2020

We will eventually want to build up to running most of the same set of example projects that packed_simd did, as well. They look a lot like real programs and will provide useful full-scale integration testing and benchmarks, essentially.

Lower-level tests will be useful for revealing analytical diagnostics, and there I am mostly indifferent as long as the tests are easily findable. packed_simd had macroed tests as part of its impls, and I often found it difficult to tell where the test code ended and the implementation code began.

otherwise: What Lokathor said. Copy-pasta from wide sounds delicious.

@Lokathor
Contributor

Lokathor commented Sep 28, 2020

Heavy macros in the tests sounds pretty gross.

However, a macro that can take the inputs+outputs for the widest value we want to test (say: f32x16), and then automatically generate the same test at lower lane counts (eg: f32x8, f32x4, f32x2) by just skipping the values at the end of the list, or even by running more than one assert cycle, well, that does sound extremely useful.
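A toy sketch of that macro shape (the macro name and the scalar-op closure are made up; a real version would construct actual vector types at each width instead of indexing arrays):

```rust
// Hypothetical macro: take inputs/outputs for the widest case (16 lanes)
// and re-run the same check at widths 16, 8, 4, and 2 by only consuming
// the front of each list.
macro_rules! test_at_widths {
    ($op:expr, $a:expr, $b:expr, $expected:expr) => {{
        let (a, b, e) = ($a, $b, $expected);
        for &lanes in &[16usize, 8, 4, 2] {
            for i in 0..lanes {
                assert_eq!($op(a[i], b[i]), e[i], "lane {} at width {}", i, lanes);
            }
        }
    }};
}

fn main() {
    // 16 lanes of data, checked at every narrower width too.
    let a: [f32; 16] = core::array::from_fn(|i| i as f32);
    let b = [1.0f32; 16];
    let e: [f32; 16] = core::array::from_fn(|i| i as f32 + 1.0);
    test_at_widths!(|x: f32, y: f32| x + y, a, b, e);
}
```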

@workingjubilee workingjubilee removed the "I-nominated (We should discuss this at the next weekly meeting)" label Sep 28, 2020
@workingjubilee
Member

This PR is waiting on #3 to go first.

@calebzulawski calebzulawski marked this pull request as draft September 28, 2020 22:34
@calebzulawski
Member Author

I'll take that opportunity to clean up a couple things.

@KodrAus
Contributor

KodrAus commented Sep 29, 2020

Alrighty, after a rebase this should pick up our initial CI

@calebzulawski calebzulawski marked this pull request as ready for review September 29, 2020 02:20
@Lokathor Lokathor left a comment
Contributor


I think most of the vec/scalar ops should be commutative when the op is.

that is, if you can do f32x4 + f32 -> f32x4, you should probably also be able to do f32 + f32x4 -> f32x4.

Obviously this would only apply to Op, not OpAssign variants.
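A sketch of the impl pair being suggested (again with a hypothetical array-backed F32x4 standing in for the real vector type):

```rust
use std::ops::Add;

// Hypothetical stand-in for the real 4-lane f32 vector type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

// f32x4 + f32: splat the scalar across lanes.
impl Add<f32> for F32x4 {
    type Output = F32x4;
    fn add(self, rhs: f32) -> F32x4 {
        F32x4(self.0.map(|x| x + rhs))
    }
}

// f32 + f32x4: the commutative counterpart, defined in terms of the first.
impl Add<F32x4> for f32 {
    type Output = F32x4;
    fn add(self, rhs: F32x4) -> F32x4 {
        rhs + self
    }
}

fn main() {
    let v = F32x4([1.0, 2.0, 3.0, 4.0]);
    assert_eq!(v + 1.0, F32x4([2.0, 3.0, 4.0, 5.0]));
    assert_eq!(1.0 + v, F32x4([2.0, 3.0, 4.0, 5.0])); // same result, swapped
}
```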

@Lokathor
Contributor

Lokathor commented Sep 29, 2020

All set to merge?

EDIT: oh, right, tests, ignore me XD

@calebzulawski
Member Author

I just pushed some floating point tests (which uncovered that Neg can't be implemented with 0 - val, since it doesn't work for -0). If we're okay with this test style I'll expand it to the other types as well.

@programmerjake
Member

-0.0 - val works for neg assuming LLVM's default rounding mode (round to nearest, ties to even).
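A quick scalar check of the signed-zero cases (plain scalar Rust, not the vector types) confirms why 0 - val fails and -0.0 - val doesn't:

```rust
// Negating +0.0 must produce -0.0, but under round-to-nearest
// 0.0 - 0.0 is +0.0, so Neg as 0.0 - val gets the sign wrong.
// With -0.0 as lhs: -0.0 - 0.0 == -0.0 and -0.0 - (-0.0) == +0.0,
// matching true negation for every input.
fn main() {
    assert!((0.0f32 - 0.0).is_sign_positive());     // wrong sign for -(0.0)
    assert!((-0.0f32 - 0.0).is_sign_negative());    // matches -(0.0)
    assert!((-0.0f32 - (-0.0)).is_sign_positive()); // matches -(-0.0)
    assert_eq!(-0.0f32 - 1.5, -1.5);                // nonzero values unaffected
    assert_eq!(-0.0f32 - (-1.5), 1.5);
}
```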

@workingjubilee
Member

I am not sure how I feel about the f32x4 + f32 example. I understand the intent behind including the operation but I am not sure if that is the best API for it, as my internal sense of what that would express actually is non-commutative:

f32x4 + f32 // -> f32x4, the rhs f32 added to each lane of lhs
f32 + f32x4 // -> f32, lhs added to the summation of rhs

And I feel the rhs vector is more ambiguous in general. And I must note, in general Rust is very conservative with extending Add impls even in "obvious" cases, e.g. compare i32 + i64. Of course you could have it return an i64, always, but

error[E0277]: cannot add `i32` to `i64`
 --> src/main.rs:2:18
  |
2 |     let n = 4i64 + -3i32;
  |                  ^ no implementation for `i64 + i32`
  |
  = help: the trait `std::ops::Add<i32>` is not implemented for `i64`

I don't believe packed_simd implemented Add for scalar + vector, but that might have been a technical limitation and not on purpose? In any case, I feel like having a fn to widen a scalar into an appropriate vector would be better than implementing coercion, and I believe with appropriate const generics madness it would be possible to have it handled by type inference(!) so as to make it as ergonomic as possible.
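One hedged sketch of the "widen a scalar" idea (the Simd type here is a made-up array-backed stand-in; the point is that with const generics the lane count can come from type inference rather than being spelled out):

```rust
// Hypothetical generic vector: T lanes, N of them.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Simd<T, const N: usize>([T; N]);

impl<T: Copy, const N: usize> Simd<T, N> {
    // Widen a scalar into a vector by repeating it in every lane.
    fn splat(value: T) -> Self {
        Simd([value; N])
    }
}

fn main() {
    // N = 4 is inferred from the annotation here; in arithmetic like
    // `v + Simd::splat(1.5)` it would be inferred from the other operand.
    let v: Simd<f32, 4> = Simd::splat(1.5);
    assert_eq!(v, Simd([1.5; 4]));
}
```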

@calebzulawski
Member Author

Personally, I'm fine with removing the mixed scalar/vector ops. a + f32x4::splat(b) or simply a + b.into() isn't too bad in my opinion and might be less likely to be confusing.

@Lokathor
Contributor

Lokathor commented Oct 1, 2020

Does a + b.into() infer properly?

I'll say that auto-splatting ops impls are one of the things I was almost immediately asked about adding when I released wide-0.5.0 without them. Code can get noisy really quickly without auto-splatting. In particular, if you have a scalar function, it's easier to convert it to a vector function if the literals will automatically promote themselves.

Also, C++ libs generally do have this, and that's another language and all, but it does mean that people using SIMD are generally comfortable with it.

@calebzulawski
Member Author

Good point, I just tried it and using Into doesn't infer correctly. I'm a little less sold since that doesn't work. You could always implement a function/trait on scalars to splat them, like a + b.splat() but I'm not sure that's any better.
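For what it's worth, a splat trait on scalars can be made to infer its target type from the Add impl, unlike Into with a generic Rhs; a toy sketch (every name here is hypothetical):

```rust
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl Add for F32x4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for i in 0..4 {
            out[i] += rhs.0[i];
        }
        F32x4(out)
    }
}

// Hypothetical scalar-side trait: `b.splat()` can infer its target
// because Add's Rhs fixes the required type.
trait Splat<V> {
    fn splat(self) -> V;
}

impl Splat<F32x4> for f32 {
    fn splat(self) -> F32x4 {
        F32x4([self; 4])
    }
}

fn main() {
    let a = F32x4([1.0, 2.0, 3.0, 4.0]);
    // Target type of splat() is inferred from `a +`.
    let sum = a + 1.0f32.splat();
    assert_eq!(sum, F32x4([2.0, 3.0, 4.0, 5.0]));
}
```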

@workingjubilee
Member

...is there a For/Into impl here for that?

C++ and C implement some pretty intense mathematical coercion by default and Rust deliberately chose another path. I would like at least to experiment with an alternative API here because I think there is a fiendish trick I can make work.

@Lokathor
Contributor

Lokathor commented Oct 1, 2020

What makes Rust's minimal coercion work out well is that you can still quite casually convert numbers by throwing as _ onto anything. Since that's not available with simd types, things get wordy fast.

Feel free to continue to try and think of good alternatives, but literals particularly are usually a pain point like this.

@calebzulawski
Member Author

> ...is there a For/Into impl here for that?
>
> C++ and C implement some pretty intense mathematical coercion by default and Rust deliberately chose another path. I would like at least to experiment with an alternative API here because I think there is a fiendish trick I can make work.

All of the vectors implement From<Scalar>. I guess it's getting caught up on the generic Rhs type.

I'm also not sure that it's really considered coercion--while the scalar gets splatted as far as the actual implementation goes, conceptually it can be thought of without any splat at all. I'm also not sure that behavior is unique to C++ libs; MATLAB and Wolfram automatically broadcast scalars for elementwise ops.

@calebzulawski
Member Author

Unrelated to the scalar-vector ops, are we ok with tests in this form? Should I expand them to the rest of the types?

@Lokathor
Contributor

Lokathor commented Oct 1, 2020

It's... quite full of macros, but seems to still make sense.

Also, our sample data probably needs a little more flair than just small positive whole numbers. As soon as we start mixing it up a bit, we'll probably need an ApproxEq as well as just BitEq.

@workingjubilee
Member

workingjubilee commented Oct 1, 2020

We will always want to test:

  • desired interactions with ty::MAX and ty::MIN
  • desired interactions with 1 and -1
  • desired interactions with 0 as ty

Because these are both exceedingly simple and also exceedingly revealing for some edge cases.
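These inputs are cheap to write down and immediately exercise overflow and sign boundaries; for example, on scalar i32 (wrapping_add here stands in for whatever overflow semantics the vector ops settle on):

```rust
// Scalar sketch of the edge values worth feeding every integer op.
fn main() {
    let edges = [i32::MAX, i32::MIN, 1, -1, 0];
    for &x in &edges {
        // Identity under addition should hold at every edge value.
        assert_eq!(x.wrapping_add(0), x);
    }
    // The revealing cases: overflow wraps around at the extremes.
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);
    assert_eq!(i32::MIN.wrapping_sub(1), i32::MAX);
}
```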

@Lokathor
Contributor

Lokathor commented Oct 1, 2020

also for floats:

  • nan
  • very large values
  • very small values
  • infinity, neg infinity
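Each of those float specials pins down a distinct behavior; a few scalar facts the vector tests would want to mirror:

```rust
// Scalar reference behavior for the float edge inputs listed above.
fn main() {
    let nan = f32::NAN;
    assert!(nan != nan);                            // NaN is unequal to itself
    assert!((nan + 1.0).is_nan());                  // NaN propagates through ops
    assert_eq!(f32::INFINITY + 1.0, f32::INFINITY); // infinity absorbs finite adds
    assert!((f32::MAX * 2.0).is_infinite());        // very large: overflow to inf
    assert!((f32::MIN_POSITIVE / 2.0) > 0.0);       // very small: subnormal, not 0
}
```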

@workingjubilee
Member

I think for very large and small values we can use a crate like quickcheck to test our assumptions, and we can defer those until later. NaN and the infinities are also important, though we're going to have a hard time testing NaN due to LLVM's canonicalizing behavior with them.

@Lokathor
Contributor

Lokathor commented Oct 1, 2020

Well, theoretically all NaNs are equally "you'll get nonsense", so we shouldn't have to worry about specific NaN bit patterns. That would itself be non-portable, I think.

Also, it seems like we've got 16 lanes or so of test data space, and we can run the op a second time to get a second set of 16 cases, so there's no big harm in having some preset "weird input" cases. We can also have quickcheck of course.

@calebzulawski
Member Author

Ok, didn't implement any edge cases (and I definitely like the idea of something like quickcheck) but everything should work, nominally.

@workingjubilee
Member

I think we can TODO more tests and rebase-and-merge this on green, then, to unblock parallelized work on vector math and more tests.


Successfully merging this pull request may close these issues.

impl core::ops for core::simd
5 participants