
[Experiment] Use coresimd, runtime feature detection and a faster, shorter algorithm #37

Closed
wants to merge 2 commits

Conversation

Veedrac
Collaborator

@Veedrac Veedrac commented Jan 31, 2018

I tried writing this with coresimd. Turns out we can do a lot better:

  • it's generally faster,
  • it's much faster than naïve for all arrays ≥32 bytes long, and almost equal below that,
  • it vectorizes on any vectorizable platform, with no need for feature flags,
  • it has runtime feature detection for sse2 for fast-path summations,
  • it uses AVX automatically if the target supports it, and probably supports AVX512.

Note that coresimd isn't settled and this doesn't build on stable, so it needs some work to become an actual competitor. I also think the target_feature stuff is suboptimal, since it's not applied in places where it could be, and we'd probably be better off if we detected AVX at runtime as well.
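
For a rough feel of the runtime-dispatch idea, here is a minimal sketch using the std::arch names that stabilized later (is_x86_feature_detected!, #[target_feature]) rather than the coresimd API this patch actually uses; the function names are hypothetical and the AVX2 body is a scalar stand-in, not a real SIMD kernel:

```rust
// Illustrative sketch only, not the code in this PR: the shape of runtime
// feature detection with a scalar fallback.
fn count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_avx2(haystack: &[u8], needle: u8) -> usize {
    // A real implementation would use core::arch::x86_64 intrinsics here.
    count_scalar(haystack, needle)
}

pub fn count(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe: we just verified this CPU supports AVX2.
            return unsafe { count_avx2(haystack, needle) };
        }
    }
    count_scalar(haystack, needle)
}
```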

Thanks /u/vvimpcrvsh for writing up a trick that I used here and that encouraged me to bother with this experiment.

@AdamNiederer

> Thanks /u/vvimpcrvsh for writing up a trick that I used here and that encouraged me to bother with this experiment.

You're welcome!

```
# faster
test bench_count_big_1000000_hyper ... bench:      18,983 ns/iter (+/- 749)
# sad
test bench_count_big_1000000_hyper ... bench:      17,265 ns/iter (+/- 15)
```

It appears that we've successfully raced to the bottom - impressive work!

@llogiq
Owner

llogiq commented Jan 31, 2018

Why not add this as an experimental feature that overrides the others when present, say core-simd? We could mark it as experimental and likely to break in the docs, and let people try it right away.

@BurntSushi

I would definitely use this. It would motivate me to switch regex over to coresimd as well, and then I could fix bugs like BurntSushi/ripgrep#135. :-)

One thing to be cautious of though is that coresimd is still evolving, so switching to it will likely increase the maintenance burden until it settles down.

@AdamNiederer

Yeah, coresimd 0.0.4 has no effortless upgrade path at the moment, and uses some deprecated features. Hopefully we'll have that resolved by the end of the year - it looks like all you'll need to do is change some type definitions and function calls.

@Veedrac
Collaborator Author

Veedrac commented Feb 1, 2018

For detailed performance numbers, see the charts. The graphs are on the SIMD and AVX tabs. Note that the SIMD variant was benchmarked without target-cpu=native, whereas the old version included it, so it's pretty cool that it's still way faster.

If @BurntSushi's going to use this, I think it's worthwhile doing. My main concern is just having something for the stable channel.

@llogiq
Owner

llogiq commented Feb 1, 2018

We can still use the old bit-twiddling algorithms on anything but nightly. Then it's a matter of using a feature, as we've always done.

I think we might even want to keep the simd-based code as a backup should coresimd fail us, so we have a quick workaround if that happens.

@Veedrac
Collaborator Author

Veedrac commented Feb 1, 2018

I am not particularly eager to simultaneously support simd and coresimd. People who want more stability and can't live with stable bit twiddling could just lock themselves to 0.3.1.

@llogiq
Owner

llogiq commented Feb 1, 2018

You're right. simd has no future, coresimd will be the way forward. So let's add a plain scalar version for stable & beta and try to follow coresimd. My hope is that we all can help drive the ecosystem forward and get stable coresimd before the year ends.

@AdamNiederer

AdamNiederer commented Feb 1, 2018

I've been meaning to speak with the stdsimd team about getting 0.0.5 out the door now that the API changes have landed. Y'all might want to consider rebasing on that version, as it's much more congruent with what will actually be stabilized. 0.0.4 will also silently stop compiling in the future.

Also, it may be worth looking into potential performance difference on non-x86 targets.

@llogiq
Owner

llogiq commented Feb 2, 2018

I have a few ARMs here; nothing fancy, just an Android phone and a RasPi 1 & 2.

@BurntSushi

std::arch is now a thing on Rust nightly: https://internals.rust-lang.org/t/simd-now-available-in-libstd-on-nightly/6903

I hope to convert the regex library over to this soon. This means dropping the simd crate entirely and removing all related features (or rather, making them no-ops to preserve backward compatibility if you don't want to do a semver bump).

@Veedrac
Collaborator Author

Veedrac commented Mar 4, 2018

@BurntSushi I might prefer to wait for the platform-independent variants to appear, depending on how long they will take, before tearing out the current support. It is my understanding that those are planned to be added to Rust directly. An interim period where you use the SIMD-less fallback wouldn't bother me much TBH, as long as it stays short. I suppose you have a much better intuition about timescales than I do.

If you'd rather push forward with more immediacy, I guess that is fine too. I would probably only bother supporting an x86 fast-path directly, though.

@BurntSushi

BurntSushi commented Mar 4, 2018

I do believe the current approximate plan is to bring the portable API into std and hopefully stabilize both together. However, an RFC for the portable API hasn't appeared yet, which means a timeline is somewhat hard to predict. I do feel like a lot of the stakeholders (thus far) are all in sync, though, so hopefully it won't be too bad!

But yeah, I don't know when exactly I'll work on porting the regex crate. It is pretty hairy stuff and will likely take me some time. When that drops, I'll basically just move forward with shipping portable binaries, but I'll leave the simd-accel and avx-accel features on ripgrep until whatever time y'all move forward with runtime target feature detection. So no rush!

> I would probably only bother supporting an x86 fast-path directly, though.

Yes, that is my plan as well. For my uses on text, a portable API is insufficient anyway, so I'd have to write different algorithms for each platform depending on what intrinsics are available to me. I don't personally plan on doing that for anything but x86_64.

@BurntSushi

@Veedrac I forgot that the portable API is actually part of nightly too: https://doc.rust-lang.org/nightly/std/simd/ --- Does that change your calculus at all? (The portable API hasn't gone through an RFC yet, so it may be subject to revision more so than the vendor dependent APIs, but it is very similar to what the simd crate provides today.)

@Veedrac
Collaborator Author

Veedrac commented Mar 5, 2018

Thanks for pointing that out, that seems like at least enough reason to port this patch and get a foot in the door.

@Veedrac
Collaborator Author

Veedrac commented Mar 10, 2018

is_target_feature_detected isn't no_std-compatible. Not sure where to go from there.

@BurntSushi

Not sure either. I guess the first question to ask is whether you're OK with having an optional dependency on std?

@Veedrac
Collaborator Author

Veedrac commented Mar 10, 2018

I'd hesitate to pessimise our no_std users.

@llogiq
Owner

llogiq commented Mar 10, 2018

Is there a reason for leaving no_std users out in the cold?

@Veedrac
Collaborator Author

Veedrac commented Mar 11, 2018

@llogiq Are you asking why is_target_feature_detected isn't in core (I don't know, I assume it is an oversight) or why we need is_target_feature_detected (it replaces cfg_feature_enabled)?

@BurntSushi

@Veedrac See: https://github.com/rust-lang/rfcs/blob/master/text/2325-stable-simd.md#the-is_target_feature_detected-macro --- TL;DR: it wasn't an oversight, and while it is possible to do, we took the conservative route for now.

I think the thing I'm after is whether you're willing to support a std-only mode, which will be required for runtime target feature detection for the foreseeable future. My feeling is that the actual code required to do so would be quite small, but I don't know for sure!

@Veedrac
Collaborator Author

Veedrac commented Mar 11, 2018

Thanks for the link. Doesn't that justification weaken now that we're going with is_x86_feature_detected and such as per rust-lang/stdarch#346?

Supporting a std-only mode would be easy, it just seems a little unfair that core users would lose out, so I wanted to check what other options we have first.
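
Concretely, a std-only mode next to the core build might look roughly like the sketch below (the feature and function names are placeholders, not anything this crate ships):

```rust
// Hypothetical sketch of a std-gated runtime path next to a core-only
// compile-time path.
fn count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

// With std we can ask the CPU at runtime.
#[cfg(all(feature = "std", target_arch = "x86_64"))]
fn sse2_available() -> bool {
    is_x86_feature_detected!("sse2")
}

// Without std we only know what the target was compiled with
// (e.g. -C target-feature=+sse2).
#[cfg(not(all(feature = "std", target_arch = "x86_64")))]
fn sse2_available() -> bool {
    cfg!(target_feature = "sse2")
}

pub fn count(haystack: &[u8], needle: u8) -> usize {
    if sse2_available() {
        // The SSE2 fast path would go here; the scalar loop stands in.
        return count_scalar(haystack, needle);
    }
    count_scalar(haystack, needle)
}
```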

@BurntSushi

@Veedrac Possibly, yes. I'm not sure what you mean by unfair though. It's unfair from the context of what Rust supports, sure, maybe. But bytecount can only do what rustc provides. Core users lose out regardless of whether you support a std-only feature. :-)

@Veedrac
Collaborator Author

Veedrac commented Mar 11, 2018

I meant in comparison to the coresimd library, which supports runtime feature detection on core.

@llogiq
Owner

llogiq commented Mar 20, 2018

Can't we do both? The interface is pretty similar, and I think we should be able to create an abstraction over both coresimd and std::arch, dispatched by a compile-time feature. Embedded users are already on nightly AFAIK, so they wouldn't lose anything, and once std::arch is stable, we'd have stable SIMD in bytecount.

The downside is of course that the underlying implementation may change, and we'd have to update accordingly. I think that would be worth it.
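
Roughly, that abstraction could be shaped like the following sketch (the feature name and backend paths are placeholders, not code from the crate):

```rust
// Hypothetical sketch: one internal `arch` module, two backends selected by
// a compile-time feature.
#[cfg(all(target_arch = "x86_64", feature = "core-simd"))]
mod arch {
    // Nightly / no_std builds would pull the vendor intrinsics from the
    // coresimd crate here.
    pub use coresimd::arch::x86_64::*;
}

#[cfg(all(target_arch = "x86_64", not(feature = "core-simd")))]
mod arch {
    // std builds use std::arch (nightly-only at the time of this comment,
    // stabilized later in Rust 1.27).
    pub use std::arch::x86_64::*;
}

// The counting code itself would only ever refer to `arch::...`, so swapping
// the backend never touches the algorithm.
```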

@Veedrac
Collaborator Author

Veedrac commented Apr 1, 2018

The newest patch splits the code up into different folders so it's less of a single congealed mess. I've got two features: simd uses stdsimd, and target_feature enables runtime feature detection but disables no_std.

I think using the coresimd library in Cargo for runtime feature detection with no_std should be doable, but we'd be supporting a rather large Cartesian product of use-cases. At least I think one could still happily use std::arch and only use coresimd for feature detection, so we wouldn't have to worry too much about divergence.

As it stands I think I'm just going to add some fake vector support for stable users, run a few tests, and call it a day.

Edit: Done.
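
As a hedged guess at what "fake vector support" might mean in practice (all names below are hypothetical, not what the patch ships), it is a scalar stand-in type exposing just enough of a vector-like API that the same chunked counting loop compiles on stable:

```rust
// Hypothetical illustration of a scalar "fake vector"; not the type the
// patch actually adds.
#[derive(Copy, Clone)]
struct FakeU8x16([u8; 16]);

impl FakeU8x16 {
    fn from_slice(s: &[u8]) -> Self {
        let mut lanes = [0u8; 16];
        lanes.copy_from_slice(&s[..16]);
        FakeU8x16(lanes)
    }

    // Mirrors a SIMD compare-and-count, one "lane" at a time.
    fn count_eq(self, needle: u8) -> usize {
        self.0.iter().filter(|&&b| b == needle).count()
    }
}

// The algorithm then reads the same whether the type is a real vector or
// this scalar stand-in.
fn count_chunked(haystack: &[u8], needle: u8) -> usize {
    let mut total = 0;
    let mut chunks = haystack.chunks_exact(16);
    for chunk in &mut chunks {
        total += FakeU8x16::from_slice(chunk).count_eq(needle);
    }
    total + chunks.remainder().iter().filter(|&&b| b == needle).count()
}
```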

@Veedrac Veedrac force-pushed the master branch 2 times, most recently from bb1354c to f93b6a6 on April 2, 2018 19:00
@mati865

mati865 commented Jun 28, 2018

Are there plans to revive this PR now that Rust 1.27 with SIMD has been released?

@Veedrac
Collaborator Author

Veedrac commented Jun 28, 2018

@mati865 Yep, just been distracted by work and other interests. Only issue is I don't think @BurntSushi tracks the newest stable, and he's our major customer.

@BurntSushi

@Veedrac Right, yeah. If you increase the minimum Rust version to 1.27 that would be fine, but I probably wouldn't upgrade to it for at least a few release cycles. The only other alternative is to provide automatic and optional support for 1.27 but still work on older Rust releases (which probably either implies a performance regression or more work maintaining alternative fast implementations or maybe even both, I don't know). Up to y'all! I can be patient.

@llogiq
Owner

llogiq commented Jun 28, 2018

I've locally rebased this to bring in criterion benchmarks. We lose against the naive algorithm until ~12 bytes on my machine, at which point the hyper versions become consistently faster.
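
In code, that cutover is just a length check; an illustrative sketch follows, with the threshold taken from the observation above rather than from anything tuned in the crate, and count_hyper standing in for the SIMD path:

```rust
// Illustrative only: dispatch on input length. The cutoff reflects a
// machine-specific measurement, not a shipped constant, and count_hyper
// is a scalar stand-in for the SIMD implementation.
const NAIVE_CUTOFF: usize = 12;

fn count_naive(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

fn count_hyper(haystack: &[u8], needle: u8) -> usize {
    count_naive(haystack, needle)
}

pub fn count(haystack: &[u8], needle: u8) -> usize {
    if haystack.len() < NAIVE_CUTOFF {
        count_naive(haystack, needle)
    } else {
        count_hyper(haystack, needle)
    }
}
```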

@ignatenkobrain
Contributor

Just wanted to point this out somewhere. We (Fedora) love ripgrep, but we can't use its full power because we use the stable compiler (so no "simd" crate). It would be nice if we could move this forward.

@BurntSushi

@ignatenkobrain As I understand it, making progress on this soon might entail a pretty big maintenance burden that we shouldn't expect folks to entertain. We might just need to wait until ripgrep can make Rust 1.27 its minimum version (which, realistically, is maybe ~6 months from now).

A possible alternative path is to cut a bytecount 0.4 that bumps its MSRV to 1.27. ripgrep will hold off on upgrading until Rust 1.27 is a suitable minimum version, but you could patch ripgrep's Cargo.toml to include the upgrade assuming Fedora is bundling Rust 1.27.

@ignatenkobrain
Contributor

Fedora has 1.27.2

@Veedrac
Collaborator Author

Veedrac commented Jul 29, 2018

@ignatenkobrain If you want to run a patched ripgrep I'll bump this up in priority.

@ignatenkobrain
Contributor

@Veedrac That would be nice. I can patch Fedora's ripgrep to use this.

@BurntSushi

In light of BurntSushi/ripgrep#1019, I switched ripgrep's minimum supported Rust version to "latest" stable, so feel free to move forward with this whenever. :-)

@Veedrac
Collaborator Author

Veedrac commented Aug 23, 2018

@BurntSushi Exciting. I'll have a patch out this week, since I finally have a free weekend.

@Veedrac
Collaborator Author

Veedrac commented Aug 26, 2018

A new try for stable is going on here: #44. Closing this PR.

@Veedrac Veedrac closed this Aug 26, 2018