Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added rayon feature #2

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

Julien-cpsn
Copy link

Run in parallel

You can have access to parallelized methods by enabling the rayon feature. Thanks to the rayon crate, the string slicing will execute through many threads.

Parallel methods:

  • par_slice
  • par_try_slice
  • par_substring
  • par_try_substring

Warning

Using the parallel methods on bigger strings is recommended. Parallelism scales greatly on bigger sizes.

Copy link
Owner

@staticintlucas staticintlucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the PR!

I've left a few comments, sorry if I'm being a bit nitpicky.

Also please let me know if you have any performance data. I'm super curious to know how long a string would need to be to see any performance improvement.

@@ -11,6 +11,15 @@ keywords = ["string", "slice", "substring", "unicode", "utf-8"]
categories = ["no-std", "algorithms", "parsing", "rust-patterns", "text-processing"]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rayon_ = { package = "rayon", version = "1.10.0", optional = true }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask why you're using rayon_ rather than rayon here?

Copy link
Author

@Julien-cpsn Julien-cpsn Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because dependencies and features can conflict, so you can't name both rayon.

I personally think it's preferable for the user to use rayon and not rayon_

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[dependencies]
rayon = { ..., optional = true }

[features]
std = []
rayon = ["std"]

It should be possible to do something like this since Rust creates an implicit feature for optional dependencies.

In Rust 1.60+ you can also use the dep: syntax, but that requires bumping our MSRV


[features]
std = []
rayon = ["std"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know rayon requires std, but do we also need a std feature in this stringslice?

As far as I am aware we should be able to keep stringslice as no_std even if rayon uses std.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the std feature for two reasons

  • so that the user is aware that adding rayon also adds std
  • it's already present if needed in the future

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sure, makes sense I guess.

In that case should we also make std a default feature?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, that's on you!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding std to default would be a breaking change, so let's not do that now.

If we need to add it in future I can do that in a 2.0 release.


> [!WARNING]
> Using the **par**allel methods on bigger strings is recommended. Parallelism scales greatly on bigger sizes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any benchmark info you could add here? I think it would be useful to know how big a string needs to be for the par_* functions to be faster.

Copy link
Author

@Julien-cpsn Julien-cpsn Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to provide some, yes!

Comment on lines +50 to +52

#[cfg(feature = "std")]
extern crate std;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(feature = "std")]
extern crate std;

This shouldn't be necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I did like #2 (comment)

But then I encountered some problems with tests, so that's how I figured it out. Maybe there's a better solution

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, for tests you need std.

You should do something like #[cfg_attr(not(any(feature = "std", test)), no_std)] to enable std for tests

Comment on lines +45 to +49
#![cfg(not(feature = "std"))]
use core::ops::{Bound, RangeBounds};

#[cfg(feature = "std")]
use std::ops::{Bound, RangeBounds};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#![cfg(not(feature = "std"))]
use core::ops::{Bound, RangeBounds};
#[cfg(feature = "std")]
use std::ops::{Bound, RangeBounds};
use core::ops::{Bound, RangeBounds};

You should be able to use core::* even with std

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rust docs say that it's better to use std instead of core, but it doesn't elaborate on that

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a compromise you should be able to do something like this to avoid duplicate imports:

#[cfg(not(feature = "std"))
use core as std;

use std::whatever;

src/lib.rs Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
Comment on lines +139 to +205
#[cfg(feature = "rayon")]
/// Returns a string slice for the given range of characters
///
/// This method will panic if the range is invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".slice(4..5), "😎");
/// ```
fn par_slice(&self, range: impl RangeBounds<usize>) -> &str;

#[cfg(feature = "rayon")]
/// Returns an [`Option`] containing string slice for the given range of characters
///
/// This method will return [`None`] if the range is invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".try_slice(4..5), Some("😎"));
/// ```
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
fn par_try_slice(&self, range: impl RangeBounds<usize>) -> Option<&str>;

#[cfg(feature = "rayon")]
/// Returns a string slice between the given beginning and end characters
///
/// This method will panic if the parameters are invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".substring(4, 5), "😎");
/// ```
fn par_substring(&self, begin: usize, end: usize) -> &str;

#[cfg(feature = "rayon")]
/// Returns an [`Option`] containing string slice between the given beginning and end characters
///
/// This method will return [`None`] if the parameters are invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".try_substring(4, 5), Some("😎"));
/// ```
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
fn par_try_substring(&self, begin: usize, end: usize) -> Option<&str>;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding these methods to the existing StringSlice trait, do you think it would be cleaner to create a new ParStringSlice trait for these?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be great yes

src/lib.rs Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants