-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added rayon feature #2
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for the PR!
I've left a few comments, sorry if I'm being a bit nitpicky.
Also please let me know if you have any performance data. I'm super curious to know how long a string would need to be to see any performance improvement.
@@ -11,6 +11,15 @@ keywords = ["string", "slice", "substring", "unicode", "utf-8"] | |||
categories = ["no-std", "algorithms", "parsing", "rust-patterns", "text-processing"] | |||
|
|||
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | |||
[dependencies] | |||
rayon_ = { package = "rayon", version = "1.10.0", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I ask why you're using rayon_
rather than rayon
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because dependencies and features can conflict, so you can't name both rayon.
I personally think it's preferable for the user to use rayon
and not rayon_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[dependencies]
rayon = { ..., optional = true }
[features]
std = []
rayon = ["std"]
It should be possible to do something like this since Rust creates an implicit feature for optional dependencies.
In Rust 1.60+ you can also use the dep:
syntax, but that requires bumping our MSRV
|
||
[features] | ||
std = [] | ||
rayon = ["std"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know rayon
requires std
, but do we also need a std
feature in this stringslice
?
As far as I am aware we should be able to keep stringslice
as no_std
even if rayon
uses std
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the std feature for two reasons
- so that the user is aware that adding rayon also adds std
- it's already present if needed in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sure, makes sense I guess.
In that case should we also make std
a default feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, that's on you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding std
to default
would be a breaking change, so let's not do that now.
If we need to add it in future I can do that in a 2.0
release.
|
||
> [!WARNING] | ||
> Using the **par**allel methods on bigger strings is recommended. Parallelism scales greatly on bigger sizes. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any benchmark info you could add here? I think it would be useful to know how big a string needs to be for the par_*
functions to be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to provide some, yes!
|
||
#[cfg(feature = "std")] | ||
extern crate std; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cfg(feature = "std")] | |
extern crate std; |
This shouldn't be necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I did like #2 (comment)
But then I encountered some problems with tests, so that's how I figured it out. Maybe there's a better solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, for tests you need std
.
You should do something like #[cfg_attr(not(any(feature = "std", test)), no_std)]
to enable std
for tests
#![cfg(not(feature = "std"))] | ||
use core::ops::{Bound, RangeBounds}; | ||
|
||
#[cfg(feature = "std")] | ||
use std::ops::{Bound, RangeBounds}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#![cfg(not(feature = "std"))] | |
use core::ops::{Bound, RangeBounds}; | |
#[cfg(feature = "std")] | |
use std::ops::{Bound, RangeBounds}; | |
use core::ops::{Bound, RangeBounds}; |
You should be able to use core::*
even with std
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rust docs say that it's better to use std instead of core, but it doesn't elaborate on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a compromise you should be able to do something like this to avoid duplicate imports:
#[cfg(not(feature = "std"))
use core as std;
use std::whatever;
#[cfg(feature = "rayon")] | ||
/// Returns a string slice for the given range of characters | ||
/// | ||
/// This method will panic if the range is invalid, | ||
/// for example if the beginning is greater than the end. | ||
/// | ||
/// Runs in parallel | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// use stringslice::StringSlice; | ||
/// | ||
/// assert_eq!("Ùníc😎de".slice(4..5), "😎"); | ||
/// ``` | ||
fn par_slice(&self, range: impl RangeBounds<usize>) -> &str; | ||
|
||
#[cfg(feature = "rayon")] | ||
/// Returns an [`Option`] containing string slice for the given range of characters | ||
/// | ||
/// This method will return [`None`] if the range is invalid, | ||
/// for example if the beginning is greater than the end. | ||
/// | ||
/// Runs in parallel | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// use stringslice::StringSlice; | ||
/// | ||
/// assert_eq!("Ùníc😎de".try_slice(4..5), Some("😎")); | ||
/// ``` | ||
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html | ||
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None | ||
fn par_try_slice(&self, range: impl RangeBounds<usize>) -> Option<&str>; | ||
|
||
#[cfg(feature = "rayon")] | ||
/// Returns a string slice between the given beginning and end characters | ||
/// | ||
/// This method will panic if the parameters are invalid, | ||
/// for example if the beginning is greater than the end. | ||
/// | ||
/// Runs in parallel | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// use stringslice::StringSlice; | ||
/// | ||
/// assert_eq!("Ùníc😎de".substring(4, 5), "😎"); | ||
/// ``` | ||
fn par_substring(&self, begin: usize, end: usize) -> &str; | ||
|
||
#[cfg(feature = "rayon")] | ||
/// Returns an [`Option`] containing string slice between the given beginning and end characters | ||
/// | ||
/// This method will return [`None`] if the parameters are invalid, | ||
/// for example if the beginning is greater than the end. | ||
/// | ||
/// Runs in parallel | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// use stringslice::StringSlice; | ||
/// | ||
/// assert_eq!("Ùníc😎de".try_substring(4, 5), Some("😎")); | ||
/// ``` | ||
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html | ||
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None | ||
fn par_try_substring(&self, begin: usize, end: usize) -> Option<&str>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding these methods to the existing StringSlice
trait, do you think it would be cleaner to create a new ParStringSlice
trait for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be great yes
Run in parallel
You can have access to parallelized methods by enabling the
rayon
feature. Thanks to the rayon crate, the string slicing will execute through many threads.Parallel methods:
par_slice
par_try_slice
par_substring
par_try_substring
Warning
Using the parallel methods on bigger strings is recommended. Parallelism scales greatly on bigger sizes.