RFC: Needle API (née Pattern API) #2500

Merged (10 commits) on Nov 29, 2018

kennytm
Member

@kennytm kennytm commented Jul 14, 2018

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jul 15, 2018
@Centril
Contributor

Centril commented Jul 15, 2018

🎉

Happy to see you got some use out of my link to https://crates.io/crates/galil-seiferas and cc @bluss who's the author of that crate.

@gilescope

SharedHaystack seems like a general concept. Could we call it CheapClone or something like that - I'm sure there's lots of other places we'd like to know if a clone is expensive or not.

@kennytm
Member Author

kennytm commented Jul 18, 2018

@gilescope Interesting, though if we generalize the concept it would raise the question what is meant by "cheap", e.g. is the Clone of u32 cheap? [u32; 1]? [u32; 65536]? Box<Rc<[u32]>>?

Suppose we do introduce the ShallowClone marker trait:

pub trait ShallowClone: Clone {}
impl<'a, T: ?Sized + 'a> ShallowClone for &'a T {}
impl<T: ?Sized> ShallowClone for Rc<T> {}
impl<T: ?Sized> ShallowClone for Arc<T> {}

the SharedHaystack bound could be changed to a trait alias and everything should work...

#[deprecated]
trait SharedHaystack = Haystack + ShallowClone;

... as long as SharedHaystack is still unstable. If it has become stable then we could not change anything (a third party crate could impl SharedHaystack for MyRef<MyHay> with MyRef<T>: !ShallowClone). So if we intend to take ShallowClone seriously we should have separate stabilization tracks between the whole Pattern API and SharedHaystack.

Anyway ShallowClone should belong to another RFC. I've added this to unresolved questions.
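
To make the idea concrete, here is a minimal sketch of how such a marker could be consumed on stable Rust. The trait and its three impls are the ones proposed above; the helper `share_twice` is a hypothetical name added purely for illustration, and the trait alias is left out because trait aliases are unstable.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Marker for types whose `clone()` copies a pointer or bumps a reference
/// count, and never duplicates the pointed-to data.
pub trait ShallowClone: Clone {}

impl<'a, T: ?Sized + 'a> ShallowClone for &'a T {}
impl<T: ?Sized> ShallowClone for Rc<T> {}
impl<T: ?Sized> ShallowClone for Arc<T> {}

/// Hypothetical helper: returns two handles to the same underlying data.
/// The bound documents that the extra `clone()` is expected to be cheap.
pub fn share_twice<S: ShallowClone>(handle: S) -> (S, S) {
    (handle.clone(), handle)
}
```

With this bound, `share_twice` accepts an `Rc<[u32]>` of any length but rejects a `[u32; 65536]`, which sidesteps the question of how large a "cheap" by-value clone may be.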


### Consumer

A consumer provides the `.consume()` method to implement `starts_with()` and `trim_start()`. It
Member

to implement starts_with()

Is that missing from this or outdated?

Member Author

This is the starts_with algorithm, not the trait method :)

@shepmaster
Member

It's the most 🚲🏚 thing, but I don't find Hay evocative. I think that it's supposed to be "a haystack that's bigger than the haystack we are currently looking at", so maybe something like HayField?

@shepmaster
Member

shepmaster commented Jul 19, 2018

Experience report — implementing Pattern for searching &str

See my branch (permalink).

Background

Jetscii is a library that implements "find the first of any of these 16 bytes in a string". Its stable interface is fn find(&self, haystack: &str) -> Option<usize>. It has a feature flag allowing it to implement the current unstable Pattern API.

Thoughts

  • There are a lot of types and traits that fit together in subtle ways
  • Searcher and Consumer feel like they will be redundant, based on their descriptions
  • It is unclear what Searcher and Consumer map to in algorithm-function-space. This is important to know in order to test the implementations.
  • I did not think of my pattern in the terms of what the Searcher / Consumer primitives offer
    • Implementing Consumer feels very non-performant for my case
  • From my implementation, it's unclear why Searcher and Consumer are different traits. It's also unclear why Pattern is itself a different trait. For me, these were all implemented by a single type, the type I already had. I'm sure there's a reason for the indirection, but justifying that upfront in the docs will be highly useful.

Implementation failures

I implemented Pattern and friends and it passed all my tests. I then had @kennytm take a look at it and we identified multiple issues with my implementation:

  • I didn't properly return the offset in the hay, but only in the haystack. While documented, this is a subtle interaction that doesn't break anything until you start using the second result of find (or one of the iterators).
  • My implementation of consume was completely broken, but was not tested at all. As mentioned above, I don't think in terms of these two primitives, so I didn't test an algorithm that used consume at first.
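
The first failure is easy to reproduce in miniature. The sketch below is one plausible reading of the bug, not the actual Jetscii code: both function names are hypothetical, and plain `&str` plus a `Range<usize>` stand in for the real `Span` type. The correct version reports the match in the coordinates of the whole string, while the buggy one reports it relative to the window being searched.

```rust
use std::ops::Range;

/// Searches for `needle` inside `hay[range]` and reports the match using
/// indices into the whole `hay`.
pub fn search_in_span(hay: &str, range: Range<usize>, needle: &str) -> Option<Range<usize>> {
    let window = &hay[range.clone()];
    let local = window.find(needle)?;
    // Translate the window-relative offset back into `hay` coordinates.
    let start = range.start + local;
    Some(start..start + needle.len())
}

/// Buggy variant: forgets the translation, so the result is only correct
/// when the window happens to start at index 0.
pub fn search_in_span_buggy(hay: &str, range: Range<usize>, needle: &str) -> Option<Range<usize>> {
    let local = hay[range].find(needle)?;
    Some(local..local + needle.len())
}
```

Both functions agree on the first search, where the window starts at 0, which is why a test that only inspects the first result does not catch the mistake.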

Looking forward

One exciting thing is that this allows implementing a pattern for &[T], which is what Jetscii actually does. I plan on trying that and will post another comment for that.

@kennytm
Member Author

kennytm commented Jul 19, 2018

Thanks @shepmaster! I'll update the descriptions in the pattern_3 crate for the unaddressed points (why Searcher and Consumer are different traits is explained in the RFC but not in the crate documentation).

There are a lot of types and traits that fit together in subtle ways

Could you elaborate on what you mean by "subtle"?

Implementing Consumer feels very non-performant for my case

Yes this problem also exists for my regex implementation which does an unanchored search.

unsafe impl<'p> Consumer<str> for RegexSearcher<'p> {
    fn consume(&mut self, span: Span<&str>) -> Option<usize> {
        let (hay, range) = span.into_parts();
        let m = self.regex.find_at(hay, range.start)?;
        if m.start() == range.start {
            Some(m.end())
        } else {
            None
        }
    }
}

I don't know if this could be improved performance-wise other than asking the Searcher implementor to provide such primitive too.

(If we disregard the performance issue we could default impl a Consumer in terms of a Searcher and vice-versa, but most of the time this isn't a good idea.)

@shepmaster
Member

Experience report — implementing Pattern for searching &[u8]

See my branch (permalink).

Background

Previous comment

Thoughts

  • I'm very happy that Span::as_bytes exists. I expected there to be a more general map though.
  • For my particular case, the &[u8] implementation is simpler because that's my core algorithm. My &str Pattern can actually delegate to the &[u8] one.

Looking forward

I've needed to write custom consumers of a Pattern before as well; I hope that to be the next comment.

Explained why Pattern cannot be merged into Searcher.

Block on RFC 1672.
@kennytm
Member Author

kennytm commented Jul 24, 2018

Update

Addressing #2500 (comment).

  1. Merged Searcher and Consumer into a single trait, to reduce the number of types. The concept of "consumer" still exists where the Searcher impl can be an enum to choose between a search-optimized or consume-optimized structure. Microbenchmarks show that this runtime selection doesn't incur much slowdown.

  2. Added more documentation about Searcher and Pattern into the pattern-3 crate. Unfortunately not available in docs.rs yet until they have fixed that #![feature(extern_prelude)] bug 😛

  3. Changed some Pattern impls that are not blocked by #1672 (Disjointness based on associated types) to use a "blanket" impl.


I'm very happy that Span::as_bytes exists. I expected there to be a more general map though.

A general map cannot safely exist, as you could write span.map(|h| "") and that would produce nonsense.

An unsafe version can be done as

let (hay, range) = span.into_parts();
let hay = hay.as_inner();
unsafe { Span::from_parts(hay, range) }

@gereeter gereeter left a comment

This is a generally great RFC and I'm overall quite happy with the API and definitely happy with the demonstrated performance improvements.

I would be interested in seeing more detail and benchmarks in regards to the behaviour with owned haystacks. From what I can tell, this makes split on Vec take quadratic time, since the trisection needs to copy the entire tail of the Vec on every split. This seems like a hard problem, and it would be bad to back ourselves into a corner.

One (verbose) solution would be to introduce an intermediate data structure, a PartialVec<T> that owns the memory of a whole Vec but only owns a small range worth of elements. It would be possible to convert it into a Vec by shifting the elements to the start. Then, split_around would be turned into two variants (where PartialVec would presumably be specified in another associated type), split_around_forward(...) -> (Vec, Vec, PartialVec) and split_around_backward(...) -> (PartialVec, Vec, Vec). split would use split_around_forward and rsplit would use split_around_backward.
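
The cost being described can be seen in a small stable-Rust sketch. This is an assumption about how an owned trisection would naturally be written, not the `pattern_3` implementation; `split_around_vec` and `split_owned` are hypothetical names. `Vec::split_off` moves the tail into a newly allocated vector, so each trisection copies everything to the right of the match.

```rust
use std::ops::Range;

/// Splits an owned vector into `[left, middle, right]` around `range`.
/// Each `split_off` allocates a new vector and moves the tail into it.
pub fn split_around_vec<T>(mut v: Vec<T>, range: Range<usize>) -> [Vec<T>; 3] {
    let right = v.split_off(range.end);
    let middle = v.split_off(range.start);
    [v, middle, right]
}

/// Splits on a separator by repeatedly trisecting the owned remainder.
pub fn split_owned<T: PartialEq>(v: Vec<T>, sep: &T) -> Vec<Vec<T>> {
    let mut pieces = Vec::new();
    let mut rest = v;
    while let Some(i) = rest.iter().position(|x| x == sep) {
        let [left, _sep, right] = split_around_vec(rest, i..i + 1);
        pieces.push(left);
        // `right` was just copied out of the previous buffer, and it will
        // be copied again on the next match.
        rest = right;
    }
    pieces.push(rest);
    pieces
}
```

With k separators in a vector of length n, the remainder is moved k times, which is the quadratic behaviour in the worst case and the motivation for something like the proposed `PartialVec`.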

A hay can be *borrowed* from a haystack.

```rust
pub trait Haystack: Deref<Target: Hay> + Sized {
```

I think that, due to its unsafe methods being called from safe code (e.g. in trim_start), Haystack needs to be an unsafe trait. Otherwise, without ever writing an unsafe block and therefore promising to uphold the invariants of safe code, an invalid implementation of Haystack could violate memory safety. The fact that split_around and split_unchecked are unsafe capture the fact that the caller, to preserve memory safety, must pass in valid indices, but it does nothing to prevent the callee from doing arbitrary bad behaviour even if the indices are valid.

Hay probably also needs to be an unsafe trait. It looks like in practice, Searchers are implemented for specific Hay types, indicating trust of just those implementations, so it may not be strictly necessary. Additionally, one of the requirements of a valid Haystack implementation could be the validity of the associated Hay type. However, with the proposed impl<'a, H: Hay> Haystack for &'a H, this is impossible to promise, and I think it would be necessary for Hay to be an unsafe trait.

Contributor

I agree.

Member Author

Fixed, both are now unsafe.


A `SharedHaystack` is a marker sub-trait which tells the compiler this haystack can be
cheaply cloned (i.e. shared), e.g. a `&H` or `Rc<H>`. Implementing this trait alters some behavior
of the `Span` structure discussed in the next section.

I'm somewhat uncomfortable with the use of specialization to modify the behaviour of Span instead of just providing more optimized functions. Admittedly, this changed behaviour seems hard to directly observe, since into_parts is only available on shared spans. This definitely isn't a big deal.

with invalid ranges. Implementations of these methods often start with:

```rust
fn search(&mut self, span: SharedSpan<&A>) -> Option<Range<A::Index>> {
```

Is SharedSpan a relic of a previous version of this proposal? I don't see it defined anywhere and it sounds like Span<H> where H: SharedHaystack.

Member Author

Fixed. (In an ancient version there was SharedSpan<H> and UniqueSpan<H> where Haystack has an associated type Span to determine which span to use. The resulting code was quite ugly.)

let span = unsafe { Span::from_parts("CDEFG", 3..8) };
// we can find "CD" at the start of the span.
assert_eq!("CD".into_searcher().search(span.clone()), Some(3..5));
assert_eq!("CD".into_searcher().consume(span.clone()), Some(5));

Should this (and the other examples calling .into_searcher().consume(...)) be .into_consumer().consume(...)?

Member Author

Fixed. Copy-and-paste error.

let mut searcher = pattern.into_searcher();
let mut rest = Span::from(haystack);
while let Some(range) = searcher.search(rest.borrow()) {
let [left, _, right] = unsafe { rest.split_around(range) };

It seems very common to call split_around and then throw away one or more of the components. For owned containers like Vec, at least, this involves allocating a vector for the ignored elements, copying them to their new location, then finally dropping and deallocating. Would it be possible to add more methods to Haystack that only return some of the parts? They could have default definitions in terms of split_around, so they shouldn't cause any more difficulty for implementers, but owned containers would be able to override them for better performance.

It also occurs to me that slice_unchecked is actually one of these specialized methods, returning only the middle component.

Member Author

@kennytm kennytm Aug 3, 2018

We'll need 3 more names for these 😝 ([left, middle, _], [left, _, right], [_, middle, right])

pub trait Haystack: Deref<Target: Hay> + Sized {
fn empty() -> Self;
unsafe fn split_around(self, range: Range<Self::Target::Index>) -> [Self; 3];
unsafe fn slice_unchecked(self, range: Range<Self::Target::Index>) -> Self;

Could slice_unchecked have a default implementation as follows?

unsafe fn slice_unchecked(self, range: Range<Self::Target::Index>) -> Self {
    let [_, middle, _] = self.split_around(range);
    middle
}

Member Author

Good idea. Added.
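
The suggested default can be tried out on stable Rust with a simplified stand-in trait. `MiniHaystack` below is a hypothetical reduction of the RFC's `Haystack`: it fixes the index type to `usize` and drops the `Deref<Target: Hay>` bound, so it is only a sketch of the shape of the default method, not the real trait.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the RFC's `Haystack` trait.
///
/// # Safety
/// Implementations must return three parts that together cover `self`.
pub unsafe trait MiniHaystack: Sized {
    /// # Safety
    /// `range` must be in bounds and valid for `self`.
    unsafe fn split_around(self, range: Range<usize>) -> [Self; 3];

    /// Default implementation: trisect and keep only the middle part.
    ///
    /// # Safety
    /// Same requirements as `split_around`.
    unsafe fn slice_unchecked(self, range: Range<usize>) -> Self {
        // SAFETY: the caller upholds the contract of `split_around`.
        let [_, middle, _] = unsafe { self.split_around(range) };
        middle
    }
}

unsafe impl<'a, T> MiniHaystack for &'a [T] {
    unsafe fn split_around(self, range: Range<usize>) -> [Self; 3] {
        // SAFETY: the caller guarantees `range` is in bounds for `self`.
        unsafe {
            [
                self.get_unchecked(..range.start),
                self.get_unchecked(range.clone()),
                self.get_unchecked(range.end..),
            ]
        }
    }
}
```

For a borrowed slice the default costs nothing extra, but an owned container would still want to override it, since the discarded left and right parts would otherwise be materialized first.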


* Implement `Hay` to `str`, `[T]` and `OsStr`.

* Implement `Haystack` to `∀H: Hay. &H`, `&mut str` and `&mut [T]`.

The pattern_3 crate also has an implementation for Vec<T> (though not String or OsString). Are those owned implementations intended eventually? Is that just out of scope for this particular RFC?

Member Author

I don't intend to add these into the standard library, due to the efficiency concern you've raised.

pattern_3 does implement for Vec<T> just to illustrate that it can transfer owned data type correctly.

@gereeter

gereeter commented Aug 2, 2018

I separated this comment out because it is far more questionable than the rest. I know that I personally tend to go overboard with squeezing out tiny and inconsequential bits of runtime performance at the expense of compile time and ergonomics. That said,

Merged Searcher and Consumer into a single trait, to reduce the number of types. The concept of "consumer" still exists where the Searcher impl can be an enum to choose between a search-optimized or consume-optimized structure. Microbenchmarks show that this runtime selection doesn't incur much slowdown.

This doesn't feel like the right trade-off to me. Reducing the number of types is definitely useful for implementers of patterns that don't have a special optimization for consume and may make comprehension easier for users of the API. However,

  • Implementations are probably going to be much more rare than uses. Complicating the implementation a small amount for the sake of performance seems like a good thing. Yes, the runtime selection doesn't incur much slowdown, but it seems wrong to force an unnecessary performance penalty, no matter how small.
  • I actually find the split types easier to read (as a user). It seems confusing that there are two functions that return identical types, but might (or might not, depending on the searcher) panic if I call the wrong function on their result. With the split types, the type system tells me to call search on a Searcher and consume on a Consumer and will yell at me if I get it wrong. If I only test with haystacks that have the same searcher and consumer, I might not notice the mistake, meaning I could have a less efficient implementation that also breaks when given certain types. This is a little far-fetched, I admit, since [T] has a different searcher and consumer, but that should be an implementation detail. It shouldn't be easy to run into internal points like that.

If getting the implementation right is an issue, there could just be a wrapper type along the lines of

pub struct SearcherConsumer<S> {
    inner: S
}

impl<H, S: Searcher<H>> Consumer<H> for SearcherConsumer<S> {
    // ...
}

This could then be the default type for Pattern::Consumer. If there isn't a particularly performant way to implement Consumer, then just using the default should be painless and correct.
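
A runnable sketch of such a wrapper, under simplifying assumptions: `MiniSearcher` and `MiniConsumer` are hypothetical reductions of the RFC traits that take a `&str` and a `Range<usize>` instead of a `Span`, and `Literal` is a toy searcher added only so the wrapper can be exercised.

```rust
use std::ops::Range;

/// Hypothetical simplified searcher: finds the first match inside
/// `hay[range]`, reported in `hay` coordinates.
pub trait MiniSearcher {
    fn search(&mut self, hay: &str, range: Range<usize>) -> Option<Range<usize>>;
}

/// Hypothetical simplified consumer: if a match begins exactly at
/// `range.start`, returns the index where it ends.
pub trait MiniConsumer {
    fn consume(&mut self, hay: &str, range: Range<usize>) -> Option<usize>;
}

/// Derives a consumer from any searcher.
pub struct SearcherConsumer<S> {
    inner: S,
}

impl<S> SearcherConsumer<S> {
    pub fn new(inner: S) -> Self {
        SearcherConsumer { inner }
    }
}

impl<S: MiniSearcher> MiniConsumer for SearcherConsumer<S> {
    fn consume(&mut self, hay: &str, range: Range<usize>) -> Option<usize> {
        let start = range.start;
        // This may scan the whole range before learning that the first
        // match is not anchored at `start`.
        let found = self.inner.search(hay, range)?;
        if found.start == start { Some(found.end) } else { None }
    }
}

/// Toy searcher for a literal substring.
pub struct Literal<'p>(pub &'p str);

impl<'p> MiniSearcher for Literal<'p> {
    fn search(&mut self, hay: &str, range: Range<usize>) -> Option<Range<usize>> {
        let local = hay[range.clone()].find(self.0)?;
        let start = range.start + local;
        Some(start..start + self.0.len())
    }
}
```

The wrapper is correct for any searcher but inherits the unanchored scan, the same cost seen in the regex consumer earlier in the thread, so it suits patterns without a faster anchored check.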

@kennytm
Member Author

kennytm commented Aug 3, 2018

@gereeter

Implementations are probably going to be much more rare than uses.

I disagree with this. People seldom use Searcher/Consumer directly, unless they are implementing a new generic algorithm. The standard matches/split/starts_with/etc. methods already cover all the common cases you could handle with a generic pattern.

Yes, the runtime selection doesn't incur much slowdown, but it seems wrong to force an unnecessary performance penalty, no matter how small.

There is zero performance penalty in string matching caused by merging consumer and searcher, because we already have a runtime selection between empty and non-empty needle.

In trim_start and starts_with, I believe LLVM is able to recognize there's a loop invariant (not tested).

@Kimundi
Member

Kimundi commented Aug 8, 2018

@kennytm: Thank you, thank you, thank you! 🎉🎉🎉 This is the kind of end state that I attempted to reach with the various Pattern API sketches, but never could put enough effort into.

In general huge 👍 from me, but I had a few thoughts when reading through the RFC and comments right now.

  • Probably way past the point where it would be sensible, but your comparison of Pattern and Searcher to IntoIterator and Iterator made me think of whether it wouldn't have been best to call Pattern IntoSearcher from the start. Mainly because its name is horribly confusing with Rust's pattern matching support, especially given that pattern matching also supports "string patterns" and "slice patterns".
  • I also somewhat agree with @gereeter that the split of Searcher/Consumer while using the same Self::Searcher type seems somewhat confusing.

@kennytm
Member Author

kennytm commented Aug 11, 2018

Okay, so Searcher and Consumer are separate again then 🙃.

@Kimundi If Searcher and Consumer are two different traits, renaming Pattern to IntoSearcher would be misleading as it omits the Consumer part. Also, we cannot have a blanket impl unlike IntoIterator:

impl<H, S> Pattern<H> for S
where
    H: Haystack,
    S: Searcher<H::Target> + Consumer<H::Target>,
{
    type Searcher = Self;
    type Consumer = Self;
    fn into_searcher(self) -> Self { self }
    fn into_consumer(self) -> Self { self }
}

because due to backward compatibility we have a different blanket impl to support:

impl<'h, F> Pattern<&'h str> for F
where
    F: FnMut(char) -> bool,
{ ... }

and these two will conflict when a type implements (FnMut(char) -> bool) + Searcher<str> + Consumer<str>. (I've added this in the latest commit.)

Given these details I'm mildly against renaming Pattern.

@shepmaster
Member

shepmaster commented Aug 11, 2018

Experience report — consuming Pattern

My goal is to split a string into an iterator of delimiter / not delimiter values.

Background

Previous comment

I've implemented this code previously using the current Pattern API.

Thoughts

  • While my original case is focused on splitting strings with delimiters, I am interested in making the code as generic as possible. I would usually start by making the code using a concrete type (e.g. &str) and then making it more generic.

  • I don't understand what &str would be in the terms of the new API. Is it a Haystack? Hay? Should I actually be using str instead?

  • The relationships between Haystack, Hay, and Span are unclear.

  • Span is poorly introduced / motivated in the documentation.

  • I don't like the terms "Haystack" and "Hay" (or perhaps the things that they are applied to?). In my mind, I search a haystack, but the code actually searches inside of a Span.

  • I usually think of searching a haystack for a needle, but I haven't seen mention of a "needle". Perhaps that's a useful word to use, somewhere?

  • I seemingly cannot search a slice for a single value:

    ext::find(b"alpha", b'a'); // no
    ext::find(b"alpha", b"a"); // no
    ext::find(b"alpha", &b"a"[..]); // no
    ext::find(&b"alpha"[..], b'a'); // no
    ext::find(&b"alpha"[..], b"a"); // no
    ext::find(&b"alpha"[..], b"a"[..]); // no

    I think if this is merged, people will expect a lot of feature parity between slices and strings.

Working code

extern crate pattern_3;

use pattern_3::{Hay, Haystack, Pattern, Searcher, Span};
use std::ops::Deref;

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SplitType<T> {
    Piece(T),
    Delimiter(T),
}

pub struct SplitKeepingDelimiter<Thing, S>
where
    Thing: Haystack + Deref,
    Thing::Target: Hay,
{
    thing: Span<Thing>,
    searcher: S,
    saved_delimiter: Option<Thing>,
}

impl<Thing, S> Iterator for SplitKeepingDelimiter<Thing, S>
where
    Thing: Haystack + Deref,
    Thing::Target: Hay,
    S: Searcher<Thing::Target>,
{
    type Item = SplitType<Thing>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(saved_delimiter) = self.saved_delimiter.take() {
            return Some(SplitType::Delimiter(saved_delimiter));
        }

        let thing = self.thing.take();
        // Search for the next occurrence of the delimiter
        match self.searcher.search(thing.borrow()) {
            Some(idx) => {
                // We found a delimiter
                let [l, m, r] = unsafe { thing.split_around(idx) };

                if l.is_empty() {
                    // The delimiter starts the remainder of the string
                    self.thing = r;
                    Some(SplitType::Delimiter(m.into()))
                } else {
                    // There's something before the delimiter
                    self.saved_delimiter = Some(m.into());
                    self.thing = r;
                    Some(SplitType::Piece(l.into()))
                }
            }
            None => {
                // There are no more delimiters
                if thing.is_empty() {
                    // And there's no more string to search
                    None
                } else {
                    // One last piece to return
                    Some(SplitType::Piece(thing.into()))
                }
            }
        }
    }
}

pub trait SplitKeepingDelimiterExt {
    fn split_keeping_delimiter<P>(self, pattern: P) -> SplitKeepingDelimiter<Self, P::Searcher>
    where
        Self: Haystack,
        Self::Target: Hay,
        P: Pattern<Self>;
}

impl<H> SplitKeepingDelimiterExt for H
where
    H: Haystack,
    H::Target: Hay,
{
    fn split_keeping_delimiter<P>(self, pattern: P) -> SplitKeepingDelimiter<Self, P::Searcher>
    where
        P: Pattern<Self>,
    {
        SplitKeepingDelimiter {
            thing: Span::from(self),
            searcher: pattern.into_searcher(),
            saved_delimiter: None,
        }
    }
}

#[cfg(test)]
mod test {
    use super::SplitKeepingDelimiterExt;

    #[test]
    fn split_with_delimiter() {
        use super::SplitType::*;
        let delims = &[',', ';'][..];
        let items: Vec<_> = "alpha,beta;gamma".split_keeping_delimiter(delims).collect();
        assert_eq!(
            &items,
            &[
                Piece("alpha"),
                Delimiter(","),
                Piece("beta"),
                Delimiter(";"),
                Piece("gamma")
            ]
        );
    }

    #[test]
    fn split_with_delimiter_allows_consecutive_delimiters() {
        use super::SplitType::*;
        let delims = &[',', ';'][..];
        let items: Vec<_> = ",;".split_keeping_delimiter(delims).collect();
        assert_eq!(&items, &[Delimiter(","), Delimiter(";")]);
    }

    #[test]
    fn split_with_delimiter_bytes() {
        use super::SplitType::*;

        let items: Vec<_> = b"comma,separated,data,".split_keeping_delimiter(|&c: &u8| c == b',').collect();
        assert_eq!(
            &items,
            &[
                Piece(&b"comma"[..]),
                Delimiter(b","),
                Piece(b"separated"),
                Delimiter(b","),
                Piece(b"data"),
                Delimiter(b","),
            ]
        );
    }
}

I'm not happy with the Thing generic type name; I really wanted to call it "haystack", but it's not a Haystack so...
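
For comparison, the same delimiter-preserving split can be sketched on stable Rust for `&str` only, using the standard `str::match_indices`. This is not a replacement for the generic code above: it gives up genericity over haystack types and collects eagerly into a `Vec`, and the free function `split_keeping_delimiter` is a hypothetical name for this sketch.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SplitType<'a> {
    Piece(&'a str),
    Delimiter(&'a str),
}

/// Splits `input` at any of `delims`, keeping the delimiters in the output
/// and skipping empty pieces.
pub fn split_keeping_delimiter<'a>(input: &'a str, delims: &[char]) -> Vec<SplitType<'a>> {
    let mut out = Vec::new();
    // Byte offset just past the previous delimiter.
    let mut last = 0;
    for (idx, matched) in input.match_indices(delims) {
        if idx > last {
            out.push(SplitType::Piece(&input[last..idx]));
        }
        out.push(SplitType::Delimiter(matched));
        last = idx + matched.len();
    }
    if last < input.len() {
        // Whatever follows the final delimiter.
        out.push(SplitType::Piece(&input[last..]));
    }
    out
}
```

The byte-offset bookkeeping here is what `Span` and `split_around` do on the caller's behalf in the generic version.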

Overall thoughts

I'm very optimistic about this API. I'm hoping that a diverse set of eyes on the code can help hammer out the naming as well as adding more comprehensive documentation.

@kennytm
Member Author

kennytm commented Aug 12, 2018

Thanks for the report @shepmaster !


I usually think of searching a haystack for a needle, but I haven't seen mention of a "needle". Perhaps that's a useful word to use, somewhere?

We could rename the trait Pattern to Needle. wdyt?

cc @Centril (1) and @Kimundi (2) who want to rename Pattern to something else

The name Haystack is fine because we get

fn contains<H, P>(haystack: H, needle: P) -> bool
where
    H: Haystack,
    P: Needle<H>;

so for those not directly working with Searcher we are indeed "searching for a needle in a haystack".

In my mind, I search a haystack, but the code actually searches inside of a Span.

maybe reading it as "search inside of a ____ of haystack" is better?


I seemingly cannot search a slice for a single value:

This is unfortunately impossible because it will conflict with:

impl<'h, T, F> Pattern<&'h [T]> for F
where 
    F: FnMut(&T) -> bool,

as we could impl FnMut(&Foo) -> bool for &Foo for a third-party type Foo (this is possible because & is #[fundamental]).

@Centril
Contributor

Centril commented Aug 12, 2018

We could rename the trait Pattern to Needle. wdyt?

My thinking is that anything is better than Pattern. ;)

Other than that I don't have any strong opinions (or any opinions at all).
I think Needle is fine. Have you considered any naming based on the word "predicate"?

@kennytm
Member Author

kennytm commented Aug 12, 2018

@Centril "Predicate" feels too generic and I'd expect Predicate<T> is exactly FnMut(&T) -> bool like C#.

@Centril
Contributor

Centril commented Aug 12, 2018

@kennytm I buy that :) Needle will have to do barring a better name.

@Kimundi
Member

Kimundi commented Aug 13, 2018

👍 for Needle - though then the question is whether we now refer to it as the "haystack API" or the "needle API" 😄

Re: Ability to search a single element: We could provide a newtype-like wrapper type:

ext::find(&b"alpha"[..], Needle(b'a'));

Would be kind of ugly, but at least be doable.

Alternatively, we put the FnMut pattern behind a newtype wrapper, and live with the inconsistency to strings (though I'm not sure if this fixes the issue).
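
The first option can be sketched with a simplified trait. Everything here is hypothetical: `SliceNeedle` stands in for the real needle trait, `Elem` is used as the wrapper name instead of `Needle` to keep it distinct from the trait being discussed, and `find` is a toy free function. The point is only that a local wrapper type can coexist with the `FnMut(&T) -> bool` blanket impl, whereas an impl for a bare `T` could not.

```rust
/// Hypothetical simplified needle for slices: decides whether one element
/// matches.
pub trait SliceNeedle<T> {
    fn matches(&mut self, item: &T) -> bool;
}

/// Blanket impl for predicates, mirroring the existing closure support.
impl<T, F> SliceNeedle<T> for F
where
    F: FnMut(&T) -> bool,
{
    fn matches(&mut self, item: &T) -> bool {
        (*self)(item)
    }
}

/// Wrapper marking a value as "search for this exact element".
pub struct Elem<T>(pub T);

impl<T: PartialEq> SliceNeedle<T> for Elem<T> {
    fn matches(&mut self, item: &T) -> bool {
        self.0 == *item
    }
}

/// Returns the index of the first element accepted by `needle`.
pub fn find<T, N: SliceNeedle<T>>(haystack: &[T], mut needle: N) -> Option<usize> {
    haystack.iter().position(|item| needle.matches(item))
}
```

A call then reads `find(&b"alpha"[..], Elem(b'p'))`, which is as verbose as predicted but keeps closures working unchanged.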

@kennytm
Member Author

kennytm commented Aug 14, 2018

"haystack API" or the "needle API"

This reminds me that the module name core::pattern may also need to be changed 😄

Alternatively, we put the FnMut pattern behind a newtype wrapper, and live with the inconsistency to strings (though I'm not sure if this fixes the issue).

This unfortunately will conflict with stabilized APIs like <[T]>::split. For this to work, we'd instead need to introduce .split_matches etc. (OTOH we can keep .contains and remove .contains_match.)

@kennytm
Member Author

kennytm commented Nov 14, 2018

@SimonSapin Thanks!

disjointness

Without #1672 some third-party types are not covered by blanket Needle impls due to conflicts. If we stabilize without #1672, those third-party types could impl Needle themselves, meaning the standard library cannot add the blanket impls later.

This is fine if we only focus on built-in needle types like &str and ignore third-party types like maybe QtStr. IMO #1672 is needed before stabilization because I don't like having obvious holes with a known solution 😉

double-ended vs reverse

Updated the RFC. These are explained in detail in the library docs:

Previously docs.rs didn't show them due to an outdated compiler, but that has just been fixed and they are now visible 🎉

I feel that this RFC already "spends" a very high amount of complexity budget and API surface in order to be very general and support many scenarios. Maybe this is an area where we can simplify it, and not sacrifice much in practice? (Then again maybe this simplification wouldn’t help a lot either.)

The simplification of ignoring LinkedList<T> would be removing the Hay::Index associated type, forcing it to be usize, making interfaces like https://docs.rs/pattern-3/0.5.0/pattern_3/haystack/trait.Haystack.html#tymethod.split_around probably easier to read. But there's no impact on the amount of traits or methods otherwise (except trivial ones like Hay::start_index).

@eddyb
Member

eddyb commented Nov 19, 2018

Has the indexing crate been considered while designing the API? As in, can its indices be used?
(Although I'm not sure ops::Range<indexing::Index> is equivalent to indexing::Range)

It might be interesting to experiment with designing the entire API on generative typing, but without ATC it can be trickier, and there might not be a solution to containers where not every index is valid (e.g. str's UTF-8 requirements) - at the very least, everything would be more painful to interact with.

@kennytm
Member Author

kennytm commented Nov 19, 2018

@eddyb I tried it before (as something like Span<'h, H>), but found that the extra type information makes the API extremely noisy and hard to understand, so I just abandoned the idea half-way :).

I also don't think with today's Rust one could safely use indexing-style branded index with iterators e.g. <&mut [T]>::match_indices(), since you'll need to temporarily escape the scope() (and you can't Pin the returned iterator either).

@eddyb
Member

eddyb commented Nov 19, 2018

@kennytm I've prototyped an existential wrapper solution for the indexing generative lifetime, which allows owning both the indexing::Container and anything containing Indexes/Ranges with the same generative lifetime, and effectively "reentering" the scope at a later time.

AFAIK it's sound when used with indexing, but yesterday I realized I've accidentally made it general enough that it can erase e.g. the 'a in &'a T, which clearly isn't sound.

I'll have to look into making the construction limited to calling indexing::scope for you, such that you can only erase the generative lifetime that scope introduces, into the existential wrapper.

@SimonSapin
Contributor

@rfcbot resolve double-ended vs reverse

@SimonSapin
Contributor

@rfcbot resolve blocked on disjointness

@kennytm, can I ask you to make sure when the tracking issue for this RFC is created that it notes in the description that stabilizing the Needle trait is blocked on #1672? Is there a subset of the proposed API that can be stabilized in the meantime?

@SimonSapin
Contributor

@rfcbot resolve yagni

I still feel that this is an unprecedented amount of complexity for a standard library feature. However, stabilizing some set of traits to "explain" the existing behavior of str::matches and other stable methods, as well as extending to other types like OsStr, are both definitely desirable, and I don’t know if it’s possible at all to do much simpler given all the constraints.
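
For context, the "existing behavior" in question is that stable methods such as `str::matches` already accept several kinds of pattern. A small illustration using only stable standard-library calls (the function name `count_matches` is just for this example):

```rust
// Counts matches in the same haystack using three different pattern kinds
// that `str::matches` accepts on stable Rust today.
fn count_matches() -> (usize, usize, usize) {
    let hay = "a1b22c333";
    // A `char` pattern.
    let by_char = hay.matches('2').count();
    // A `&str` pattern (matches are non-overlapping).
    let by_str = hay.matches("33").count();
    // A closure pattern over `char`.
    let by_closure = hay.matches(|c: char| c.is_ascii_digit()).count();
    (by_char, by_str, by_closure)
}
```

The trait that makes this work is still unstable, so user-defined types cannot take part in it on stable Rust, and the methods exist only for `str`.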

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. label Nov 19, 2018
@rfcbot
Collaborator

rfcbot commented Nov 19, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Nov 19, 2018
@Ixrec
Contributor

Ixrec commented Nov 19, 2018

I still feel that this is an unprecedented amount of complexity for a standard library feature.

Minor nitpick, but I think a case could be made that tasks and futures are similarly complex.

@kennytm
Member Author

kennytm commented Nov 19, 2018

@SimonSapin

Is there a subset of the proposed API that can be stabilized in the meantime?

Since impl Needle is the only concern, we could keep the Needle trait unstable and stabilize everything else... but one big reason for a stable Needle API is third-party Needles (e.g. regex), so this defeats the main point of stabilizing this RFC 😂.

@Centril Centril added A-needle Needle API related proposals & ideas A-traits-libstd Standard library trait related proposals & ideas A-types-libstd Proposals & ideas introducing new types to the standard library. labels Nov 22, 2018
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label Nov 29, 2018
@rfcbot
Collaborator

rfcbot commented Nov 29, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@rfcbot rfcbot removed the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. label Nov 29, 2018
@Centril Centril merged commit ef572c3 into rust-lang:master Nov 29, 2018
@Centril
Contributor

Centril commented Nov 29, 2018

🎉 Huzzah! This RFC has been merged! 🎉

Tracking issue: rust-lang/rust#56345

@yaahc
Member

yaahc commented Jul 14, 2021

We discussed this RFC and specifically the implementation PR in the last few libs team meetings and have decided to revert the decision to stabilize this interface as indicated in rust-lang/rust#76901 (comment).

@kennytm kennytm deleted the pattern-3 branch July 15, 2021 07:40
@crlf0710
Member

Withdrawing PR ready at #3154
