Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for remainder methods for str split iterators #77998

Open
2 of 3 tasks
WaffleLapkin opened this issue Oct 15, 2020 · 17 comments
Open
2 of 3 tasks

Tracking Issue for remainder methods for str split iterators #77998

WaffleLapkin opened this issue Oct 15, 2020 · 17 comments
Labels
A-iterators Area: Iterators A-str Area: str and String B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@WaffleLapkin
Copy link
Member

WaffleLapkin commented Oct 15, 2020

This is a tracking issue for the methods implemented in #75265 and #82570, which allow viewing the remainder of the underlying string in str-split iterators. E.g.:

let mut split = "Mary had a little lamb".split(' '); 
assert_eq!(split.remainder(), Some("Mary had a little lamb"));

_ = split.next();
assert_eq!(split.remainder(), Some("had a little lamb"));

split.by_ref().for_each(drop);
assert_eq!(split.remainder(), None);

The feature gates for the issue are #![feature(str_split_remainder)] (for most split iterators), #![feature(str_split_inclusive_remainder)] (for SplitInclusive) and #![feature(str_split_whitespace_remainder)] (for SplitWhitespace and SplitAsciiWhitespace).

Public API

// mod core::str
impl<'a, P: Pattern<'a>> Split<'a, P> {
    fn remainder(&self) -> Option<&'a str>;
}

// And the same for
// - `RSplit<'a, P>`
// - `SplitTerminator<'a, P>`
// - `RSplitTerminator<'a, P>`
// - `SplitN<'a, P>`
// - `RSplitN<'a, P>`
// - `SplitWhitespace<'a>`
// - `SplitAsciiWhitespace<'a>`
// - `SplitInclusive<'a, P>`

Steps

Unresolved Questions

Implementation history

@WaffleLapkin WaffleLapkin added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Oct 15, 2020
@jonas-schievink jonas-schievink added A-iterators Area: Iterators B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 15, 2020
@SkiFire13
Copy link
Contributor

Looking at its name it isn't immediately clear what it returns. Wouldn't remainder or something along that line be a more self-explanatory name?

@WaffleLapkin
Copy link
Member Author

The name as_str was chosen to be in line with Chars::as_str

@KodrAus KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Nov 6, 2020
@KodrAus KodrAus added the A-str Area: str and String label Jan 6, 2021
@TyPR124
Copy link
Contributor

TyPR124 commented Feb 20, 2021

Can this be added to SplitWhitespace and SplitAsciiWhitespace?

@WaffleLapkin
Copy link
Member Author

@TyPR124 as noted in the implementation PR discussion it is possible, but not trivial.

Both Split*Whitespace are using iterators internally which doesn't allow to access it's inner parts (^1, ^2). It's possible to 'solve' this problem either by making the fields pub(crate::str::iter) (tbh it's a hack) or providing methods on iterator adaptors that allow accessing inner iterators (downside: there are lots of adaptors to change).

I didn't want to complicate the original PR with this, so didn't include anything like this. I can now make a new PR that'll add the same methods to Split*Whitespace, thanks for reminding about it!

@WaffleLapkin
Copy link
Member Author

Update: #82570 was merged, so now SplitWhitespace and SplitAsciiWhitespace has as_str method too!

@soerenmeier
Copy link
Contributor

I just had a case we're i needed to split a str: "tag:value" into a tag: &str and a value: Option<&str>.
I know i can just use Some(iter.as_str()).filter(str::is_empty) to get the Option, but maybe there could be another method named as_opt_str or remainder. That method would return None if calling next would also return None.
What do you think?

@SkiFire13
Copy link
Contributor

Note that split iterators could yield an empty str as their final item, so as_opt_str could return Some("") while your code with always filter that case.

@soerenmeier
Copy link
Contributor

That's true, to be correct the above code should look like this:

let s = iter.as_str();
iter.next().map(|_| s)

Would be nice to have as_opt_str.

@m-ou-se
Copy link
Member

m-ou-se commented May 4, 2021

I just had a case we're i needed to split a str: "tag:value" into a tag: &str and a value: Option<&str>.

That sounds like a use case for the recently stabilized split_once.

@m-ou-se
Copy link
Member

m-ou-se commented May 4, 2021

I do think that returning an Option<&str> here might make more sense though. Right now there's no way to know if the Split is finished, or has just consumed a separator and will produce another Some (possibly a Some("")). Calling .next() for that is not great, because that will try to match against the splitter pattern again.

Maybe .as_str() -> Option<&s> or .remaining() -> Option<&s> or something would be better.

@WaffleLapkin
Copy link
Member Author

Maybe .as_str() -> Option<&s> or .remaining() -> Option<&s> or something would be better.

I could add this, seems reasonable. The only question is: do we want both -> &str and -> Option<&str>, or do we only want the latter one?

@m-ou-se
Copy link
Member

m-ou-se commented May 5, 2021

I personally think we should only have the Option version. I'm worried the -> &str version leads to subtle bugs.

For example, I recently saw someone write this as an alternative for split_once('='):

let mut split = "hello=world".split('=');
let (key, val) = (split.next()?, split.as_str());

However, the behaviour is subtly different: "hello=" and "hello" both produce same ("hello", "") output, whereas split_once('=') only accepts "hello=" and returns None if the = is missing. With Option<&str> we force the user to handle that case.

@tbu-
Copy link
Contributor

tbu- commented Mar 30, 2022

I find the as_str name unintuitive, choosing it for the sole reason of symmetry with Chars::as_str sounds bad to me.

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 3, 2023
…r_take2, r=Amanieu

`Split*::as_str` refactor

I've made this patch almost a year ago, so the rename and the behavior change are in one commit, sorry 😅

This fixes rust-lang#84974, as it's required to make other changes work.

This PR
- Renames `as_str` method of string `Split*` iterators to `remainder` (it seems like the `as_str` name was confusing to users)
- Makes `remainder` return `Option<&str>`, to distinguish between "the iterator is exhausted" and "the tail is empty", this was [required on the tracking issue](rust-lang#77998 (comment))

r? `@m-ou-se`
RalfJung pushed a commit to RalfJung/miri that referenced this issue Jan 9, 2023
…r=Amanieu

`Split*::as_str` refactor

I've made this patch almost a year ago, so the rename and the behavior change are in one commit, sorry 😅

This fixes #84974, as it's required to make other changes work.

This PR
- Renames `as_str` method of string `Split*` iterators to `remainder` (it seems like the `as_str` name was confusing to users)
- Makes `remainder` return `Option<&str>`, to distinguish between "the iterator is exhausted" and "the tail is empty", this was [required on the tracking issue](rust-lang/rust#77998 (comment))

r? `@m-ou-se`
@WaffleLapkin WaffleLapkin changed the title Tracking Issue for Add as_str method to str split iterators Tracking Issue for remainder methods for str split iterators Jan 31, 2023
thomcc pushed a commit to tcdi/postgrestd that referenced this issue May 31, 2023
…r=Amanieu

`Split*::as_str` refactor

I've made this patch almost a year ago, so the rename and the behavior change are in one commit, sorry 😅

This fixes #84974, as it's required to make other changes work.

This PR
- Renames `as_str` method of string `Split*` iterators to `remainder` (it seems like the `as_str` name was confusing to users)
- Makes `remainder` return `Option<&str>`, to distinguish between "the iterator is exhausted" and "the tail is empty", this was [required on the tracking issue](rust-lang/rust#77998 (comment))

r? `@m-ou-se`
@Piturnah
Copy link
Contributor

What's the status wrt stabilisation?

@smarnach
Copy link
Contributor

smarnach commented Aug 5, 2023

I noticed that SplitWhitespace::remainder() has some counter-intuitive behaviour if tokens are separated by multiple whitespace characters. The remainder() method includes all but the first whitespace character in the returned string:

let s = "foo \tbar";
let mut it = s.split_whitespace();
it.next().unwrap();
assert_eq!(it.remainder(), Some("\tbar"));

I would expect it to either include all of the whitespace or none of it (though I do understand why it behaves the way it does). This behaviour, if intended, should at least be documented.

@kornelski
Copy link
Contributor

I needed a functionality of "split the string once on X or Y, whichever is first", and I need to know whether X or Y was first.

I hoped I could use str.split(|c: char| c == X || c == Y) and then inspect the iter.remainder() to see whether X or Y was there, but I was disappointed to learn that the remainder is after the next split, rather than immediately after the previously returned substring.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 26, 2024
…ines, r=dtolnay

Add `str::Lines::remainder`

Based on rust-lang#98453.

This PR adds `str::Lines::remainder` similarly to [other remainder function on str split iterators](rust-lang#77998).
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 26, 2024
…ines, r=dtolnay

Add `str::Lines::remainder`

Based on rust-lang#98453.

This PR adds `str::Lines::remainder` similarly to [other remainder function on str split iterators](rust-lang#77998).
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jan 26, 2024
Rollup merge of rust-lang#107464 - WaffleLapkin:all_that_remains_of_lines, r=dtolnay

Add `str::Lines::remainder`

Based on rust-lang#98453.

This PR adds `str::Lines::remainder` similarly to [other remainder function on str split iterators](rust-lang#77998).
@lubomirkurcak
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators A-str Area: str and String B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests