Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add str::{Split,RSplit,SplitN,RSplitN,SplitTerminator,RSplitTerminator,SplitInclusive}::as_str methods #75265

Merged
merged 4 commits into from
Oct 16, 2020

Conversation

WaffleLapkin
Copy link
Member

tl;dr this allows viewing unyelded part of str-split-iterators, like so:

let mut split = "Mary had a little lamb".split(' '); 
assert_eq!(split.as_str(), "Mary had a little lamb");
split.next();
assert_eq!(split.as_str(), "had a little lamb");
split.by_ref().for_each(drop);
assert_eq!(split.as_str(), "");

This PR adds semi-identical as_str methods to most str-split-iterators with signatures like &'_ Split<'a, P: Pattern<'a>> -> &'a str (Note: output &str lifetime is bound to the 'a, not the '_). The methods are similar to Chars::as_str

SplitInclusive::as_str is under "str_split_inclusive_as_str" feature gate, all other methods are under "str_split_as_str" feature gate.

Before this PR you had to sum lens of all yielded parts or collect into String to emulate as_str.

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 7, 2020
@LeSeulArtichaut
Copy link
Contributor

Ping from triage: @kennytm do you want to review this? I'd expect someone from T-libs to be assigned here -- maybe cc @sfackler?

fn as_str(&self) -> &'a str {
// `Self::get_end` doesn't change `self.start`
if self.finished {
return "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause surprising result for those who want to calculate ptr offsets of this &str from the whole &str

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a use case when you need to do those kinds of tricks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the fix is trivial, but for me, it's not clear if we should provide such guarantees.

Copy link

@loriopatrick loriopatrick Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like the return to be a slice of the provided input.

When parsing a http header, I find myself wanting to see how much is remaining in the buffer to determine if I have all the contents (specified by Content-Length).

Example

let http_response_buffer: &str;
let mut header_lines = http_response_buffer.split("\r\n");

let mut content_length = None;
/* read through header and set content_length, header ends on empty header_line */
/* ... */

let unread_buffer_size = unsafe {
   header_lines.as_str().as_ptr().offset_from(response_buffer.as_ptr())
};

if content_length.unwrap() < unread_buffer_size {
   panic!("http_response_buffer does not contain response body");
}

This would break if http_response_buffer was

Content-Length: 100

granted I am using unsafe but this pattern doesn't seem too unreasonable (I might be biased as it's what I want to do).

Edit

Just realized I could just do header_lines.as_str().len(). Late night for me, my justification isn't really holding water anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this today and thought that there should be a function (probably in a crate) like

fn substr_bounds(string: &str, sub: &str) -> Option<Range<usize>> { ... }

let string = "aabb";
let sub = &r[1..3];
assert_eq!(substr_bound(string, sub), Some(1..3));
assert_eq!(substr_bound(string, "ab"), None);

It feels natural, but are there any cases when it's useful? Anyway, as said before, for these kinds of things returning an empty substr may be a good idea.

If there is code that may benefit from this, I could make a PR with a fix.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be related to the semantics of raw pointer comparison.
rust-lang/unsafe-code-guidelines#239

@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 12, 2020
@Dylan-DPC-zz
Copy link

r? @dtolnay

@rust-highfive rust-highfive assigned dtolnay and unassigned kennytm Sep 30, 2020
@bors
Copy link
Contributor

bors commented Oct 1, 2020

☔ The latest upstream changes (presumably #76325) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

This commit introduses 2 methods under "str_split_as_str" gate with common
signature of `&Split<'a, _> -> &'a str'`. Both of them work like
`Chars::as_str` - return unyield part of the inner string.
…r` methods

This commit entroduce 4 methods smililar to `Split::as_str` all under the same
gate "str_split_as_str".
This commit entroduces `core::str::SplitInclusive::as_str` method similar to
`core::str::Split::as_str`, but under different gate -
"str_split_inclusive_as_str" (this is done so because `SplitInclusive` is
itself unstable).
@WaffleLapkin
Copy link
Member Author

Hm. I've fixed the conflict, but is it normal that bors didn't change the labels (S-waiting-on-review->S-waiting-on-author)? It said to run a command after resolving conflicts, but labels are already as they should be after resolving conflicts.

@Dylan-DPC-zz
Copy link

@WaffleLapkin not sure why, the change has been reported last week as well, will check it accordingly :)

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good!

Please make a tracking issue and update the stability attributes in the PR.

@WaffleLapkin
Copy link
Member Author

@dtolnay done :)

@leonardo-m
Copy link

What's an use case of this as_str method?

@Dylan-DPC-zz
Copy link

@bors r=dtolnay

@bors
Copy link
Contributor

bors commented Oct 15, 2020

📌 Commit 7bd6403 has been approved by dtolnay

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 15, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Oct 15, 2020
…olnay

Add `str::{Split,RSplit,SplitN,RSplitN,SplitTerminator,RSplitTerminator,SplitInclusive}::as_str` methods

tl;dr this allows viewing unyelded part of str-split-iterators, like so:
```rust
let mut split = "Mary had a little lamb".split(' ');
assert_eq!(split.as_str(), "Mary had a little lamb");
split.next();
assert_eq!(split.as_str(), "had a little lamb");
split.by_ref().for_each(drop);
assert_eq!(split.as_str(), "");
```

--------------

This PR adds semi-identical `as_str` methods to most str-split-iterators with signatures like `&'_ Split<'a, P: Pattern<'a>> -> &'a str` (Note: output `&str` lifetime is bound to the `'a`, not the `'_`). The methods are similar to [`Chars::as_str`]

`SplitInclusive::as_str` is under `"str_split_inclusive_as_str"` feature gate, all other methods are under `"str_split_as_str"` feature gate.

Before this PR you had to sum `len`s of all yielded parts or collect into `String` to emulate `as_str`.

[`Chars::as_str`]: https://doc.rust-lang.org/core/str/struct.Chars.html#method.as_str
@jyn514 jyn514 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 16, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 16, 2020
Rollup of 14 pull requests

Successful merges:

 - rust-lang#75023 (ensure arguments are included in count mismatch span)
 - rust-lang#75265 (Add `str::{Split,RSplit,SplitN,RSplitN,SplitTerminator,RSplitTerminator,SplitInclusive}::as_str` methods)
 - rust-lang#75675 (mangling: mangle impl params w/ v0 scheme)
 - rust-lang#76084 (Refactor io/buffered.rs into submodules)
 - rust-lang#76119 (Stabilize move_ref_pattern)
 - rust-lang#77493 (ICEs should always print the top of the query stack)
 - rust-lang#77619 (Use futex-based thread-parker for Wasm32.)
 - rust-lang#77646 (For backtrace, use StaticMutex instead of a raw sys Mutex.)
 - rust-lang#77648 (Static mutex is static)
 - rust-lang#77657 (Cleanup cloudabi mutexes and condvars)
 - rust-lang#77672 (Simplify doc-cfg rendering based on the current context)
 - rust-lang#77780 (rustc_parse: fix spans on cast and range exprs with attrs)
 - rust-lang#77935 (BTreeMap: make PartialCmp/PartialEq explicit and tested)
 - rust-lang#77980 (Fix intra doc link for needs_drop)

Failed merges:

r? `@ghost`
@bors bors merged commit 977df43 into rust-lang:master Oct 16, 2020
@rustbot rustbot added this to the 1.49.0 milestone Oct 16, 2020
@WaffleLapkin WaffleLapkin deleted the str_split_as_str branch October 16, 2020 08:45
@mikevoronov
Copy link

@leonardo-m I needed this feature while implementing very simple argument parser.

Let's imagine that I want to parse the following command:
call <module_name> <func_name> [args in json]

So in call after <func_name> extracting, I want to get a remainder of string as &str to parse it like a json with serde. In particular, here I had to use join or collect that implies some overhead.

@crumblingstatue
Copy link
Contributor

Would it be possible to also implement this for SplitWhitespace?

@WaffleLapkin
Copy link
Member Author

@crumblingstatue this should be possible, but it's not trivial. SplitWhitespace uses Filter iterator adaptor, making it impossible to reach inner Split and get the remainder. The only solution I can think of - making Filter's iter field pub(crate).

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 19, 2021
…r, r=m-ou-se

Add `as_str` method for split whitespace str iterators

This PR adds `as_str` methods to `SplitWhitespace` and `SplitAsciiWhitespace`
str iterators. The methods return the remainder, similar to `as_str` methods on
`Chars` and other split iterators. This PR is a continuation of rust-lang#75265, which added `as_str` for all other str split iterators.

The feature gate for new methods is `#![feature(str_split_whitespace_as_str)]`.

`SplitWhitespace` and `SplitAsciiWhitespace` use iterators under the hood, so to implement `as_str` it's required to either
1. Make fields of some iterators `pub(crate)`
2. Add getter methods (like `into_inner`, `inner`, `inner_mut`...) to some (all) iterators
3. Completely rewrite `SplitWhitespace` and `SplitAsciiWhitespace`

This PR uses the 1. approach since it's easier to implement and requires fewer changes (and no changes to the public API). If you think that's not the right way, please, tell me.

r? `@m-ou-se`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.