Tracking Issue for ASCII trim functions on byte slices #94035
Comments
If we're adding It seems reasonable enough to want to trim ASCII whitespace from a naming bikeshed: might |
Since
Yes, you have a point there! There's also |
…iplett core: Implement ASCII trim functions on byte slices

Hi `@rust-lang/libs!` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead.

This PR adds three new methods to byte slices:
- `trim_ascii_start`
- `trim_ascii_end`
- `trim_ascii`

I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose.

As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun.

Tracking issue: rust-lang#94035
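To make the serial-protocol motivation above concrete, here is a minimal sketch of how the three methods might be used on a received line. The `payload` helper and the `" OK 42\r\n"` frame are made up for illustration and are not taken from the PR; the sketch assumes a toolchain where the methods are stable (Rust 1.80 or later).

```rust
/// Hypothetical helper (not part of the PR): strip surrounding ASCII
/// whitespace from one received line of a text-based serial protocol.
fn payload(line: &[u8]) -> &[u8] {
    line.trim_ascii()
}

fn main() {
    // A made-up response frame with a leading space and a trailing CR LF.
    let raw = b" OK 42\r\n";

    // Both sides trimmed, no UTF-8 decoding involved.
    assert_eq!(payload(raw), b"OK 42");

    // Trimming only one side leaves the other side untouched.
    assert_eq!(raw.trim_ascii_start(), b"OK 42\r\n");
    assert_eq!(raw.trim_ascii_end(), b" OK 42");
}
```

All three methods return a subslice of the input, so nothing is copied or allocated.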
Naively (without knowing the internals at all), I could think of "performance" as a reason. That being said, does this unresolved question really need to block this from being stabilised? After all, it's specifically about |
Both byte slices and string slices can have both Unicode-aware trimming and ASCII-aware trimming. Unicode support needn't be limited to With that said, I think adding these methods starts us down the path of treating byte slices as byte strings. I think I am in general in favor of that, but I do think we want to proceed deliberately here. It might make sense to block this on a higher level policy question of 1) do we want byte slices to be byte strings? and 2) if so, what do we want to add? (2) is particularly relevant to make sure we end up with good names I think. |
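The difference between Unicode-aware and ASCII-aware trimming mentioned above can be seen on a `&str`. This is only a sketch: the `trims` helper is a hypothetical name introduced here, and it assumes the `&str` variants of `trim_ascii` are available (Rust 1.80 or later). U+000B (vertical tab) and U+00A0 (no-break space) have the Unicode `White_Space` property, but neither is whitespace according to `u8::is_ascii_whitespace`.

```rust
/// Hypothetical helper: Unicode-aware and ASCII-only trimming side by side.
fn trims(s: &str) -> (&str, &str) {
    (s.trim(), s.trim_ascii())
}

fn main() {
    // Vertical tab, no-break space, space, "hello", space, line feed.
    let s = "\u{000B}\u{00A0} hello \n";
    let (unicode, ascii) = trims(s);

    // `str::trim` removes every leading and trailing `White_Space` character.
    assert_eq!(unicode, "hello");

    // `trim_ascii` stops at the vertical tab on the left, but still removes
    // the trailing space and line feed.
    assert_eq!(ascii, "\u{000B}\u{00A0} hello");

    // The byte-slice method agrees with the `&str` method on the same data.
    assert_eq!(s.as_bytes().trim_ascii(), ascii.as_bytes());
}
```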
I have a use case where I have to trim a bunch of |
I think these would be useful (including also on |
What is the status of this feature? |
I don't think that a policy decision like this is needed. Byte slices may contain byte strings, or something else. Of course, trimming ASCII whitespace gives nonsensical results if the slices aren't actually ASCII strings, but that's all. Note that there are already other slice methods that treat bytes as potential strings, for example: What would be needed to move this proposal into FCP? |
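The list of methods the comment above referred to is not preserved here. As an illustration only (not necessarily the methods the commenter had in mind), the following sketch uses a few stable `[u8]` methods that already interpret bytes as ASCII text. The `header_matches` helper and the header values are hypothetical.

```rust
/// Hypothetical helper: compare a raw header name, ignoring surrounding
/// ASCII whitespace and ASCII case.
fn header_matches(raw: &[u8], name: &[u8]) -> bool {
    raw.trim_ascii().eq_ignore_ascii_case(name)
}

fn main() {
    let header: &[u8] = b"Content-Type";

    // Existing ASCII-oriented methods on byte slices.
    assert!(header.is_ascii());
    assert!(header.eq_ignore_ascii_case(b"content-type"));
    assert_eq!(header.to_ascii_uppercase(), b"CONTENT-TYPE");

    // In-place case conversion on a byte array.
    let mut buf = *b"HeLLo";
    buf.make_ascii_lowercase();
    assert_eq!(&buf, b"hello");

    // Trimming composes naturally with these methods.
    assert!(header_matches(b"  Content-Type\r\n", b"content-type"));
}
```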
I filed a libs ACP, which was accepted, for extending these functions to Implementation PR #118523
I was instructed to use this tracking issue. |
Add ASCII whitespace trimming functions to `&str` - Add `trim_ascii_start`, `trim_ascii_end`, and `trim_ascii` functions to `&str` for trimming ASCII whitespace - Add `#[inline]` to `[u8]` `trim_ascii` functions These functions are feature-gated by `#![feature(byte_slice_trim_ascii)]` rust-lang#94035
Rollup merge of rust-lang#118523 - okaneco:trim_ascii, r=Mark-Simulacrum Add ASCII whitespace trimming functions to `&str` - Add `trim_ascii_start`, `trim_ascii_end`, and `trim_ascii` functions to `&str` for trimming ASCII whitespace - Add `#[inline]` to `[u8]` `trim_ascii` functions These functions are feature-gated by `#![feature(byte_slice_trim_ascii)]` rust-lang#94035
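One way to see why the `&str` variants are sound is to express them in terms of the `[u8]` methods. The function below is a hedged sketch under that assumption, not the implementation merged in the PR, and `trim_ascii_via_bytes` is a hypothetical name. ASCII whitespace bytes never occur inside a multi-byte UTF-8 sequence, so removing them from either end always leaves valid UTF-8.

```rust
/// Illustrative stand-in for `str::trim_ascii`, written against the
/// `[u8]` method. Not the standard library's actual implementation.
fn trim_ascii_via_bytes(s: &str) -> &str {
    let trimmed = s.as_bytes().trim_ascii();
    // Only single-byte ASCII characters were removed from the ends, so the
    // remaining bytes still start and end on character boundaries.
    std::str::from_utf8(trimmed).expect("trimming ASCII whitespace preserves UTF-8 validity")
}

fn main() {
    // The sketch agrees with the built-in `&str` method on a few inputs,
    // including non-ASCII text.
    for s in ["", "  ", "\r\n naïve café \t", "no-whitespace"] {
        assert_eq!(trim_ascii_via_bytes(s), s.trim_ascii());
    }
}
```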
Stabilization Report
Implementation History
API Summary
The following six functions would be stabilized: three functions on
Examples:
// &[u8]
assert_eq!(b"\r hello world\n ".trim_ascii(), b"hello world");
assert_eq!(b"\r hello world\n ".trim_ascii_end(), b"\r hello world");
assert_eq!(b" \t hello world\n".trim_ascii_start(), b"hello world\n");
// &str
assert_eq!("\r hello world\n ".trim_ascii(), "hello world");
assert_eq!("\r hello world\n ".trim_ascii_end(), "\r hello world");
assert_eq!(" \t hello world\n".trim_ascii_start(), "hello world\n");
Possibly Unresolved Questions
Question: Does a policy decision need to be made about treating byte slices as byte strings? What methods should be added?
Response:
For more context, If this is adequate to start FCP, I can make the stabilization PR. |
Seems like this is more than ready for stabilization. |
Nominating for T-libs-api, as there seem to be no blockers to stabilization, with a report already being written. |
This overall seems reasonable to me. We've clearly already started down the path of " @rfcbot fcp merge |
Team member @BurntSushi has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
Rollup merge of rust-lang#124928 - okaneco:trim_ascii, r=workingjubilee Stabilize `byte_slice_trim_ascii` for `&[u8]`/`&str` Remove feature from documentation examples Update intra-doc link for `u8::is_ascii_whitespace` on `&[u8]` functions Closes rust-lang#94035 FCP has successfully completed rust-lang#94035 (comment)
🎉 Is this worth an explicit relnotes tag? |
@jonhoo Definitely; I've added it. |
Feature gate: `#![feature(byte_slice_trim_ascii)]`

This is a tracking issue for ASCII trim functions on byte slices.

Public API
The feature adds three new methods to byte slices (`[u8]`):
- `const fn trim_ascii_start(&self) -> &[u8]`: Remove leading ASCII whitespace
- `const fn trim_ascii_end(&self) -> &[u8]`: Remove trailing ASCII whitespace
- `const fn trim_ascii(&self) -> &[u8]`: Remove leading and trailing ASCII whitespace

For deciding what bytes to treat as whitespace, `u8::is_ascii_whitespace` is used. See the linked docs for more details.

Examples:
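The issue's original examples are not preserved at this point. As a stand-in, the following sketch illustrates two properties stated above: the methods are `const fn`, and the set of trimmed bytes is the one accepted by `u8::is_ascii_whitespace` (space, horizontal tab, line feed, form feed, carriage return). The names `GREETING` and `is_trimmed_away` are hypothetical, and the sketch assumes a toolchain on which the methods are stable, including in const contexts.

```rust
// Because the methods are `const fn`, trimming can happen at compile time.
const GREETING: &[u8] = b"  hello world\n".trim_ascii();

/// Hypothetical helper: does a slice holding only `byte` trim to nothing?
fn is_trimmed_away(byte: u8) -> bool {
    [byte].trim_ascii().is_empty()
}

fn main() {
    assert_eq!(GREETING, b"hello world");

    // The five bytes accepted by `u8::is_ascii_whitespace`.
    for b in [b' ', b'\t', b'\n', 0x0C, b'\r'] {
        assert!(is_trimmed_away(b));
    }

    // Vertical tab (0x0B) and NUL are not treated as whitespace.
    assert!(!is_trimmed_away(0x0B));
    assert!(!is_trimmed_away(0x00));
}
```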
Steps / History
Unresolved Questions
`str` as well?