Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impl String::into_chars #133057

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

tisonkun
Copy link
Contributor

@tisonkun tisonkun commented Nov 15, 2024

Tracking issue - #133125

r? @programmerjake @kennytm @Amanieu

This refers to rust-lang/libs-team#268

Before adding tests and creating a tracking issue, I'd like to reach a consensus on the implementation direction and two questions:

  1. Whether we'd add a String::into_char_indices method also?
  2. See inline comment.

@rustbot
Copy link
Collaborator

rustbot commented Nov 15, 2024

Failed to set assignee to programmerjake: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 15, 2024
None => None,
Some((_, ch)) => {
let offset = iter.offset();
drop(self.bytes.drain(..offset));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me of this thread that suggests a truncate_front method on Vec. Not sure if it's desired as a separate proposal.

library/alloc/src/string.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

library/alloc/src/string.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@tisonkun
Copy link
Contributor Author

Thanks for your review! It seems this direction is good to go. I'd create a tracking issue, update the attr in code and add docs with doctests, then request a review for merging.

Hopefully in this week.

@tisonkun
Copy link
Contributor Author

tisonkun commented Nov 17, 2024

This PR is ready for merge from my perspective.

Ping @programmerjake @kennytm @Amanieu if you have time to give it a review :D

I'd still wonder whether we'd add a String::into_char_indices method also? Seems intuitive and for symmetry.

@rust-log-analyzer

This comment has been minimized.

@thaliaarchi
Copy link
Contributor

thaliaarchi commented Nov 17, 2024

Looks good, aside from some small nits:

  • unsafe { self.bytes.advance_by(offset).unwrap_unchecked() } and let _ = self.bytes.advance_back_by(len - idx) should be consistent. I think ignoring with _ is better.
  • Looks like iter and size_hint are missing #[inline].
  • The crate::vec imports can be simplified from two lines to use crate::vec::{self, Vec} to match the other imports.

@tisonkun
Copy link
Contributor Author

@thaliaarchi Thank you! Comments addressed.

@rust-log-analyzer

This comment has been minimized.

@tisonkun tisonkun requested a review from Amanieu November 25, 2024 03:00
library/alloc/src/string.rs Outdated Show resolved Hide resolved
library/alloc/src/string.rs Outdated Show resolved Hide resolved
@Amanieu Amanieu self-assigned this Nov 30, 2024
@tisonkun tisonkun requested a review from Amanieu December 2, 2024 02:05
@tisonkun
Copy link
Contributor Author

tisonkun commented Dec 2, 2024

@Amanieu Thanks for your review! Comments addressed.

@tisonkun
Copy link
Contributor Author

ping @Amanieu @kennytm :D

@tisonkun
Copy link
Contributor Author

@Amanieu shall I squash commits by myself? Or the automation could do squash and merge.

@slanterns
Copy link
Contributor

slanterns commented Dec 22, 2024

bors squash does not quite work (and was removed iirc), usually you need to do squash by yourself.

@thaliaarchi
Copy link
Contributor

Would IntoChars::into_string(self) -> String be useful? Then the conversion can be bidirectional, keeping the allocation.

@tisonkun
Copy link
Contributor Author

tisonkun commented Dec 22, 2024

@slanterns manually squashed. cc @Amanieu

Would IntoChars::into_string(self) -> String be useful? Then the conversion can be bidirectional, keeping the allocation.

@thaliaarchi we currently store string.into_bytes().into_iter() as the field. It doesn't seems we have pub method to transmute it back directly.

@programmerjake
Copy link
Member

@slanterns manually squashed. cc @Amanieu

Would IntoChars::into_string(self) -> String be useful? Then the conversion can be bidirectional, keeping the allocation.

@thaliaarchi we currently store string.into_bytes().into_iter() as the field. It doesn't seems we have pub method to transmute it back directly.

if you were to add a into_string() method, you could use:

impl IntoChars {
    pub fn into_string(self) -> String {
        // Safety: `bytes` are kept in UTF-8 form, only removing whole `char`s at a time.
        unsafe { String::from_utf8_unchecked(self.bytes.collect()) }
    }
}

@programmerjake
Copy link
Member

There could also be a fn IntoChars::into_bytes(self) -> alloc::vec::IntoIter<u8> (not returning Vec because that requires moving the contents back to the beginning if next() has already been called, for into_string() that's ok because we don't really have another type we could convert into since IntoChars is already the String equivalent of vec::IntoIter<u8>).

@tisonkun
Copy link
Contributor Author

String::from_utf8_unchecked(self.bytes.collect())

This actually collect the rest chars into a new string, instead of transmute. I may leave it to the other PR for the tracking issue and wait a bit for other feedback if it's necessary.

Somehow you can already call:

let mut it = string.into_chars();
// .. 
let new_string = it.collect::<String>();

@tisonkun
Copy link
Contributor Author

tisonkun commented Dec 22, 2024

This actually collect the rest chars into a new string, instead of transmute

I found SpecFromIter that provides a fast path if the iter never used now.

But I found this PR gets pending for quite a bit time so I'm hesitate to add new nice to have which cause extra review cycle, unless there is a signal it's to be converged.

If the review cycle is timely, making several PRs should be workable also.

@tisonkun
Copy link
Contributor Author

Updated since it's minor and there is a reason for doing so. Sorry I get a few frustrated that it seems often take several months to move forward an PR for accepted proposal (#114788, #123600) But if it's the normal case, then I may set the correspondingly expectation.

@tisonkun
Copy link
Contributor Author

I still hope that the squash process can be automated, or otherwise it's not easy to check the diff and even the author need the history for context.

@rust-log-analyzer

This comment has been minimized.

@programmerjake
Copy link
Member

iirc squashing is not a requirement, so if you feel you're losing important commit history, just don't squash

@tisonkun
Copy link
Contributor Author

@programmerjake Thanks for your information! I was told to do this sometime and not sure if there is a consensus or reference.

@programmerjake
Copy link
Member

programmerjake commented Dec 22, 2024

Sorry I get a few frustrated that it seems often take several months to move forward an PR for accepted proposal (#114788, #123600) But if it's the normal case, then I may set the correspondingly expectation.

you don't need to feel like I'm telling you what you must do or anything, I'm not one of the people who can approve PRs for this repo anyway so my reviews are merely advisory (I only have that privilege for the portable-simd repos).

That said, PRs can normally take more than a month and also this is holiday season (so people may be on vacation and those that aren't may have reduced capacity) and is right before the 2024 edition comes out so taking longer than usual is expected.

@rust-log-analyzer

This comment has been minimized.

@programmerjake
Copy link
Member

@programmerjake Thanks for your information! I was told to do this sometime and not sure if there is a consensus or reference.

I've seen a number of PRs that were merged with multiple commits (e.g. just looking through git log I found #134601 which was merged with 2 separate commits), so my assumption is that's fine. I might rebase to squash commits that weren't useful to have separate (e.g. 5 commits in a row fixing typos could be 1 commit) but if the commits are useful to have separate, then afaict keep them separate if you prefer.

@rust-log-analyzer

This comment has been minimized.

@programmerjake

This comment was marked as resolved.

Signed-off-by: tison <wander4096@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants