Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TryFrom<&str> for [char; N]. #93549

Closed

Conversation

richard-uk1
Copy link
Contributor

This patch adds the ability to create an array of chars from a str similarly to how you can do this with [T] -> [T; N] (where T: Copy). The implementation is slightly more involved than for the slice case because you can't simply memcpy the bytes. If you accept this patch, I'd appreciate someone looking over the unsafe code to check soundness.

@rust-highfive
Copy link
Collaborator

r? @scottmcm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 1, 2022
@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm added the needs-fcp This change is insta-stable, so needs a completed FCP to proceed. label Feb 1, 2022
@scottmcm
Copy link
Member

scottmcm commented Feb 1, 2022

Does this need any new unsafe code? To me it looks like it's essentially this

#![feature(array_from_fn)]
pub fn str_to_char_array<const N: usize>(s: &str) -> Option<[char; N]> {
    let mut c = s.chars();
    let a = std::array::try_from_fn(|_| c.next())?;
    if c.next().is_none() {
        Some(a)
    } else {
        None
    }
}

Or, internally, it could just use

fn try_collect_into_array<I, T, R, const N: usize>(iter: &mut I) -> Option<R::TryType>

But as a meta-point, have you had any conversation with libs folks about this? Because impls need to go in insta-stable, they need extra good motivation and certainty because we don't have an experimentation opportunity with them.

Because of the variable-width encoding in strs, it feels like this is substantially less likely to be useful than the ones like &[T; N]: TryFrom<&[T]>.

And it's not obvious what the error type should be, to me. This has done a bunch of UTF-8 decoding work where it's not obvious that it should be thrown away -- kinda like how https://doc.rust-lang.org/nightly/std/str/struct.Utf8Error.html#method.valid_up_to exists -- so maybe it wants to be a complicated error type than can return partial results, or something.

So overall, this feels way more like it wants to be ArrayVec::from_iter instead of TryFrom, to me.

(Or that it wants a type to fill in the "array is to slice as ???? is to str" problem.)

@scottmcm scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 2, 2022
@richard-uk1
Copy link
Contributor Author

richard-uk1 commented Feb 2, 2022

Hey @scottmcm thanks for the response! I basically agree with all of what you say. The reason I did it the way I did rather than using try_collect_into_array was I was worried that the compiler would be able to optimize the code as well, but I haven't checked this. I did also look at implementing a more generic version for collecting iterators, but hit a problem with impl overlap with impl<A, E, V> FromIterator<Result<A, E>> for Result<V, E> where V: FromIterator<A>. If you have a suggested way around this (possibly using specialization within libstd if that's allowed), I'd like to implement that as well as (or instead of) this.

I'd like to have a discussion about this with the libs team. Is there a canonical location for chatting (irlo, Zulip)? I'm sorry I'm not super familiar with how rustc dev works (but would like to be more familiar).

EDIT

I should say more about the impl overlap. It occurs because we want to gather into a Result, dependent on whether the iterator yielded the correct number of elements. I agree there is a lot of design work: should it error if there are extra elements, rather than just too few? (I think yes), should it work with the existing early-return behaviour impl (the one that it clashes with) (again I think it must), how detailed/complex/large should the error type be (I would argue for small sized because collecting into an array rather than a Vec is something do you for performance reasons). All needs discussion.

@scottmcm
Copy link
Member

scottmcm commented Feb 2, 2022

I would suggest zulip, yes, for feedback from libs-api team members: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Request.20review.20for.20.2391589/near/270429471 . IRLO is good for broader feedback from more people.

As for impl FromIterator for Result, I'm quite happy for this not to go through that approach. Those have always been difficult when it comes to discoverability, and I don't think it simplifies any of the questions I raised above.

cc rust-lang/rfcs#2990 which wanted to add ArrayVec to core, though it looks like it didn't get traction.

@workingjubilee
Copy link
Member

The motivation question still is open, in any case. It is a potentially quite expensive and complicated conversion to attempt on a sufficiently long string, and one usually better attended to by iteration. What does having the [char; N] offer that str.chars() does not? It is not the ability to operate on individual visible characters ("grapheme clusters"), to be sure, as individual rendered characters may include multiple codepoints.

@richard-uk1
Copy link
Contributor Author

richard-uk1 commented Feb 3, 2022

@workingjubilee The reason I reached for it is because I was making a wordle clone, and wanted it to work with non-ascii characters. The reason I thought it might belong in libstd was because of its symmetry with &[T] -> [T; N]. I think your point about char =/= visible glyph is an important one, and also whether str -> [char; N] really is symmetric to [u8] -> [u8; N].

@scottmcm
Copy link
Member

scottmcm commented Feb 3, 2022

Note that if you're doing wordle (or programming competition-style stuff), you can just use .as_bytes().try_into() and deal in [u8; 5] instead of chars -- especially with b"arose" and such.

I'm just always super-skeptical of any constant-length stuff when it comes to strings in general, because of things like how "noël" and "noël" are different lengths.

@richard-uk1
Copy link
Contributor Author

@scottmcm

I'm just always super-skeptical of any constant-length stuff when it comes to strings in general, because of things like how "noël" and "noël" are different lengths.

Yep I think I've been convinced that this PR is misguided - I'll close it.

As for impl FromIterator for Result, I'm quite happy for this not to go through that approach. Those have always been difficult when it comes to discoverability, and I don't think it simplifies any of the questions I raised above.

Do you have another approach in mind? This is something I'd love to get into std because it's something I would use.

@scottmcm
Copy link
Member

scottmcm commented Feb 3, 2022

Do you have another approach in mind?

I would suggest doing it via ArrayVec: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d0959ca49e3022d741fa741de48864b2

let av: ArrayVec<char, 10> = "hello".chars().collect();

That way it's still no-allocations-needed, and does a good job of dealing with different lengths.

(I personally want something like arrayvec in core, but there are many questions there.)

@richard-uk1 richard-uk1 closed this Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants