-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement TryFrom<&OsStr>
for &str
#98202
Conversation
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
r? @thomcc (rust-highfive has picked a reviewer for you, use r? to override) |
We have a new process for unstable APIs, you need to file a change proposal first. Quoting the bot:
The implementation (and the API) look fine to me, but must go through the process. |
library/std/src/ffi/os_str.rs
Outdated
#[stable(feature = "str_tryfrom_osstr_impl", since = "1.64.0")] | ||
#[derive(Debug)] | ||
#[non_exhaustive] | ||
pub struct StrFromOsStrError {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of error information density, I wonder if we shouldn't set the bar higher. There is a lot more information in a UTF8Error
, and we already create and discard one on at least some platforms with the current implementation. Maybe we should skip relying on OsStr::to_str
and build something new on top of the more fundamental APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally considered that as well, but I think might make too many assumptions about the internal representation of OsStr
. Some future platform might choose a non-UTF-8 like encoding. On such a platform it may not make much sense to talk about specific byte locations of errors. It might not even on current platforms since (as far as I'm aware) there's no way to look at the byte representation of an OsStr
. Therefore I chose the more conservative approach at first.
However if the consensus is to implement it with Utf8Error
as the error type, I'd be open to implementing that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should still be an opaque struct like you've done here, and we should probably document that the internal representation is going to be inherently unstable and is kept intentionally opaque but we can still have that internally be a UTF8Error
when we're on unix and give nice utf8 related error messages, or we could do some windows specific wtf8 error about it containing surrogate pairs or w/e level of context we want to expose there, and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're pretty constrained in what OsStr
can be. The impl AsRef<OsStr> for str
implies that every valid str
must also be (bit-for-bit) a valid OsStr
. So if nothing else, surely this TryFrom
can report the valid UTF-8 span and the offset and length of the non-UTF8 span, without providing details of why a particular sequence of bytes is not UTF-8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This basically means that an OsStr
must always have something like [u8]
as the underlying representation and therefore it talking about sub-spans of the string should always make sense.
So should the error type simply be Utf8Error
or some opaque error wrapping a Utf8Error
and exposing some of it's details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, did the Error API not get considered as part of the FCP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could extend this as further work. It seems very reasonable to either expose the utf8 error, or something equivalent.
That said, prior to it being available as bytes, doing so may leak details about our internal representation (on windows platforms for example I believe this would be the case), so personally I think we should wait until #95290 is more settled.
This also avoids needing to do further API team work on the design of the error type's API, which I think means it could just land as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(To be clear, I'm not suggesting that blocks landing this, just that it would block exposing information about in which way the UTF 8 bytes are invalid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomcc No the error type was not part of the FCP. In retrospective it would have been useful to make that part of it. Do I understand your suggestion correctly that you want to keep the UTF8Error
type as the error type but with the methods on that type returning dummy values instead, so that users cannot rely on the internal representation? In that case that should probably be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think that would be good. I think exposing an opaque struct with no methods aside from trait impls would be better. I'm not t-libs-api though.
b7fe5dc
to
2a2b29b
Compare
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
I've updated the code to use the |
@thomcc With the FCP completed, what do you think of the current implementation? |
Impl looks fine. Unsure how settled we are on the error API, r=me if it doesn't need further discussion (certainly this would be my recommendation for now, since we can always expose more later). |
Reassigning yaahc's reviews as she has stepped down from the review rotation. r? rust-lang/libs-api |
I've marked this as waiting on team, due to the (unresolved?) question of the error type. Reusing |
We discussed this in the libs-api meeting today. We're happy to just use @rfcbot fcp merge |
Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
2a2b29b
to
e3a1a11
Compare
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
@bors r+ rollup |
…iaskrgr Rollup of 6 pull requests Successful merges: - rust-lang#98202 (Implement `TryFrom<&OsStr>` for `&str`) - rust-lang#107619 (Specify behavior of HashSet::insert) - rust-lang#109814 (Stabilize String::leak) - rust-lang#111974 (Update runtime guarantee for `select_nth_unstable`) - rust-lang#112109 (Don't print unsupported split-debuginfo modes with `-Zunstable-options`) - rust-lang#112506 (Properly check associated consts for infer placeholders) r? `@ghost` `@rustbot` modify labels: rollup
Recently when trying to work with
&OsStr
I was surprised to find thisimpl
missing.Since the
to_str
method already existed the actual implementation is fairly non-controversial, except for maybe the choice of the error type. I chose an opaque error here instead of something likestd::str::Utf8Error
, since that would already make a number of assumption about the underlying implementation ofOsStr
.As this is a trait implementation, it is insta-stable, if I'm not mistaken?
Either way this will need an FCP.
I chose "1.64.0" as the version, since this is unlikely to land before the beta cut-off.
@rustbot modify labels: +T-libs-api
API Change Proposal: #99031 (accepted)