Tracking Issue for os_str_bytes #111544
Comments
Allow limited access to `OsStr` bytes
`OsStr` has historically kept its implementation details private out of concern for locking us into a specific encoding on Windows.
This is an alternative to rust-lang#95290 which proposed specifying the encoding on Windows. Instead, this only specifies that for cross-platform code, `OsStr`'s encoding is a superset of UTF-8 and defines rules for safely interacting with it.
At minimum, this can greatly simplify the `os_str_bytes` crate and every arg parser that interacts with `OsStr` directly (which is most of those that support invalid UTF-8).
Tracking issue: rust-lang#111544
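To make the "rules for safely interacting with it" concrete, here is a minimal sketch of the arg-parser case mentioned above. It uses `as_encoded_bytes` and `from_encoded_bytes_unchecked`, the names under which these methods are available in stable Rust, whereas parts of this thread still refer to them as `as_os_str_bytes` and `from_os_str_bytes_unchecked`. The helper name `split_once_eq` is made up for illustration.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `arg` at the first ASCII `=` into (key, value).
fn split_once_eq(arg: &OsStr) -> Option<(&OsStr, &OsStr)> {
    let bytes = arg.as_encoded_bytes();
    // The encoding is a superset of UTF-8, so an ASCII byte can be searched
    // for directly in the encoded bytes.
    let pos = bytes.iter().position(|&b| b == b'=')?;
    let (key, value) = (&bytes[..pos], &bytes[pos + 1..]);
    // SAFETY: both slices originate from `as_encoded_bytes` on the same
    // `OsStr` and are cut immediately before and immediately after the
    // ASCII byte `=`, which is a valid non-empty UTF-8 substring.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(key),
            OsStr::from_encoded_bytes_unchecked(value),
        ))
    }
}
```

Calling `split_once_eq(OsStr::new("--name=value"))` yields the pair `--name` and `value`; an argument without `=` yields `None`. Neither half has to be valid UTF-8 for this to work.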
use as_os_str_bytes Make use of the new operations recently added (tracking issue: rust-lang/rust#111544). At least the "host OsStr to target bytes" direction now works even for non-utf-8 strings on all hosts!
@RalfJung since you are one of the few to have used this in practice, any feedback? Any thoughts on how well the name works? |
🤷 no strong opinion on the name. It would have been nice to also have a safe function in the other direction; Miri still needs |
About naming, I opened #113106 and it contains this snippet:
    let self_len = self.as_os_str().len();
    let self_bytes = self.as_os_str().as_os_str_bytes();
Having the type name inside of the method usually leads to repetition, I'd suggest Well, assuming that I agree with @RalfJung, a |
The intention is to specify the encoding within the function name, much like utf8 and utf16 functions have the encoding in their name. The main difference is the encoding is opaque. One idea was
We can't take breaking changes :). Unsure if there are workarounds to avoid it being a breaking change. However, we would want things to be parallel with both directions.
The question is if stabilization of what exists should be blocked on this? First, we'd need the ability to validate WTF-8 encoding, which I don't think we have atm. Second, we'd have to decide on what the error strategy is, whether |
is it actually a breaking change if code calls a different function but with the exact same signature and the exact same effect? seems to me that it wouldn't be breaking since code won't behave differently at all. |
I vote for not blocking because I'd love to see this stabilized, and the checked encoding would certainly delay it. |
Based on my work to integrate this into the "os_str_bytes" crate, these are my observations:
|
This extends rust-lang#109698 to allow no-cost conversion between `Vec<u8>` and `OsString` as suggested in feedback from `os_str_bytes` crate in rust-lang#111544.
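As a rough illustration of the owned direction, the sketch below edits an `OsString` in place by going through its `Vec<u8>` and back. It assumes the stable method names `into_encoded_bytes` and `from_encoded_bytes_unchecked`; the helper `trim_trailing_newlines` is hypothetical and not part of the PR.

```rust
use std::ffi::OsString;

/// Hypothetical helper: drops trailing ASCII `\n` / `\r` bytes without
/// copying the rest of the string.
fn trim_trailing_newlines(s: OsString) -> OsString {
    // Take ownership of the underlying buffer; no re-encoding happens here.
    let mut bytes = s.into_encoded_bytes();
    while matches!(bytes.last(), Some(b'\n' | b'\r')) {
        bytes.pop();
    }
    // SAFETY: the bytes came from `into_encoded_bytes` and were only
    // truncated immediately before ASCII bytes, which are valid non-empty
    // UTF-8 substrings.
    unsafe { OsString::from_encoded_bytes_unchecked(bytes) }
}
```

The same buffer is reused throughout, which is the point of the conversion described here.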
@dylni I went ahead and created #113442 for The reason for differentiating |
@epage Thanks! For the second point, the problem comes when implementing checked indexing. On Windows, this requires finding a UTF-8 sequence before or after the index, which is slightly inefficient. This work could be avoided on platforms that support However, it may be better to share the implementation between platforms, even if it doesn't cause a safety issue on Unix. I'll start with that implementation for now, since it does make the crate more consistent. |
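One way the checked indexing described above could be sketched in a platform-independent form is shown below. This is not the `os_str_bytes` crate's implementation; the function names are hypothetical, and the stable names `as_encoded_bytes` / `from_encoded_bytes_unchecked` are assumed. The check accepts an index only if a valid UTF-8 character starts or ends exactly there (or the index is at either end of the string).

```rust
use std::ffi::OsStr;

/// Hypothetical check: is `idx` immediately before or after a valid
/// non-empty UTF-8 substring of `bytes`?
fn is_valid_boundary(bytes: &[u8], idx: usize) -> bool {
    if idx > bytes.len() {
        return false;
    }
    if idx == 0 || idx == bytes.len() {
        return true;
    }
    let (before, after) = bytes.split_at(idx);
    // A UTF-8 character is at most 4 bytes long, so only short windows
    // on either side of the index need to be inspected.
    let starts_char =
        (1..=after.len().min(4)).any(|n| std::str::from_utf8(&after[..n]).is_ok());
    let ends_char = (1..=before.len().min(4))
        .any(|n| std::str::from_utf8(&before[before.len() - n..]).is_ok());
    starts_char || ends_char
}

/// Hypothetical checked split built on top of the boundary check.
fn split_at_checked(s: &OsStr, idx: usize) -> Option<(&OsStr, &OsStr)> {
    let bytes = s.as_encoded_bytes();
    if !is_valid_boundary(bytes, idx) {
        return None;
    }
    let (a, b) = bytes.split_at(idx);
    // SAFETY: the bytes come from `as_encoded_bytes`, and `idx` was verified
    // to sit immediately before or after a valid non-empty UTF-8 substring.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(a),
            OsStr::from_encoded_bytes_unchecked(b),
        ))
    }
}
```

The scan on both sides of the index is the "slightly inefficient" part: a platform whose `OsStr` is plain bytes would not strictly need it, but sharing the check keeps behaviour consistent across platforms.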
imo it would be fine for |
Allow limited access to `OsString` bytes This extends rust-lang#109698 to allow no-cost conversion between `Vec<u8>` and `OsString` as suggested in feedback from `os_str_bytes` crate in rust-lang#111544.
@dylni |
@epage Thanks for the note! I started using it in the crate, and I'll be publishing it under a feature once I complete some more testing. I'll update this ticket if I see any other issues. |
With @dylni (author of The only open question is the name we give this (e.g. |
(I assume you mean I went with That said, I do not feel strongly about this. |
Silly idea: Make |
that's a breaking change since people could be calling it like |
trait OsStrExt {
    fn as_bytes(&self) -> &[u8] {
        // something that gets specifically the as_bytes from the inherent implementation
        // I have absolutely no idea how to do that. `OsStr::as_bytes` just calls this function
    }
    ...
}
impl OsStrExt for OsStr {}
impl OsStr {
    fn as_bytes(&self) -> &[u8] {
        // The actual function
    }
}
Assuming this is at all possible, it should fix that
Edit: I should clarify this doesn't seem to produce compile-time name conflicts or whatever it's called. It just also seems to be impossible to use the |
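For readers wondering how an inherent method and a trait method of the same name interact, here is a small self-contained sketch. `MyStr` and `MyStrExt` are hypothetical stand-ins, not the real `OsStr` and `OsStrExt`, and the forwarding lives in the trait impl rather than in a default method. It relies on the fact that Rust prefers inherent methods over trait methods, both for `value.method()` calls and for `Type::method` paths.

```rust
/// Hypothetical stand-in for `OsStr`; not the real std type.
struct MyStr(Vec<u8>);

impl MyStr {
    /// Inherent method: "the actual function".
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Hypothetical stand-in for the platform extension trait.
trait MyStrExt {
    fn as_bytes(&self) -> &[u8];
}

impl MyStrExt for MyStr {
    fn as_bytes(&self) -> &[u8] {
        // The path `MyStr::as_bytes` resolves to the inherent method first,
        // so this forwards to it instead of calling itself recursively.
        MyStr::as_bytes(self)
    }
}
```

With this in place, `s.as_bytes()` picks the inherent method, while `MyStrExt::as_bytes(&s)` and `<MyStr as MyStrExt>::as_bytes(&s)` go through the trait and end up in the same function. This only demonstrates name resolution; whether such a change would be acceptable for std is the question under discussion.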
Regarding the name, shortening it is probably undesirable, as the bytes need special handling. For example, combining two byte sequences would create another invalid sequence in some cases. It would ideally be rare that non-libraries use the methods. I agree with @epage that using |
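Because gluing encoded byte sequences together by hand can produce an invalid sequence, the safer route for concatenation is to stay at the `OsString` level. A minimal sketch, with a hypothetical helper name `concat`:

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: concatenates two OS strings without touching
/// their encoded bytes directly.
fn concat(a: &OsStr, b: &OsStr) -> OsString {
    let mut out = OsString::with_capacity(a.len() + b.len());
    // `OsString::push` is platform-aware, so any fix-ups the platform
    // encoding needs at the join point are left to std.
    out.push(a);
    out.push(b);
    out
}
```

Byte-level access is then only needed for inspection and splitting, where the boundary rules are easier to uphold.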
We discussed this in the libs-api meeting yesterday. We prefer the We're also happy to stabilize this along with the name change. @rfcbot fcp merge |
Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
feat(std): Stabilize 'os_str_bytes' feature Closes rust-lang#111544
Feature gate: `#![feature(os_str_bytes)]`
This is a tracking issue for cross-platform access to the underlying bytes (`&[u8]`) for `&OsStr` by defining it as an unspecified superset of UTF-8.
Assumptions:
`from_os_str_bytes_unchecked` are not a blocker for stabilization
Public API
Steps / History
`OsStr` bytes #109698, Allow limited access to `OsString` bytes #113442
Unresolved Questions
`os_str_bytes`
`encoded_bytes` was another name that was brought up
Footnotes
https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html ↩