Inaccurate {Path,OsStr}::to_string_lossy() documentation #129963

Merged: 1 commit, Sep 6, 2024

Conversation

rjooske
Contributor

@rjooske rjooske commented Sep 4, 2024

The documentation of Path::to_string_lossy() and OsStr::to_string_lossy() says the following:

Any non-Unicode sequences are replaced with U+FFFD REPLACEMENT CHARACTER

which didn't immediately make sense to me. ("non-Unicode sequences"?)
Since both to_string_lossy functions eventually become just a call to String::from_utf8_lossy, I believe the documentation meant to say:

Any non-UTF-8 sequences are replaced with U+FFFD REPLACEMENT CHARACTER

This PR corrects this mistake in the documentation.

For the record, a similar quote can be found in the documentation of String::from_utf8_lossy:

... During this conversion, from_utf8_lossy() will replace any invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER, ...
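As a rough illustration of the replacement behaviour described above, here is a minimal sketch. The helper names `bytes_lossy` and `os_str_lossy` are made up for this example, and the `OsStr` part assumes a Unix target, where an `OsStr` can be built directly from arbitrary bytes:

```rust
/// Hypothetical helper: lossy conversion of raw bytes via `String::from_utf8_lossy`.
fn bytes_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Hypothetical helper (Unix only): view raw bytes as an `OsStr`,
/// then convert with `OsStr::to_string_lossy`.
#[cfg(unix)]
fn os_str_lossy(bytes: &[u8]) -> String {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    OsStr::from_bytes(bytes).to_string_lossy().into_owned()
}

fn main() {
    // "fo", then 0x80 (a lone continuation byte, which is invalid UTF-8), then "o".
    let bytes = b"fo\x80o";
    println!("{}", bytes_lossy(bytes));

    #[cfg(unix)]
    {
        println!("{}", os_str_lossy(bytes));
    }
}
```

On a Unix target both lines should print the same text, with U+FFFD standing in for the invalid byte.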

@rustbot
Collaborator

rustbot commented Sep 4, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @workingjubilee (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review labels (S-waiting-on-review and S-waiting-on-author) stay updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

rustbot added the S-waiting-on-review and T-libs labels on Sep 4, 2024
@workingjubilee
Member

Thanks!

@bors r+

@bors
Collaborator

bors commented Sep 5, 2024

📌 Commit 49a93df has been approved by workingjubilee

It is now in the queue for this repository.

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label on Sep 5, 2024
@workingjubilee
Member

@bors rollup

@ollie27
Member

ollie27 commented Sep 5, 2024

On Windows it's checking for a UTF-16 sequence, not UTF-8.

@workingjubilee
Member

...hm.

@bors r-

bors added the S-waiting-on-author label and removed the S-waiting-on-bors label on Sep 5, 2024
@workingjubilee
Member

@ollie27 I'm not sure this change isn't still the more correct wording, though: we're trying to re-encode whatever the path is as UTF-8, and we only do this replacement for things that can't otherwise be encoded as UTF-8.

@workingjubilee
Member

Perhaps it's still not perfectly correct since it should reference the notion of the possible transcoding. But we're not actually checking for UTF-16, at least not as commonly understood, because we're rejecting invalid surrogate pairings, and everything that says it works with "UTF-16" actually also deals in such nonsense. Referencing Unicode ambiguously confuses this matter.
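To make "invalid surrogate pairings" concrete, here is a small sketch using the standard library's UTF-16 decoders. It is only an illustration of what strict UTF-16 rejects, not a claim about how `OsStr` is implemented on Windows; the helper names `decode_strict` and `decode_lossy` are invented for this example:

```rust
/// Hypothetical helper: strict UTF-16 decoding, which fails on unpaired surrogates.
fn decode_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Hypothetical helper: lossy UTF-16 decoding, which substitutes U+FFFD
/// for each unpaired surrogate instead of failing.
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

fn main() {
    // 'f', 'o', an unpaired high surrogate (0xD800), 'o'.
    let unpaired: [u16; 4] = [0x0066, 0x006f, 0xD800, 0x006f];
    println!("{:?}", decode_strict(&unpaired));
    println!("{}", decode_lossy(&unpaired));

    // A correctly paired surrogate sequence encoding U+1F600.
    let paired: [u16; 2] = [0xD83D, 0xDE00];
    println!("{:?}", decode_strict(&paired));
}
```

Many APIs that nominally deal in "UTF-16" will pass the first sequence through unchanged, which is the mismatch being pointed out here.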

@ChrisDenton
Member

There's no re-encoding going on in to_string_lossy. It's essentially the same as to_str on valid UTF-8.

On Windows, OsStr is WTF-8 (a superset of UTF-8). Special handling for surrogates only happens when joining or splitting. As far as to_string_lossy is concerned it's either valid UTF-8 or it's not. Which is why to_str only mentions doing a validity check.
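A minimal sketch of that relationship, assuming current standard library behaviour: for valid UTF-8, `to_str` succeeds and `to_string_lossy` can hand back the same text without allocating. The helper `convert` is hypothetical, and the invalid-input half assumes a Unix target so the `OsStr` can be built from raw bytes:

```rust
use std::borrow::Cow;
use std::ffi::OsStr;

/// Hypothetical helper: run both conversions and report whether the lossy
/// one had to allocate a new string (i.e. returned `Cow::Owned`).
fn convert(s: &OsStr) -> (Option<&str>, String, bool) {
    let lossy = s.to_string_lossy();
    let allocated = matches!(lossy, Cow::Owned(_));
    (s.to_str(), lossy.into_owned(), allocated)
}

fn main() {
    // Valid UTF-8: the strict and lossy conversions agree.
    let (strict, lossy, allocated) = convert(OsStr::new("héllo.txt"));
    println!("{:?} {:?} allocated={}", strict, lossy, allocated);

    // Invalid UTF-8 (Unix only): the strict check fails, the lossy one substitutes U+FFFD.
    #[cfg(unix)]
    {
        use std::os::unix::ffi::OsStrExt;
        let (strict, lossy, allocated) = convert(OsStr::from_bytes(b"fo\x80o"));
        println!("{:?} {:?} allocated={}", strict, lossy, allocated);
    }
}
```

The borrowed-versus-owned distinction is an observation about the current implementation rather than something the `OsStr` documentation promises.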

@ollie27
Member

ollie27 commented Sep 6, 2024

Actually, yeah, #111544 exposed OsStr as a superset of UTF-8 bytes even on Windows so it does make sense to talk about non-UTF-8 sequences here even though the example shows using invalid UTF-16.

Sorry for the noise.
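For readers unfamiliar with that byte view, here is a brief sketch. It assumes #111544 refers to the work that exposed `OsStr::as_encoded_bytes` (available in Rust 1.74 and later); the helper name `encoded_bytes_are_utf8` is invented for this example:

```rust
use std::ffi::OsStr;

/// Hypothetical helper: does the encoded byte view pass a plain UTF-8 validity check?
fn encoded_bytes_are_utf8(s: &OsStr) -> bool {
    std::str::from_utf8(s.as_encoded_bytes()).is_ok()
}

fn main() {
    let s = OsStr::new("foo.txt");

    // For an `OsStr` made from a valid `str`, the encoded bytes are its UTF-8 bytes.
    println!("{:?}", s.as_encoded_bytes());

    // The byte-level validity check agrees with `to_str` succeeding.
    println!("{}", encoded_bytes_are_utf8(s) == s.to_str().is_some());
}
```

Seen through this view, "non-UTF-8 sequences" is a meaningful description on every platform, even where the original input was UTF-16.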

@workingjubilee
Member

@ollie27 No problem. :^)

> There's no re-encoding going on in to_string_lossy.

Correct, I was talking somewhat imprecisely about the "total journey" of some bytes into an OsString and then through to_string_lossy, which potentially does involve re-encoding.

Anyway, it sounds like we're back to a consensus that we should accept this, even if it could probably use even better wording; nothing is leaping into anyone's brain right now, or we'd have suggested it. SO!

@bors r+

@bors
Collaborator

bors commented Sep 6, 2024

📌 Commit 49a93df has been approved by workingjubilee

It is now in the queue for this repository.

bors added the S-waiting-on-bors label and removed the S-waiting-on-author label on Sep 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 6, 2024
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#129021 (Check WF of source type's signature on fn pointer cast)
 - rust-lang#129781 (Make `./x.py <cmd> compiler/<crate>` aware of the crate's features)
 - rust-lang#129963 (Inaccurate `{Path,OsStr}::to_string_lossy()` documentation)
 - rust-lang#129969 (Make `Ty::boxed_ty` return an `Option`)
 - rust-lang#129995 (Remove wasm32-wasip2's tier 2 status from release notes)
 - rust-lang#130013 (coverage: Count await when the Future is immediately ready )

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 45d6957 into rust-lang:master Sep 6, 2024
6 checks passed
@rustbot rustbot added this to the 1.83.0 milestone Sep 6, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Sep 6, 2024
Rollup merge of rust-lang#129963 - rjooske:fix/inaccurate_to_string_lossy_doc, r=workingjubilee

Inaccurate `{Path,OsStr}::to_string_lossy()` documentation
