Debug printing of combining characters is wrong #41922
Comments
cc @tbu- who made the change to debug printing and @alexcrichton who approved it
Python seems to do the same thing.
@clarcharr That is, do you know some implementation we could copy?
@tbu- not that I can think of; the current way seems wrong, though. Perhaps we could just check if a character is within the combining character range? I found this and it probably could help: http://stackoverflow.com/a/17052803 Perhaps we could make a similar script?
Also got some help on Twitter for this:
I would not consider this a bug, as it's common behavior not to touch or change Unicode characters when they are printed to stdout or a file, especially in debug mode.
One reason not to do this is that it can easily mislead the user. Let's say I took the output and copy-pasted it into a Unicode decoder to see which character we have in that spot. I would see two code points in the decoder, one of which did not exist in the original string.
So, IMHO, there are pros to doing so, especially nicer-looking output, but the main con is that the Debug output no longer tells you the truth, which is very unfortunate, especially since there would be almost no way to work around it!
I think it's better to keep these fancy features for the high-level parts of a stack, like If Rust wants to do anything special about these characters, the filter would be That said, I think we also need to take a look at what other modern Unicode-savvy languages, like Swift, are doing in this area before making a decision.
For reference, in Swift:
let str = "e\u{301}";
// Array of unicode scalars, equivalent to Rust's chars
print("\(Array(str.unicodeScalars))"); // ["e", "\u{0301}"]
// Array of unicode scalars converted into strings
print("\(Array(str.unicodeScalars).map({ String.init($0) }))"); // ["e", "́"]
Swift opts to print code points for unicode scalars (but when converted to strings they display as in Rust).
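For comparison, here is a rough Rust counterpart of the Swift snippet above. It is only a sketch: `scalar_escapes` is a hypothetical helper name, and it uses `char::escape_unicode` so that the result does not depend on how `Debug` happens to render the combining mark.

```rust
/// Hypothetical helper: lists every Unicode scalar value in `s`
/// in `\u{...}` form, regardless of whether it is printable.
fn scalar_escapes(s: &str) -> Vec<String> {
    // `chars()` iterates over Unicode scalar values, much like
    // Swift's `unicodeScalars`.
    s.chars().map(|c| c.escape_unicode().to_string()).collect()
}

fn main() {
    let s = "e\u{301}";
    let escapes = scalar_escapes(s);
    // The string renders as one accented letter but holds two scalars.
    println!("{} scalars: {}", escapes.len(), escapes.join(" "));
}
```

Escaping every scalar this way is unambiguous, but it gives up the readable output that `Debug` normally provides for ordinary characters.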
This seems to have been deliberately changed to the current output as a result of #24588.
I still just think that checking if the character is combining and then escaping if it's by itself is the best option.
Oh, I see: you already mentioned the earlier change! I agree: this would make sense for combining characters. The range described on Wikipedia should probably be sufficient?
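The "check the range, escape only when the mark is by itself" idea could be sketched roughly as follows. This assumes that the range in question is the Combining Diacritical Marks block (U+0300 to U+036F); other combining blocks exist and are ignored here. Both function names are hypothetical and nothing below is taken from the standard library's implementation.

```rust
/// Hypothetical helper: true only for the Combining Diacritical Marks
/// block (U+0300..=U+036F). This range is an assumption.
fn is_combining_diacritic(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Hypothetical helper: escapes combining marks that appear before any
/// base character, and leaves marks that follow a base untouched.
fn escape_leading_marks(s: &str) -> String {
    let mut out = String::new();
    let mut at_start = true;
    for c in s.chars() {
        if at_start && is_combining_diacritic(c) {
            // No base character yet, so the mark would attach to
            // whatever surrounds the output; emit `\u{...}` instead.
            out.extend(c.escape_unicode());
        } else {
            out.push(c);
            at_start = false;
        }
    }
    out
}

fn main() {
    println!("{}", escape_leading_marks("\u{301}"));
    println!("{}", escape_leading_marks("e\u{301}"));
}
```

With this approach, a mark that has a base character keeps its normal rendering, and only a stray mark is turned into an escape sequence.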
Escape combining characters in char::Debug
Although combining characters are technically printable, they make little sense to print on their own with `Debug`: it'd be better to escape them like non-printable characters. This is a breaking change, but since `escape_debug` is rarely used, and almost certainly used primarily for debugging, I imagine this is an acceptable change. Resolves #41922. r? @alexcrichton cc @clarcharr
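To illustrate what escaping a lone combining character in `Debug` output could look like from user code, here is a minimal sketch built around a wrapper type. `DebugChar` is a hypothetical name, the U+0300 to U+036F range is an assumption, and the actual change to the standard library may well use a different test for which characters to escape.

```rust
use std::fmt;

/// Hypothetical wrapper whose `Debug` output escapes combining
/// diacritical marks instead of printing them literally.
struct DebugChar(char);

impl fmt::Debug for DebugChar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let c = self.0;
        // Assumed range: the Combining Diacritical Marks block only.
        if ('\u{0300}'..='\u{036F}').contains(&c) {
            // Write the `\u{...}` form between the usual single quotes.
            write!(f, "'{}'", c.escape_unicode())
        } else {
            // Everything else falls back to the standard `char` output.
            fmt::Debug::fmt(&c, f)
        }
    }
}

fn main() {
    println!("{:?}", DebugChar('\u{301}'));
    println!("{:?}", DebugChar('e'));
}
```

The wrapper leaves ordinary characters to the built-in formatting, so the only visible difference is for marks in the assumed range.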
Minimal example:
(playground link)
Expected output is either:
Or:
Actual output:
Note that the combining accent prints over the single quote. This is confusing and shouldn't happen.