reprex output does not have the right encoding on Windows #82
This should be fixed by #76.
Looks good to me:
x <- "fa\xE7ile"
Encoding(x)
#> [1] "latin1"
x
#> [1] "façile"
xx <- iconv(x, "latin1", "UTF-8")
Encoding(c(x, xx))
#> [1] "latin1" "UTF-8"
c(x, xx)
#> [1] "façile" "façile"
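Both strings print the same, but the bytes differ, which is what matters once the text leaves R. A minimal sketch comparing them; the explicit `Encoding(x) <- "latin1"` is an addition so the example does not depend on the session locale (a `\xE7` escape is only marked as latin1 automatically in a Latin-1 locale):

```r
# Build the latin1 string explicitly so the result does not depend on the locale
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# Re-encode to UTF-8; iconv() marks the result as UTF-8
xx <- iconv(x, from = "latin1", to = "UTF-8")

charToRaw(x)   # the c-cedilla is one byte in latin1
#> [1] 66 61 e7 69 6c 65
charToRaw(xx)  # the c-cedilla is two bytes in UTF-8
#> [1] 66 61 c3 a7 69 6c 65
```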
Err, not quite. I am assuming that the output from
Well,
"Brüssel"
#> [1] "Brüssel"
But with
"Brüssel"
#> [1] "Brüssel"
Again with
"façile"
#> [1] "façile"
"fa\xE7ile"
#> [1] "façile"
Actually, with
> "fa\xE7ile"
[1] "façile"
Thanks, and sorry for my imperfect PR to fix this... I got slightly different results, which are similarly problematic. Due to my locale,
# this result is copied and pasted from the console
"fa\xe7ile"
#> [1] "fa輅le"
So I used this one instead:
# this result is copied and pasted from the console
"fa\u00E7ile"
#> [1] "façile"
Then I got the following result for both
"fa\u00E7ile"
#> [1] "facile"
Note that when I type
# this result is copied and pasted from the console
"ç"
#> [1] "c"
So, the reason the result above is not
This seems most likely to be an encoding issue with the RStudio API. Hopefully @kevinushey will have some ideas.
@yutannihilation: Is the following from
"fa\u00E7ile"
#> [1] "façile"
I am just asking because the
This
"fa\xe7ile"
#> [1] "fa輅le"
might actually be correct IMHO, assuming that
What does
Ah, sorry for confusing you! I copied and pasted from my console and edited it here.
No,
charToRaw("輅")
#> [1] e7 69
charToRaw("i")
#> [1] 69
I got the following for
"Brテシssel"
#> [1] "Brテシssel"
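The bytes above point to what happens in a CP932 (Japanese Windows) locale: the latin1 byte e7 for "ç" swallows the following "i" (69) into the single double-byte character "輅", and the two UTF-8 bytes of "ü" (c3 bc) are read as two katakana, テ and シ (half-width forms in CP932). A small sketch reproducing the second case by decoding UTF-8 bytes as CP932; it assumes the platform's iconv recognizes the name "CP932", which depends on the build:

```r
# UTF-8 bytes of "Brüssel": the u-umlaut is encoded as c3 bc
utf8_bytes <- charToRaw(enc2utf8("Br\u00fcssel"))

# Decode the same bytes as CP932, as a Japanese Windows locale would.
# In CP932, c3 is half-width katakana TE and bc is half-width katakana SI.
mojibake <- iconv(list(utf8_bytes), from = "CP932", to = "UTF-8")

nchar(mojibake)  # one character longer than "Brüssel"
```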
I guess we can blame the difference between
Good news: the RStudio API works fine for me (but not for @dpprdan?). Here is the result when I copied/selected the string
# these results are copied and pasted from the console
readLines("clipboard")
#> [1] "\"fa\\xe7ile\""
#> Warning message:
#> In readLines("clipboard") : incomplete final line found on 'clipboard'
rstudioapi::getSourceEditorContext()
#> Document Context:
#> - id: '332A20F5'
#> - path: ''
#> - contents: <1 rows>
#> Document Selection:
#> - [1, 1] -- [1, 12]: '"fa\\xe7ile"'
I recall that the
If that's the case, manually fixing up the encoded text with e.g.
@kevinushey so it's safe to assume the API always returns UTF-8 text?
That's right -- the
@dpprdan I misunderstood your comment, sorry. Let me clarify. Here are some possible ways of doing "reprex"-fu:
Methods 1 and 3 work fine because they pass the text
So you can choose method 1 or 3. Or, alternatively:
# input needs a line break to distinguish code from a file name
reprex::reprex(input = sprintf("%s\n", "\"fa\\xe7ile\""))
To sum up:
readLines("clipboard")
# [1] "\"Brüssel\""
rstudioapi::getSourceEditorContext()
# Document Context:
# - id: '8EA39DCA'
# - path: ''
# - contents: <1 rows>
# Document Selection:
# - [1, 1] -- [1, 10]: '"Brüssel"'
I guess this is what @kevinushey was referring to? So with
ctx <- rstudioapi::getSourceEditorContext()
Encoding(ctx$selection[[1]]$text) <- "UTF-8"
ctx$selection
# Document Selection:
# - [1, 1] -- [1, 10]: '"Brüssel"'
this could be fixed in
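The workaround above only declares the bytes RStudio hands over as UTF-8; it does not convert them. A minimal sketch of that step as a standalone helper, assuming the text really is UTF-8. The name `mark_utf8()` is hypothetical (not part of reprex or rstudioapi), and the input is built from raw bytes to mimic the unmarked string the API returns:

```r
# Hypothetical helper: declare strings as UTF-8 without changing their bytes
mark_utf8 <- function(x) {
  Encoding(x) <- "UTF-8"
  x
}

# Mimic unmarked UTF-8 text: the bytes of "Brüssel" with no encoding mark
raw_text <- rawToChar(as.raw(c(0x42, 0x72, 0xc3, 0xbc, 0x73, 0x73, 0x65, 0x6c)))
Encoding(raw_text)
#> [1] "unknown"

fixed <- mark_utf8(raw_text)
Encoding(fixed)
#> [1] "UTF-8"
```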
I've got the same result for
Sounds fair to me. Marking a UTF-8 string as UTF-8 is safe whether or not it is already marked as UTF-8. Note that we can safely assume the string passed from RStudio is always UTF-8, since it is originally passed as JSON, where the character encoding is supposed to be UTF-8: https://github.com/rstudio/rstudio/blob/600d2adf687cec0034bd63ff739bbc0f6acba348/src/cpp/session/modules/SessionWorkbench.cpp#L84-L100
So I guess this should ultimately be fixed upstream, in RStudio itself. Until that day comes, let's set
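Why marking is harmless for text that is already UTF-8, and why the assumption that RStudio sends UTF-8 matters: `Encoding<-` only changes the declared encoding, never the bytes. A small sketch of both points:

```r
# A string that is already marked as UTF-8
s <- "Br\u00fcssel"
Encoding(s)
#> [1] "UTF-8"

# Marking it again changes neither the mark nor the bytes
s2 <- s
Encoding(s2) <- "UTF-8"
identical(charToRaw(s), charToRaw(s2))
#> [1] TRUE

# The mark is only correct if the bytes really are UTF-8:
# latin1 bytes for "façile" are not valid UTF-8
latin1_bytes <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65)))
validUTF8(latin1_bytes)
#> [1] FALSE
```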
Thanks for fixing! Just for future reference, people in an MBCS locale like me may still fail to render some characters, such as
"Brussel"
#> [1] "Brussel"
But this is not up to the reprex package. So I'm fine with the fix :)
I hate to say it, but I found something else (which I believe belongs here as well).
Source
x <- c("€", "–", "¼", "⅛", "℅", "‰", "Malmö")
Encoding(x)
print(x)
Console output
> x <- c("€", "–", "¼", "⅛", "℅", "‰", "Malmö")
> Encoding(x)
[1] "latin1" "latin1" "latin1" "UTF-8" "UTF-8" "latin1" "latin1"
> print(x)
[1] "€" "–" "¼" "⅛" "℅" "‰" "Malmö"
x <- c("€", "–", "¼", "?", "?", "‰", "Malmö")
Encoding(x)
#> [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
print(x)
#> [1] "<U+0080>" "<U+0096>" "¼" "?" "?" "<U+0089>" "Malmö"
x <- c("€", "–", "¼", "<U+215B>", "<U+2105>", "‰", "Malmö")
Encoding(x)
#> [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
print(x)
#> [1] "<U+0080>" "<U+0096>" "¼" "<U+215B>" "<U+2105>" "<U+0089>"
#> [7] "Malmö"
Version: reprex@3960cc7
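One reading of the reprex output (an interpretation, not confirmed in this thread): `<U+0080>`, `<U+0096>` and `<U+0089>` are exactly what you get when the Windows-1252 (CP1252) bytes for "€", "–" and "‰" (0x80, 0x96, 0x89) are decoded as ISO-8859-1, where that byte range holds invisible C1 control characters. "¼" and "ö" sit at the same positions in both encodings, which fits them surviving. A sketch reproducing this, assuming the platform's iconv recognizes the names "CP1252" and "ISO-8859-1":

```r
chars <- c("\u20ac", "\u2013", "\u2030")  # euro sign, en dash, per mille sign

# Their single-byte CP1252 encodings, as a list of raw vectors
bytes <- iconv(chars, from = "UTF-8", to = "CP1252", toRaw = TRUE)
unlist(bytes)
#> [1] 80 96 89

# Decoding those bytes as ISO-8859-1 yields C1 control characters ...
as_latin1 <- iconv(bytes, from = "ISO-8859-1", to = "UTF-8")
# ... while decoding them as CP1252 recovers the original characters
as_cp1252 <- iconv(bytes, from = "CP1252", to = "UTF-8")
```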
This is at least somewhat related to yihui/knitr#1415, which is about knitr's reporting of encoding.
Not really sure. This Rmd source (the
```{r}
x <- c("€", "–", "¼", "⅛", "℅", "‰", "ö")
Encoding(x)
```
#> [1] "latin1" "latin1" "latin1" "UTF-8" "UTF-8" "latin1" "latin1"
```{r}
print(x)
```
#> [1] "€" "–" "¼" "⅛" "℅" "‰" "ö"
results in this markdown (via RStudio > Knit (w/
reprex's output does not have the right encoding on Windows 10 (i.e. it should be declared as UTF-8). This is the source I am passing to reprex():
This is how reprex renders it:
But this is what I actually see on my console:
So, once I do this after passing the source to reprex():
I get: