Characters garbled from sink() on Windows #59
I found this issue while investigating hadley/emo#7. Emojis still fail to keep their characters with

sink_test = function(locale = 'English') {
  Sys.setlocale(, locale)
  x = emo::ji('japanese_goblin')
  y = character()
  con = textConnection('y', local = TRUE, open = 'wr')
  sink(con)
  print(x)
  sink()
  y
}
#> [1] "<f0><U+009F><U+0091><U+00BA> "

Apparently, we need better

output <- character(0L)
outputCon <- textConnection('output', 'wr')
writeLines(emo::ji('japanese_goblin'), outputCon, useBytes = TRUE)
close(outputCon)
output
#> [1] "村"
`Encoding<-`(output, 'UTF-8')
#> [1] "\xf0\u009f\u0091�"
cat(`Encoding<-`(output, 'UTF-8'))
#> 👺
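The steps above could be wrapped into a small helper. This is only a sketch: `capture_utf8` is a hypothetical name (not from any package), and it assumes the text connection stores the bytes unchanged when `useBytes = TRUE`, as the output above suggests. The emoji is written as an escape so the example does not depend on emo.

```r
# Hypothetical helper (not from any package): write UTF-8 bytes to a text
# connection untranslated, then re-mark the captured lines as UTF-8.
capture_utf8 <- function(x) {
  out <- character(0L)
  con <- textConnection("out", "w", local = TRUE)
  # useBytes = TRUE asks writeLines() not to re-encode to the native encoding
  writeLines(enc2utf8(x), con, useBytes = TRUE)
  close(con)
  # declare the captured bytes to be UTF-8 rather than the native encoding
  Encoding(out) <- "UTF-8"
  out
}

res <- capture_utf8("\U0001F47A")  # U+1F47A, the japanese_goblin emoji
cat(res, "\n")
```

Note this only helps with writeLines(); it does not change what print() sends through sink().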
I think base R needs better support for UTF-8. I'm counting on @krlmlr to save the world: http://r.789695.n4.nabble.com/source-parse-and-foreign-UTF-8-characters-td4733523.html
Working on it with @dmurdoch ;-)
Oh, @krlmlr, you are always our UTF-8 hero! Cool. Thanks for the information 👍
Not sure, but perhaps this is also related to tidyverse/readr#884
No, I'm quite sure it's not. In that case, R does things right, but boost won't :(
FWIW I filed a bug report with R and unfortunately it sounds like it will be too expensive for them to fix: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17503
Thanks @kevinushey! Then I wonder if it is possible to write a custom connection that supports UTF-8 instead of the native encoding. I have no idea about how connections in R work, but I remember Simon Urbanek gave a talk in 2013, in which he showed a custom connection based on 0MQ: https://github.com/s-u/zmqc
It seems that strings are translated by base R into the native encoding even before they reach the connection. Perhaps we really require a fix in base R.
Perhaps Windows will support UTF-8 as the native encoding at some point. The "April 2018 insider build" of Windows seems to have some of it: https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
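To illustrate why translation before the connection is a problem, here is a rough sketch using iconv(). It is not what base R does internally; latin1 merely stands in (as an assumption) for a native encoding that cannot represent the emoji.

```r
x <- "\U0001F47A"  # U+1F47A, the japanese_goblin emoji

# latin1 is an assumed stand-in for a non-UTF-8 native encoding;
# sub = "byte" makes iconv() substitute whatever it cannot convert
lossy <- iconv(x, from = "UTF-8", to = "latin1", sub = "byte")
print(lossy)

# the translated string no longer contains the original character,
# so nothing downstream of the translation can see the emoji
identical(lossy, x)
```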
I see. If base R does the translation, I guess there is nothing we can do about it. That is really unfortunate...
Included a remark to r-lib/evaluate#59 on behalf the scrambled => unicode character on windows
Closing since recent R should handle this much better on Windows. If it's still a problem for folks, please let me know and we can try implementing something like r-lib/testthat#1693.
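For anyone who wants to check whether their session is affected, a quick sketch follows. `native_is_utf8` is a hypothetical helper name. As far as I know, R 4.2.0 and later use UTF-8 as the native encoding on recent Windows 10 builds, in which case this should report TRUE there.

```r
# Hypothetical helper: TRUE when the session's native encoding is UTF-8
native_is_utf8 <- function() isTRUE(l10n_info()[["UTF-8"]])

if (native_is_utf8()) {
  message("Native encoding is UTF-8; sink() output should keep emojis.")
} else {
  message("Native encoding is not UTF-8; sink() may garble such characters.")
}
```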
Some examples:
Originally reported at http://stackoverflow.com/q/34096239/559676
With only sink() and textConnection():
The only problem with this reduced example is that the wrong encoding is marked: