Have read_csv and friends understand international characters #892
I believe this is already fixed by #730 (not released yet, though). Currently, you need to specify the encoding explicitly:
readr::read_csv(file="x\næøå", locale = readr::locale(encoding = "latin1"))
Your example works on my system too:
> readr::read_csv(file="x\næøå", locale = readr::locale(encoding = "latin1"))
# A tibble: 1 x 1
x
<chr>
1 æøå
> readr::read_csv(file="x\næøå", locale = readr::locale(encoding = "UTF-8"))
# A tibble: 1 x 1
x
<chr>
1 "\xe6\xf8\xe5"
I was hoping to steer my students towards readr as an alternative to base R with fewer surprises and less hassle, but internationalization seems to be tricky (cf. also #884).
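For anyone puzzling over the "\xe6\xf8\xe5" output: those are the latin1 byte values of æ, ø and å, shown raw because they do not form a valid UTF-8 sequence. Here is a minimal base R sketch (not from the thread; the variable names are made up for illustration) that puts the two byte representations side by side:

```r
# "æøå" written with Unicode escapes so the script itself stays pure ASCII;
# enc2utf8() makes sure the string is stored as UTF-8 whatever the locale.
x <- enc2utf8("\u00e6\u00f8\u00e5")

utf8_bytes   <- charToRaw(x)
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))

print(utf8_bytes)    # bytes: c3 a6 c3 b8 c3 a5
print(latin1_bytes)  # bytes: e6 f8 e5

# The latin1 bytes, taken on their own, are not valid UTF-8. A reader that is
# told "this is UTF-8" can therefore only show them as escapes.
raw_as_string <- rawToChar(latin1_bytes)
print(validUTF8(raw_as_string))  # FALSE
```

So the UTF-8 call above is probably not misreading anything: it receives latin1 bytes and is told to treat them as UTF-8.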
Hmm, I may be confused about my own locale... Setting encoding="UTF-8" throws off even read.table:
> read.table(textConnection("x\næøå"), header=TRUE, stringsAsFactors = FALSE, encoding = "latin-1")
x
1 æøå
> read.table(textConnection("x\næøå"), header=TRUE, stringsAsFactors = FALSE, encoding = "UTF-8")
x
1 <e6><f8><e5>
On second reading of my session_info() above, I see
> locale()
<locale>
Numbers: 123,456.78
Formats: %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days: Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May), June (Jun), July (Jul), August (Aug), September (Sep), October (Oct), November (Nov), December (Dec)
AM/PM: AM/PM
Specifying UTF-8 when writing and reading also seems to work:
> write.table(data_frame(x="æøå"), file="test.txt", fileEncoding = "UTF-8")
> read.table("test.txt")
x
1 æøå
> read.table("test.txt", encoding = "UTF-8")
x
1 æøå
In any case, this illustrates exactly the kind of stuff I have always had trouble understanding, and was hoping that UTF-8 would free me from. Any suggestions on how I can set my locale or other settings/environment variables to have this "just work" (i.e. never having to specify encoding in my own code) would be highly appreciated.
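One thing that may help untangle the experiments above: according to the base R documentation, the `encoding` argument of readLines() and read.table() only marks the strings as latin1 or UTF-8 and never converts them, whereas `fileEncoding` re-encodes the input. A small sketch of the marking behaviour, assuming a temporary file that holds the latin1 bytes for æøå (the file and the variable names are hypothetical, not taken from this thread):

```r
# Write the header "x" and the latin1 bytes for "æøå" (e6 f8 e5) to a temp file.
tmp <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("x\n"), as.raw(c(0xe6, 0xf8, 0xe5)), charToRaw("\n")), tmp)

# `encoding` only declares how the bytes should be interpreted; nothing is converted.
as_utf8   <- readLines(tmp, encoding = "UTF-8")[2]
as_latin1 <- readLines(tmp, encoding = "latin1")[2]

print(validUTF8(as_utf8))    # FALSE: the declaration does not match the bytes
print(Encoding(as_latin1))   # "latin1"

# With the matching declaration, conversion to UTF-8 is lossless.
fixed <- enc2utf8(as_latin1)
print(charToRaw(fixed))      # bytes: c3 a6 c3 b8 c3 a5

unlink(tmp)
```

If that reading is right, "never having to specify encoding" mostly comes down to making sure the bytes on disk and the session's assumed encoding agree, for example by keeping everything in UTF-8.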
[yutannihilation]
I forgot to say that I appreciate this! Maybe my confusion will be irrelevant soon 8-)
Your default locale is […]. That's the tragedy; everything you type on R console is encoded in […].
I wish so, but, unfortunately, the release seems in trouble for months...
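To check what the session's native encoding actually is, base R can report it directly. This is only a sketch with illustrative variable names, and the printed locale will differ from machine to machine:

```r
# What the session considers its character type locale.
ctype <- Sys.getlocale("LC_CTYPE")
info  <- l10n_info()

cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", info[["UTF-8"]], "\n")

# Pure ASCII strings are never marked; non-ASCII strings carry a declared
# encoding. enc2utf8() forces the UTF-8 representation regardless of locale.
typed <- enc2utf8("\u00e6\u00f8\u00e5")
print(Encoding(c("abc", typed)))   # "unknown" "UTF-8"

# enc2native() shows what the same text becomes in the session's own encoding.
native <- enc2native(typed)
print(Encoding(native))
```

When `info[["UTF-8"]]` is FALSE, text typed at the console arrives in the native encoding, not UTF-8, which fits the behaviour described above.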
As far as I can tell this is fixed in the devel version of readr.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
Modern R handles UTF-8 characters well, including the Norwegian æøå. However, read_csv returns a tibble which displays these characters as unicode escape codes. This is ugly, e.g. when knitting international rmarkdown documents. read.table handles international input without problems, and the result can be converted to a tibble without the ugly escaping. I wish read_csv and other readr functions would behave similarly.
Small reproducible example: