read_csv() cannot handle file paths with characters outside of the default locale (Windows) #884
Comments
Hmmmm, it seems impossible...?
Another reproducible example below. read.table() can handle filenames with Norwegian characters, but read_tsv() cannot.
write.table(head(iris, 2), file="æøå.txt")
read.table("æøå.txt")
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
readr::read_tsv("æøå.txt")
## Error in guess_header_(datasource, tokenizer, locale) :
## Cannot read file C:/.../æøå.txt: The system cannot find the file specified.
devtools::session_info()
Session info --------------------------------------------------------------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
system x86_64, mingw32
ui RStudio (1.1.456)
language (EN)
collate English_United States.1252
tz Europe/Berlin
date 2018-09-19
Packages ------------------------------------------------------------------------------------------------------
package * version date source
base * 3.5.1 2018-07-02 local
compiler 3.5.1 2018-07-02 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.1)
datasets * 3.5.1 2018-07-02 local
devtools 1.13.6 2018-06-27 CRAN (R 3.5.1)
digest 0.6.17 2018-09-12 CRAN (R 3.5.1)
graphics * 3.5.1 2018-07-02 local
grDevices * 3.5.1 2018-07-02 local
hms 0.4.2 2018-03-10 CRAN (R 3.5.1)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.1)
methods * 3.5.1 2018-07-02 local
pillar 1.3.0 2018-07-14 CRAN (R 3.5.1)
pkgconfig 2.0.2 2018-08-16 CRAN (R 3.5.1)
R6 2.2.2 2017-06-17 CRAN (R 3.5.1)
Rcpp 0.12.18 2018-07-23 CRAN (R 3.5.1)
readr 1.1.1 2017-05-16 CRAN (R 3.5.1)
rlang 0.2.2 2018-08-16 CRAN (R 3.5.1)
rstudioapi 0.7 2017-09-07 CRAN (R 3.5.1)
stats * 3.5.1 2018-07-02 local
tibble 1.4.2 2018-01-22 CRAN (R 3.5.1)
tools 3.5.1 2018-07-02 local
utils * 3.5.1 2018-07-02 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.1)
yaml 2.2.0 2018-07-25 CRAN (R 3.5.1)
I don't know if it is related, but using Rterm in cmd I cannot even type æøå (I get `o+). Norwegian characters work fine at the cmd command line; it's only Rterm that has problems. Maybe I could fix that with some setting, and maybe that would fix read_tsv() too...? In Rgui, the example behaves the same as in RStudio.
After 1 month of thinking, I'm coming to the conclusion that this is not possible directly as long as we rely on
tmp <- tempfile(fileext = ".csv")
file.link("萼片长.csv", tmp)
readr::read_csv(tmp)
If this looks good, I can send a PR...
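The hard-link idea above could be wrapped in a small helper, roughly as sketched here. This is only an illustration under some assumptions: the name `read_via_link()` and its `reader` argument are hypothetical, the default reader is base `read.csv()` so the sketch runs without readr, and a plain copy is used as a fallback because hard links only work within one file system. It also assumes `tempdir()` itself is representable in the native locale, which may not hold if, say, the Windows user name contains such characters.

```r
# Hypothetical helper (not part of readr): read a file through a temporary
# path, so the reader never sees the original file name.
read_via_link <- function(path, reader = utils::read.csv, ...) {
  tmp <- tempfile(fileext = ".csv")
  # Remove the temporary link or copy when the function exits.
  on.exit(unlink(tmp), add = TRUE)

  # file.link() returns FALSE (with a warning) when a hard link cannot be made,
  # e.g. across file systems; fall back to copying the file in that case.
  linked <- suppressWarnings(file.link(path, tmp))
  if (!isTRUE(linked) && !file.copy(path, tmp)) {
    stop("Could not link or copy file: ", path)
  }

  reader(tmp, ...)
}
```

With readr installed, something like `read_via_link("萼片长.csv", reader = readr::read_csv)` would be the intended use.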
I think this is just a limitation of base R: you need to be able to represent the file paths in the current locale.
FYI, this is wrong. R can handle paths that are not representable in the current locale (that's why
write.csv(iris, "萼片长.csv", row.names = FALSE)
read.csv("萼片长.csv")
I'm OK to close this as wontfix, but this is a limitation of Boost, not of base R. (Sorry if it was a bit confusing that I explained that
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
(Originally reported here: #834 (comment))
On Windows, it seems read_csv() cannot handle a file path that contains characters outside of the default locale.
For example, in my locale, CP932 (Shift_JIS), 长 is not representable.
For another example, in CP1252 (latin1), none of 萼, 片, and 长 is representable.
In defense of readr, base R also fails to handle them in some cases, for example with saveRDS().
But I have no idea how Boost can handle this, since it requires the file path in the native locale...
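To make "representable" concrete, here is a minimal sketch that checks whether a path survives conversion to a given encoding. The helper name `is_representable()` is hypothetical, and it relies on `iconv()` returning `NA` for strings it cannot convert (its default when `sub = NA`). Which encoding names are accepted depends on the platform's iconv, so treat "CP1252" and "CP932" as assumptions about the local setup.

```r
# Hypothetical helper: TRUE when every character of `path` can be converted
# to `encoding`. The default "" means the current native encoding.
is_representable <- function(path, encoding = "") {
  # Normalise to UTF-8 first, then try the conversion; iconv() yields NA
  # for an element that contains a character the target cannot encode.
  converted <- iconv(enc2utf8(path), from = "UTF-8", to = encoding)
  !is.na(converted)
}

# File names from this thread, written with \u escapes so the script does not
# depend on the encoding of the source file.
norwegian <- "\u00e6\u00f8\u00e5.txt"  # æøå.txt
chinese   <- "\u843c\u7247\u957f.csv"  # 萼片长.csv

is_representable(norwegian, "CP1252")
is_representable(chinese, "CP1252")
```

Calling `is_representable(path)` with the default encoding checks against the session's native locale, which is the case that matters for the failure reported here.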