Accented column names: current regression #1726
Comments
Could you provide a reproducible example? I tested on Ubuntu with the following code and was not able to reproduce the issue, so it may be Windows-related. In any case, a reproducible example is important for addressing any issue.
library(data.table)
dt=data.table(Année=1)
dt[,Année]
#[1] 1
Partial solution: the issue comes from the "encoding" parameter of fread:
A <- data.table::fread(input="Année;Mois\n2011;1", sep=";", encoding = "Latin-1")
A[, Année]
# Error in eval(expr, envir, enclos) : object 'Année' not found
B <- data.table::fread(input="Année;Mois\n2011;1", sep=";", encoding = "unknown")
B[, Année]
# [1] 2011
However, "Latin-1" does not seem to be an invalid value:
data.table::fread(input="Année;Mois\n2011;1", sep=";", encoding = "ISO-8859-1")
# Error in data.table::fread(input = "Année;Mois\n2011;1", sep = ";", encoding = "ISO-8859-1") :
# Argument 'encoding' must be 'unknown', 'UTF-8' or 'Latin-1'.
For now I have been working around the issue by using "unknown".
library(data.table)
A = data.table::fread("https://github.com/Rdatatable/data.table/files/298049/latin1.txt", sep=";", encoding="Latin-1")
A[, Année]
#Error in eval(expr, envir, enclos) : object 'Année' not found
Encoding(names(A))
#[1] "latin1" "unknown"
sessionInfo()
#R version 3.3.0 (2016-05-03)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Ubuntu 15.10
#
#locale:
# [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
# [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
# [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
#[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] data.table_1.9.7
#
#loaded via a namespace (and not attached):
#[1] curl_0.9.7
Issue looks to be related to #1680
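The Encoding(names(A)) output above shows the accented name marked as "latin1". As a minimal base-R sketch of what that marking means (the variable names here are illustrative, not from the thread), the same visible name can be stored under two different byte sequences. R's own string comparison translates between them, but a byte-level comparison does not. Whether that is the actual mechanism behind the failed lookup is an assumption; the thread does not confirm it.

```r
# Two representations of the same visible column name.
nm_utf8   <- enc2utf8("Ann\u00e9e")              # UTF-8 bytes, marked "UTF-8"
nm_latin1 <- iconv(nm_utf8, "UTF-8", "latin1")   # Latin-1 bytes, marked "latin1"

Encoding(nm_utf8)     # "UTF-8"
Encoding(nm_latin1)   # "latin1"

# R translates both strings before comparing, so they are equal as strings...
nm_utf8 == nm_latin1  # TRUE

# ...but the underlying bytes differ (c3 a9 versus e9 for the accented letter).
charToRaw(nm_utf8)
charToRaw(nm_latin1)
```

Running Encoding(names(A)) and charToRaw(names(A)[1]) on a table returned by fread is a quick way to see which of the two representations the column names actually carry.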
Thanks for catching and reporting this. I've not looked at the code, but I think my assumption that |
@arunsrinivasan |
@fabnicol why did you close this? |
Follow-up on this issue and related bug (in my opinion same cause). Reproducible examples:
|
Since this nagging issue appears to be undocumented and is very annoying for coders who do not work exclusively in English, I would advise adding a prominent warning to the official documentation, to the effect that Latin-1 data sets should not have non-ASCII column names (accented content in the rows is OK).
I usually work around it in this (not ideal) way.
yields the expected |
I would say non-ascii names should be avoided in the first place, see #4351 |
I rather disagree with this. The point of this issue is that prior to commit c250e9f, accented column names were entirely OK. They are also OK, at least for Western latinate languages of the ISO-8859-1x family, with base R. So this cannot be an R problem, contrary to what is written in comments of issue #4351 |
@fabnicol Thanks for following up. Could you then test whether the following yields the expected results on the "OK" version?
dt[ , Année := 1L]
dt[ , "Année2" := 2L]
AFAIU non-ASCII names work in many places, but not in all.
I'm using again my reproducible test in the post above, with R version 4.0.0 (2020-04-24) -- "Arbor Day" under W10. |
It seems that |
|
Issue is currently closed as bug is now fixed with R 4.0.2
|
I think it makes sense to add a test for that. We can also skip that test for older versions of R.
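A rough sketch of what such a version-guarded check could look like, reusing the two-column example from earlier in the thread. The helper name check_accented_names and its min_r argument are hypothetical and this is not how data.table's own test suite is written. The result may also depend on platform and locale, so the function only reports TRUE or FALSE rather than asserting.

```r
library(data.table)

# Hypothetical helper, not part of data.table's test suite.
check_accented_names <- function(min_r = "4.0.2") {
  # Skip on R versions older than the one reported as fixed in this thread.
  if (getRversion() < min_r) return(NA)

  # Write the thread's example to a temporary file as Latin-1 bytes.
  tmp <- tempfile(fileext = ".txt")
  on.exit(unlink(tmp))
  txt <- enc2utf8("Ann\u00e9e;Mois\n2011;1\n")
  writeBin(iconv(txt, "UTF-8", "latin1", toRaw = TRUE)[[1]], tmp)

  A <- fread(tmp, sep = ";", encoding = "Latin-1")
  env <- environment()

  # The call is parsed at run time so this script stays ASCII-only;
  # it is the same as typing A[, Année] at the prompt.
  tryCatch(
    identical(eval(parse(text = "A[, Ann\u00e9e]"), envir = env), 2011L),
    error = function(e) FALSE
  )
}

# TRUE if the unquoted accented name resolves, FALSE if not, NA if skipped.
check_accented_names()
```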
An interesting side issue is that with the current R-devel-win branch for Windows UTF-8, the issue remains if encoding of the table is Latin-1, yet not for UTF-8. |
A regression has crept in some time after March 12 (sha1 c250e9f) and before current master branch code as of June 2nd.
It is related to accented (column) variable names, specifically when the syntax dt[ , accented_variable] is used, i.e. dt[ , Année].
The error message says the Année object is not found.
The bug does not show up when the alternative syntax dt[ , "Année", with = FALSE] is used, or with non-accented variable names.
Platform: Windows 10, libraries built from source using Rtools 3.3.0.1959; encoding is ISO-8859-1.
Edit: Bug shows up under Windows7 too.