forder sorts capital letters differently from base::order #1103

Closed
ahardjasa opened this issue Apr 2, 2015 · 4 comments

@ahardjasa

When data.table's fast ordering is used (e.g. in dt[order(x)] or setorder(dt, x)), uppercase letters are sorted ahead of lowercase letters (A B a b), while base::order intersperses the cases (a A b B). This leads to undesired behaviour. For example, a data.table and a data.frame with identical contents end up in different row orders even when the same command, data[order(data$id), ], is applied to both.

Example below; I tried with both 1.9.4 and 1.9.5.

library(data.table)
test <- data.table(id = c("a", "A", "b", "B"), val = 0)
options(datatable.optimize = 1)  # use data.table's fast forder
fordered <- test[order(id), verbose = TRUE]
options(datatable.optimize = 0)  # use base::order
ordered <- test[order(id), verbose = TRUE]
identical(fordered, ordered)     # FALSE
@ahardjasa
Author

Note: this doesn't happen in a session with LC_COLLATE=C, but it does with LC_COLLATE=en_US.UTF-8 and LC_COLLATE=English_United States.1252.
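The locale dependence can be reproduced in base R alone. This is a minimal sketch, assuming R >= 3.3.0 (where order() accepts method = "radix"); the variable names are illustrative only. It temporarily switches LC_COLLATE to C, then compares against the radix method, which, like data.table's ordering, ignores the session locale.

```r
x <- c("a", "A", "b", "B")

# Default order() on a character vector collates using LC_COLLATE.
old_collate <- Sys.getlocale("LC_COLLATE")

# In the C locale, strings compare by byte value, so "A" (65) < "a" (97).
invisible(Sys.setlocale("LC_COLLATE", "C"))
in_c_locale <- x[order(x)]
invisible(Sys.setlocale("LC_COLLATE", old_collate))

# Back in the original session locale; under en_US.UTF-8 this
# intersperses the cases as in the report ("a" "A" "b" "B").
in_session_locale <- x[order(x)]

# method = "radix" (assumed R >= 3.3.0) never consults the locale and
# always sorts in C-locale, matching data.table's ordering.
radix <- x[order(x, method = "radix")]

print(in_c_locale)
print(in_session_locale)
print(radix)
```

Comparing in_session_locale against the other two in an en_US.UTF-8 session should show the same mismatch as the data.table example above.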

@arunsrinivasan
Member

@ahardjasa, we're aware of this. Duplicate of #565.

As written under ?setorder:

Also x[order(.)] is now optimised internally to use data.table's fast order by default. data.table always reorders in C-locale. To sort by session locale, use x[base::order(.)] instead.
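A sketch of the workaround quoted from ?setorder, reusing the test table from the report. It assumes a data.table version where x[order(.)] is optimised by default; the exact base::order result depends on the session's LC_COLLATE, so the comment shows the en_US.UTF-8 ordering reported in this issue.

```r
library(data.table)

test <- data.table(id = c("a", "A", "b", "B"), val = 0)

# x[order(.)] is optimised internally to data.table's fast order,
# which always sorts in C-locale: uppercase before lowercase.
fast <- test[order(id)]
print(fast$id)       # "A" "B" "a" "b"

# Calling base::order explicitly bypasses the optimisation and
# collates according to the session's LC_COLLATE setting.
by_locale <- test[base::order(id)]
print(by_locale$id)  # "a" "A" "b" "B" under en_US.UTF-8; "A" "B" "a" "b" under C
```

The trade-off is speed: base::order uses R's locale-aware comparison instead of data.table's C-locale ordering.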

The function Scollate, which compares strings under the current locale, isn't exposed by R's API. There's no fix we are aware of until this changes. Have a look at this post from Romain Francois for more info.

@ahardjasa
Author

Ah, sorry, I didn't notice that. Thanks and apologies.

@arunsrinivasan
Member

no worries 👍
