data.table always sorts in the C-locale. This is now more clearly documented by PR #2387.
There are two reasons:
1. consistency, so that results are not affected by the environment that R is started in, e.g. servers and services
2. speed, because the locale-aware C library calls are slower than data.table's ASCII sort (i.e. C-locale)
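To make the C-locale ordering concrete, here is a minimal sketch. The table `DT`, its values, and the column `v` are made up for illustration; only the column name `cn` comes from this thread. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical data: mixed-case codes in a column named cn
DT <- data.table(cn = c("usa", "USA", "fr", "FR"), v = 1:4)

# setkey() sorts by reference in the C-locale, i.e. by byte value,
# so every uppercase ASCII letter sorts before every lowercase one
setkey(DT, cn)
DT$cn
# "FR"  "USA" "fr"  "usa"

# base R's radix method also uses the C-locale for character vectors
radix_sorted <- sort(c("usa", "USA", "fr", "FR"), method = "radix")

# the default method follows the session's collation locale,
# so its result can differ between machines
locale_sorted <- sort(c("usa", "USA", "fr", "FR"))
```

If a locale-aware order is needed for display only, one option is to apply base `order()` to the column after the data.table work is done, leaving the key untouched.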
Even if we found a way to allow this option efficiently, suppose a key were set on column cn in your example. We would have to ensure that the option determining where USA sorts is stored inside the key, because binary search would need to know which option was used to create the key. Some keys in some tables might have been created with the option set, while other keys in other tables were created later, or loaded from disk, without the option set, and this could lead to bugs. One main reason for data.table's speed is sorting, and that theme runs through the whole code base. Allowing a locale-sort option would be too risky for a low benefit.
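The concern about keys can be sketched as follows. This is an illustration under assumptions, not a description of data.table internals beyond what the calls below show: the table `DT` and column `v` are hypothetical, and the point is only that a key records which columns are sorted, with nothing about a collation rule.

```r
library(data.table)

# Hypothetical table keyed on cn
DT <- data.table(cn = c("usa", "USA", "fr", "FR"), v = 1:4)
setkey(DT, cn)

# The key is just the name(s) of the sorted column(s)
key(DT)
# "cn"

# A keyed lookup relies on binary search over the sorted column,
# so it is only correct if the rows really are in the order
# that the search assumes (here, C-locale order)
usa_v <- DT[.("USA"), v]
usa_v
# 2
```

A table whose key had been built under a different collation would carry the same key name, which is why mixing the two orders could silently produce wrong lookups.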
See PR #2387 for several new sentences in the documentation.
Submitted by: Edgaras Dunajevas; Assigned to: Nobody; R-Forge link
data.table sorts strings in the C-locale, which is different from base, which uses the English_United States.1252 locale. Here is a reproducible example. As reported here on SO.