[dplyr::arrange] interfering with data.table's auto-indexing #259
Comments
@romainfrancois This is not really a
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)
DT <-
fread(
"iso3c country income
MOZ Mozambique LIC
ZMB Zambia LMIC
ALB Albania UMIC
MOZ Mozambique LIC
ZMB Zambia LMIC
ALB Albania UMIC
"
)
codes <- c("ALB", "ZMB")
options(datatable.auto.index = TRUE) # Default
DT <- distinct(DT) %>% as.data.table()
# Index creation because %in% is used for the first time
DT[iso3c %in% codes, verbose = TRUE]
#> Creating new index 'iso3c'
#> Creating index iso3c done in ... forder.c received 3 rows and 3 columns
#> forder took 0 sec
#> 0.048s elapsed (0.048s cpu)
#> Optimized subsetting with index 'iso3c'
#> forder.c received 2 rows and 1 columns
#> forder took 0 sec
#> x is already ordered by these columns, no need to call reorder
#> i.iso3c has same type (character) as x.iso3c. No coercion needed.
#> on= matches existing index, using index
#> Starting bmerge ...
#> bmerge done in 0.000s elapsed (0.000s cpu)
#> Constructing irows for '!byjoin || nqbyjoin' ... 0.000s elapsed (0.000s cpu)
#> Reordering 2 rows after bmerge done in ... forder.c received a vector type 'integer' length 2
#> 0 secs
#> iso3c country income
#> 1: ZMB Zambia LMIC
#> 2: ALB Albania UMIC
# Index mixed up by arrange
DT <- DT %>% arrange(iso3c) %>% as.data.table()
# this is wack because data.table uses the old index, built before the rows were rearranged:
DT[iso3c %in% codes, verbose = TRUE]
#> Creating new index 'iso3c'
#> Creating index iso3c done in ... forder.c received 3 rows and 3 columns
#> forder took 0.001 sec
#> 0.045s elapsed (0.045s cpu)
#> Optimized subsetting with index 'iso3c'
#> forder.c received 2 rows and 1 columns
#> forder took 0 sec
#> x is already ordered by these columns, no need to call reorder
#> i.iso3c has same type (character) as x.iso3c. No coercion needed.
#> on= matches existing index, using index
#> Starting bmerge ...
#> bmerge done in 0.000s elapsed (0.000s cpu)
#> Constructing irows for '!byjoin || nqbyjoin' ... 0.000s elapsed (0.000s cpu)
#> iso3c country income
#> 1: ALB Albania UMIC
#> 2: ZMB Zambia LMIC
# this works because (...) prevents the parser from using the auto-index
DT[(iso3c %in% codes)]
#> iso3c country income
#> 1: ALB Albania UMIC
#> 2: ZMB Zambia LMIC
Created on 2021-07-01 by the reprex package (v2.0.0)
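As a defensive workaround for the behaviour in the reprex above, one option is to drop any secondary index after rows have been reordered outside data.table, so that the next `%in%` subset rebuilds it from the current row order. The sketch below is only an illustration under stated assumptions: it uses base R `order()` as a stand-in for `dplyr::arrange()`, and it relies on `indices()` and `setindex(DT, NULL)` from data.table to inspect and clear indices. It does not reproduce the bug itself, which depends on the package versions involved.

```r
library(data.table)

DT <- data.table(
  iso3c   = c("MOZ", "ZMB", "ALB"),
  country = c("Mozambique", "Zambia", "Albania"),
  income  = c("LIC", "LMIC", "UMIC")
)
codes <- c("ALB", "ZMB")

# With datatable.auto.index = TRUE (the default), the first %in% subset
# creates a secondary index on iso3c.
invisible(DT[iso3c %in% codes])
indices(DT)  # lists the secondary indices currently attached to DT

# Reorder rows outside data.table (stand-in for dplyr::arrange(iso3c)).
DT <- as.data.table(as.data.frame(DT)[order(DT$iso3c), ])

# Drop every secondary index that may have been carried over.
setindex(DT, NULL)

# Indexed subset (index rebuilt on the new row order) versus plain vector scan.
res_indexed <- DT[iso3c %in% codes]
res_scan    <- DT[(iso3c %in% codes)]
```

Comparing `res_indexed` with `res_scan` is a cheap sanity check: if the two ever disagree, a stale index is the likely culprit.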
Closing as this was fixed on the data.table side. (Also, as @mgirlich mentioned, it was a dplyr issue, not a dtplyr one.)
This is a follow-up on this StackOverflow question/answer.
The issue is documented in data.table/issues/5042, and this is a cross-reference because the data.table team suggested there might be an issue with dplyr as well, in the way the indexes are reset. dplyr::arrange seems to interfere with auto-indexing in data.table, leading to unexpected wrong results.
MRE: