data.table not recognising logical in i #1479

Closed
fabiangehring opened this issue Dec 24, 2015 · 5 comments

@fabiangehring

From here: http://stackoverflow.com/questions/34433063/data-table-not-recognising-logical-in-filter

In the following snippet, data.table does not seem to recognize logical values when they are used in i.

All my attempts to reproduce the problem in a minimal example failed, which is why I am posting the complete section here (sorry for that). Suggestions for making the example minimal are welcome.

library(data.table)

# Test data
timetable <- data.table(rbind(
    c("r1", "t1_1", "p1", 10, 10),
    c("r1", "t1_1", "p2", 11, 11),
    c("r1", "t1_1", "p3", 12, 12),
    c("r1", "t1_1", "p4", 13, 13),
    c("r1", "t1_1", "p5", 14, 14),
    c("r1", "t1_1", "p6", 15, 15),
    c("r1", "t1_1", "p7", 16, 16),
    c("r1", "t1_1", "p8", 17, 17),
    c("r1", "t1_1", "p9", 18, 18),
    c("r1", "t1_1", "p10", 19, 19),

    c("r2", "t2", "p11", 9, 9),
    c("r2", "t2", "p12", 10, 10),
    c("r2", "t2", "p3", 11, 11),
    c("r2", "t2", "p13", 12, 12),
    c("r2", "t2", "p14", 13, 13),
    c("r2", "t2", "p15", 14, 14),
    c("r2", "t2", "p16", 15, 15),
    c("r2", "t2", "p17", 16, 16),
    c("r2", "t2", "p18", 17, 17)
  ))
setnames(timetable, c("ROUTE", "TRIP", "STOP", "ARRIVAL", "DEPARTURE"))
timetable[, ':='(ARRIVAL = as.integer(ARRIVAL), DEPARTURE = as.integer(DEPARTURE))]


# Input
startStation <- "p3"
startTime <- 8

setorder(timetable, TRIP, ARRIVAL)
timetable[, ID := .I]

timetable[,':='(ARR_ROUND_PREV = Inf, ARR_ROUND = Inf, ARR_BEST = Inf, MARKED = F, CURRENT_TRIP = F)]
timetable[STOP == startStation, ':='(ARR_ROUND_PREV = startTime, ARR_ROUND = startTime, ARR_BEST = startTime, MARKED = T)]

routes <- timetable[MARKED == T, unique(ROUTE)] 
ids <- timetable[MARKED == T & DEPARTURE > ARR_ROUND, .(ID = ID[DEPARTURE == min(DEPARTURE)]), by = ROUTE][, ID]

timetable[ID %in% ids, CURRENT_TRIP := T]
timetable[, MARKED := F]

trips <- timetable[CURRENT_TRIP == T, unique(TRIP)]
timetable[TRIP %in% trips, CURRENT_TRIP := as.logical(cumsum(CURRENT_TRIP)), by = TRIP]

# ?
timetable
nrow(timetable[CURRENT_TRIP == T]) #8
sum(timetable$CURRENT_TRIP == T) #15

# but 
nrow(timetable[(CURRENT_TRIP == T)]) #15
nrow(timetable[CURRENT_TRIP > 0]) #15
nrow(timetable[CURRENT_TRIP == 1L]) #15

With the filter CURRENT_TRIP == T, only 8 of the 15 rows where CURRENT_TRIP is TRUE are returned. The alternative forms above return all 15 rows.

Secondary issue:
Why does wrapping the condition in an extra pair of parentheses solve the problem?
"nrow(timetable[(CURRENT_TRIP == T)]) #15"
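One plausible explanation for the parentheses, offered as an assumption rather than something confirmed in this thread: data.table appears to apply its index-based subset optimisation only when the top-level call in i is a comparison such as ==, and an extra pair of parentheses changes that top-level call to (. The base R sketch below only shows the difference between the two parsed expressions; it does not call data.table, and the variable names are illustrative.

```r
# Base R only: inspect the top-level call of each filter expression.
plain   <- quote(CURRENT_TRIP == TRUE)
wrapped <- quote((CURRENT_TRIP == TRUE))

# The first element of a call object is the function being called.
top_plain   <- as.character(plain[[1]])
top_wrapped <- as.character(wrapped[[1]])
print(c(plain = top_plain, wrapped = top_wrapped))
```

If that assumption holds, the wrapped form would fall back to an ordinary vector scan and never consult a stale index, which would match the counts reported above.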

@polyjian

I had the same issue. I am using 1.9.6 of data.table on CRAN. This is a really terrifying bug.

@franknarf1
Contributor

Your example is way too baroque for me to guess where the bug starts, but as a workaround...

set2key(timetable, NULL)
timetable[CURRENT_TRIP, .N] # 15

There's also options(datatable.optimize=1) that should disable autoindexing.
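For readers on later data.table releases, here is a rough sketch of the same workaround. It assumes the newer names setindex() and indices() (which replaced set2key() and key2()) and the option datatable.auto.index; whether these are all available in 1.9.6 has not been checked here. The table dt and its columns are made up for illustration.

```r
library(data.table)

# Hypothetical toy table, not the timetable from the report.
dt <- data.table(flag = c(TRUE, FALSE, TRUE, FALSE), id = 1:4)

# A filter of the form `col == value` may create an automatic index.
invisible(dt[flag == TRUE])
print(indices(dt))

# Drop every secondary index on the table.
setindex(dt, NULL)
stopifnot(is.null(indices(dt)))

# Switch automatic index creation off, filter again, then restore the option.
old <- options(datatable.auto.index = FALSE)
n_true <- dt[flag == TRUE, .N]
no_index_created <- is.null(indices(dt))
options(old)
```

Dropping the indices only removes a stale index that already exists; turning the option off is what keeps a new one from being created by the next filter.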

@polyjian

Do you know why this issue would show up? timetable[CURRENT_TRIP == F, ] returns only some of the matching records, but timetable[CURRENT_TRIP == 0, ] works (the CURRENT_TRIP column, as in the example here, is a logical column). Thanks.

@ChristK

ChristK commented Jan 2, 2016

This is a minimal example:

require(data.table) #v1.9.6
dt <- data.table(a = rep(c(F,F,T,F,F,F,F,F,F), 3),
                 b = c("x", "y", "z"))
set2key(dt, a)
dt[, a := as.logical(sum(a)), by = b]
dt[a == T, .N] # 7 
sum(dt$a == T) # 9

It seems the secondary key is not updated when the column it is built on is altered by a primitive function in j while by is used.
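The diagnosis can be turned into a small regression check. The sketch below is an adaptation, not the original report: it assumes a later data.table release in which setindex() and indices() replace set2key() and key2(), and it simply compares the indexed subset count with a plain vector scan.

```r
library(data.table)

# Same data as the minimal example, with T/F spelled out.
dt <- data.table(
  a = rep(c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), 3),
  b = c("x", "y", "z")
)

setindex(dt, a)                        # secondary index on the column to be modified
dt[, a := as.logical(sum(a)), by = b]  # grouped update of the indexed column

print(indices(dt))                     # inspect whether the index survived the update

n_subset <- dt[a == TRUE, .N]          # count via data.table's subset path
n_scan   <- sum(dt$a == TRUE)          # count via a plain vector scan

# The two counts should agree; in the report above they were 7 and 9.
stopifnot(n_subset == n_scan)
```

All three TRUE values fall in group "z", so after the update every one of the nine "z" rows is TRUE and both counts should be 9 on a version that contains the fix.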

@arunsrinivasan
Copy link
Member

Thanks @ChristK for the nice MRE. Fixed now.
