rbindlist gives wrong results when it is applied to difftime objects with different units. #4541

Closed
lgong-rms opened this issue Jun 13, 2020 · 2 comments · Fixed by #6309

@lgong-rms

lgong-rms commented Jun 13, 2020

# Minimal reproducible example

library(data.table)

difftime_diffunits <- list(
  structure(list(
    start_time = "2020-06-02 05:12:14.652", end_time = "2020-06-02 06:01:24.753", 
    diff_time = structure(49.1683500011762, class = "difftime", units = "mins")), 
    row.names = c(NA, -1L), class = c("data.table", "data.frame")), 
  structure(list(
    start_time = "2020-06-02 05:12:14.656", end_time = "2020-06-02 06:02:25.045", 
    diff_time = structure(50.1731500029564, class = "difftime", units = "mins")), 
    row.names = c(NA, -1L), class = c("data.table", "data.frame")), 
  structure(list(
    start_time = "2020-06-02 05:12:14.973", end_time = "2020-06-02 06:32:28.081", 
    diff_time = structure(1.33697444445557, class = "difftime", units = "hours")), 
    row.names = c(NA, -1L), class = c("data.table", "data.frame")), 
  structure(list(
    start_time = "2020-06-02 05:12:14.973", end_time = "2020-06-02 06:24:51.367", 
    diff_time = structure(1.21010944445928, class = "difftime", units = "hours")), 
    row.names = c(NA, -1L), class = c("data.table", "data.frame")))

# [[1]]
#                 start_time                end_time     diff_time
# 1: 2020-06-02 05:12:14.652 2020-06-02 06:01:24.753 49.16835 mins
#
# [[2]]
#                 start_time                end_time     diff_time
# 1: 2020-06-02 05:12:14.656 2020-06-02 06:02:25.045 50.17315 mins
#
# [[3]]
#                 start_time                end_time      diff_time
# 1: 2020-06-02 05:12:14.973 2020-06-02 06:32:28.081 1.336974 hours
#
# [[4]]
#                 start_time                end_time      diff_time
# 1: 2020-06-02 05:12:14.973 2020-06-02 06:24:51.367 1.210109 hours

rbindlist(difftime_diffunits)

#                 start_time                end_time      diff_time
# 1: 2020-06-02 05:12:14.652 2020-06-02 06:01:24.753 49.168350 mins
# 2: 2020-06-02 05:12:14.656 2020-06-02 06:02:25.045 50.173150 mins
# 3: 2020-06-02 05:12:14.973 2020-06-02 06:32:28.081  1.336974 mins
# 4: 2020-06-02 05:12:14.973 2020-06-02 06:24:51.367  1.210109 mins

Note that the units of diff_time of the 3rd and 4th rows should be "hours" instead of "mins".
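One possible workaround, sketched here rather than taken from the data.table documentation, is to convert every difftime column to a common unit before binding. The helper name `to_secs` and the small example tables are hypothetical and only serve the illustration.

```r
library(data.table)

# Two one-row tables whose difftime columns use different units
# (hypothetical data, smaller than the report's example).
dt_mins  <- data.table(id = 1L, diff_time = as.difftime(30, units = "mins"))
dt_hours <- data.table(id = 2L, diff_time = as.difftime(1.5, units = "hours"))
tables   <- list(dt_mins, dt_hours)

# Hypothetical helper: re-express every difftime column in seconds so that
# the stored numbers are comparable once the tables are stacked.
to_secs <- function(dt) {
  dt <- copy(dt)  # work on a copy so the input table is not changed by reference
  cols <- names(dt)[vapply(dt, inherits, logical(1), what = "difftime")]
  for (col in cols) {
    secs <- as.numeric(dt[[col]], units = "secs")
    set(dt, j = col, value = as.difftime(secs, units = "secs"))
  }
  dt
}

combined <- rbindlist(lapply(tables, to_secs))
units(combined$diff_time)       # "secs"
as.numeric(combined$diff_time)  # 1800 5400
```

Because all inputs share the same units before `rbindlist` runs, the result does not depend on how `rbindlist` treats the `units` attribute.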

# Output of sessionInfo()

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.9

loaded via a namespace (and not attached):
[1] compiler_4.0.0 tools_4.0.0    tinytex_0.23   xfun_0.14     

@jangorecki
Member

jangorecki commented Jun 13, 2020

Thank you for reporting. The problem is quite tricky because the attributes of a difftime define its actual value. We could add special handling for this class, but that would not really address the problem, because other classes that depend on attributes would still suffer from it: for example, a decimal class where the precision is stored in an attribute.
I think it is better to just document this instead.
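A small base R sketch of the point about attributes: two difftime objects can hold the same underlying number and still represent different durations, because the `units` attribute is part of the value. The variable names are made up for the illustration.

```r
# Same stored number, different units attribute.
a <- structure(1.5, class = "difftime", units = "hours")
b <- structure(1.5, class = "difftime", units = "mins")

# Converting to a common unit shows the durations differ.
secs_a <- as.numeric(a, units = "secs")  # 5400
secs_b <- as.numeric(b, units = "secs")  # 90

# Stripping the class and attributes leaves identical numbers, which is
# what gets mixed up if the attributes of only one input are kept.
same_number <- identical(as.numeric(unclass(a)), as.numeric(unclass(b)))  # TRUE
```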

@CharnelMouse

CharnelMouse commented Jul 31, 2020

Simpler example using by:

library(data.table)

x <- data.table(a = c(1L, 1L, 2L, 2L),
                b = as.POSIXct(c("2020-07-03 12:00:00", "2020-07-03 12:00:01",
                                 "2020-07-03 12:00:00", "2020-08-05 12:00:00")))

x

#    a                   b
# 1: 1 2020-07-03 12:00:00
# 2: 1 2020-07-03 12:00:01
# 3: 2 2020-07-03 12:00:00
# 4: 2 2020-08-05 12:00:00

x[, .(b = difftime(b[2], b[1], units = "auto")), by = a]

#    a       b
# 1: 1  1 secs
# 2: 2 33 secs

x[, .(b = difftime(b[2], b[1], units = "secs")), by = a]

#    a            b
# 1: 1       1 secs
# 2: 2 2851200 secs
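The `33 secs` result appears consistent with `units = "auto"` choosing a different unit for each group, with only the first group's units kept when the groups are combined. The base R sketch below computes the two differences separately, outside data.table; the `tz = "UTC"` argument is an assumption added to keep daylight saving time out of the picture.

```r
t0 <- as.POSIXct("2020-07-03 12:00:00", tz = "UTC")
t1 <- as.POSIXct("2020-08-05 12:00:00", tz = "UTC")

# With units = "auto", difftime picks the unit from the size of the difference.
short <- difftime(t0 + 1, t0, units = "auto")
long  <- difftime(t1, t0, units = "auto")

units(short)       # "secs"
units(long)        # "days"
as.numeric(long)   # 33

# Expressed in seconds, the long difference matches the explicit-units result.
as.numeric(long, units = "secs")  # 2851200
```

Passing explicit units, as in the second call above, avoids the mismatch because every group then stores its value on the same scale.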
