fast_strptime POSIX default incompatible with dplyr #403
Comments
POSIXlt is for compatibility with strptime and is faster because there is no internal conversion. You can pass If any of the above are "inconvenient", you can overwrite the function with your own in your .Rprofile. I am quite reluctant to add a global option for these high-performance functions. |
Thanks very much for your reply!
Dr. Sven Krackow |
Good question. I think you should bring it to @hadley's attention in a separate dplyr thread. |
I strongly believe that no function should generate POSIXlt. |
strptime and fast_strptime do it for efficiency reasons. It avoids the overhead of a double conversion (to POSIXct internally, and then to POSIXlt by the final consumer). The consumer might want to use only some components of POSIXlt (year, month, etc.). In any case, whatever the source of the POSIXlt, I think dplyr should handle it gracefully. |
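To illustrate the point about components, here is a minimal base R sketch (not taken from the thread; the timestamp is a made-up example value). It shows that a POSIXlt value already carries its parsed fields, whereas a POSIXct value has to be converted to POSIXlt before a field can be read.

```r
# Base R only; the timestamp below is an arbitrary example value.
x <- "2012-10-21 05:00:00"

# POSIXlt stores the broken-down fields directly.
lt <- as.POSIXlt(x, tz = "UTC")
lt$hour          # hour of day
lt$year + 1900   # years are stored as an offset from 1900
lt$mon + 1       # months are stored as 0-11

# POSIXct is a single number (seconds since the epoch), so reading a
# field means converting to POSIXlt first.
ct <- as.POSIXct(x, tz = "UTC")
hour_from_ct <- as.POSIXlt(ct)$hour
```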
I intended to open a new issue, but I see this very recent discussion about POSIXlt and POSIXct and a change in behaviour between lubridate versions. For example:
library(lubridate)
x <- dmy("21-02-2010")
class(x)
class(update(x, hour = 20))
hour(x) <- 20
class(x)
library(lubridate)
x <- dmy("21-02-2010")
class(x)
#> [1] "POSIXct" "POSIXt"
class(update(x, hour = 20))
#> [1] "POSIXct" "POSIXt"
hour(x) <- 20
class(x)
#> [1] "POSIXct" "POSIXt"
library(lubridate)
x <- dmy("21-02-2010")
class(x)
#> [1] "Date"
class(update(x, hour = 20))
#> [1] "POSIXlt" "POSIXt"
hour(x) <- 20
class(x)
#> [1] "POSIXlt" "POSIXt"
Because of this change, a use case that creates a datetime from two columns of a data.frame no longer works with dplyr.
library(dplyr)
library(lubridate) # 1.5.6
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
mutate(date = dmy(date)) %>%
mutate(dtime = update(date, hour = hour))
#> Error: `mutate` does not support `POSIXlt` results
library(dplyr)
library(lubridate)
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
mutate(date = dmy(date)) %>%
mutate(dtime = update(date, hour = hour))
#> Source: local data frame [2 x 3]
#>
#> date hour dtime
#> (time) (chr) (time)
#> 1 2012-10-21 5 2012-10-21 05:00:00
#> 2 2015-07-10 20 2015-07-10 20:00:00
If I change all my code, I can recover the old behaviour with:
library(dplyr)
library(lubridate)
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
mutate(date = dmy(date, tz = "UTC")) %>%
mutate(dtime = update(date, hour = hour))
#> Source: local data frame [2 x 3]
#>
#> date hour dtime
#> (time) (chr) (time)
#> 1 2012-10-21 5 2012-10-21 05:00:00
#> 2 2015-07-10 20 2015-07-10 20:00:00
I do not know whether this should be considered a problem or not. These issues between POSIXct, POSIXlt and Date certainly break some existing code and cause some incompatibility between lubridate and dplyr. At the very least, some checks are needed to make sure POSIXct is used and not POSIXlt. |
I believe very strongly that the POSIXlt data structure is an internal implementation detail, and should not be exposed to the user. The data structure (a list of length 9 pretending to be a vector of length n) is fragile, and breaks in a wide variety of cases. Adding dplyr support for POSIXlt is not on the cards. I would rather we look towards a hybrid data structure, i.e. something that is POSIXct at the core, but uses attributes to store parsed date time components. Then lubridate could use those attributes if available, or could otherwise compute from scratch. |
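The structural point can be seen with a short base R sketch (an illustration only, with made-up example values, not code from the thread). It compares the length a POSIXlt object reports with the list that sits underneath it, and shows that converting to POSIXct yields an ordinary numeric vector that fits in a data frame column.

```r
# Base R only; the two timestamps are arbitrary example values.
lt <- as.POSIXlt(c("2012-10-21 05:00:00", "2015-07-10 20:00:00"), tz = "UTC")

length(lt)          # reports the number of date-times
is.list(unclass(lt))  # but underneath it is a list of components
names(unclass(lt))    # sec, min, hour, mday, mon, year, ...

# POSIXct is one number per date-time, so it behaves like a normal vector.
ct <- as.POSIXct(lt)
is.numeric(unclass(ct))

# A POSIXct vector can be stored as a data frame column directly.
df <- data.frame(id = 1:2)
df$when <- ct
```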
Can't you simply convert it to POSIXct on assignment?
Yet another data structure for date-times? That sounds like a complication. Lubridate should produce |
I'd rather people never produce POSIXlt in the first place, rather than trying to patch things up after the fact. |
update.Date <- function(object, ...){
  lt <- as.POSIXlt(object, tz = "UTC")
  new <- update(lt, ...)
  if (sum(c(new$hour, new$min, new$sec), na.rm = TRUE)) {
    new
  } else {
    as.Date(new)
  }
}

update.Date <- function(object, ...){
  lt <- as.POSIXlt(object, tz = "UTC")
  new <- update(lt, ...)
  if (sum(c(new$hour, new$min, new$sec), na.rm = TRUE)) {
    as.POSIXct(new)
  } else {
    as.Date(new)
  }
}
I wonder about it too now, but it is a convenient way to create a datetime from date and time columns. Another way is to use |
I have changed the update on Date to return POSIXct. Surprisingly, that didn't break any tests. As already said, changing default output of |
BTW, you can still use |
I just want to add that the change in behavior for lubridate (>= 1.5.0), in the DESCRIPTION, some users only update specific packages (why? I have no idea, it's so easy to just update all...), and R doesn't automatically update versions of dependent packages as far as I can tell. I basically need to check which version of lubridate the user has within my package's function, then call fast_strptime with or without the argument. Also, for what it's worth, that is still vastly superior to using base R! |
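A hedged sketch of that kind of guard follows. It is not the commenter's code: the wrapper name `parse_ct` is hypothetical, and instead of comparing version numbers it checks whether the installed `fast_strptime` has an `lt` argument, which is a swapped-in technique that avoids assuming the exact release in which the argument appeared. It assumes lubridate is installed when the function is called.

```r
# Hypothetical wrapper (name invented for illustration).
# Assumes lubridate is installed; nothing is loaded until the function is called.
parse_ct <- function(x, format, tz = "UTC") {
  # Check whether this lubridate version exposes the `lt` argument.
  has_lt <- "lt" %in% names(formals(lubridate::fast_strptime))
  if (has_lt) {
    # Newer behaviour: ask explicitly for POSIXct.
    lubridate::fast_strptime(x, format = format, tz = tz, lt = FALSE)
  } else {
    # Older behaviour: no `lt` argument to pass.
    lubridate::fast_strptime(x, format = format, tz = tz)
  }
}
```

A call such as `parse_ct("2012-10-21 05:00:00", "%Y-%m-%d %H:%M:%S")` is intended to give a POSIXct result that dplyr verbs can work with, under the assumptions above.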
You can use Sorry for the trouble. This lt/ct change (here and in |
Users of both lubridate and dplyr ran into trouble after the default output class of fast_strptime was changed from POSIXct to POSIXlt: highly useful functions like left_join or bind_rows no longer work, because dplyr cannot handle POSIXlt. A global option to change fast_strptime's lt option would be highly appreciated! Thanks, SVEN.