data.table-style coalesce #3424
Comments
You'll need to go back to GPL-2 for this. |
Otherwise happy of course. For the optimization, you'll need to go down to C to do better. I've got 128 GB and I've never had |
I used |
This would be an excellent addition to data.table. Do you think it would be possible to support
It would take a vector (or list) of patterns, find matches, then group the matches by the remaining portion of the name. It would coalesce each group of matches in the order the patterns were provided. I think this would be very convenient functionality, but not quite sure how it would work in the j-slot. I also may be overlooking a simple way to do this using the proposed version of coalesce. |
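The pattern-grouping idea described above could be sketched roughly as follows, in base R. This is only an illustration under assumptions: `coalesce2` and `coalesce_by_pattern` are hypothetical helper names, not part of data.table or hutils, and the patterns are assumed to be regular-expression prefixes whose removal leaves the grouping key.

```r
# Hypothetical helper: fill NAs in x from y (same-length vectors).
coalesce2 <- function(x, y) {
  na <- is.na(x)
  x[na] <- y[na]
  x
}

# Hypothetical helper: group columns by the part of the name left after
# removing the matching pattern, then coalesce each group in pattern order.
coalesce_by_pattern <- function(df, patterns) {
  # Columns matching each pattern, in the order the patterns were given.
  hits <- lapply(patterns, function(p) grep(p, names(df), value = TRUE))
  # Remaining portion of each matched name once the pattern is stripped.
  keys <- Map(function(p, h) sub(p, "", h), patterns, hits)
  out <- list()
  for (k in unique(unlist(keys))) {
    cols <- unlist(Map(function(h, ks) h[ks == k], hits, keys),
                   use.names = FALSE)
    out[[k]] <- Reduce(coalesce2, df[cols])
  }
  out
}

# Made-up example data, loosely following the column names in this thread.
df <- data.frame(first_a  = c(1, NA, NA),
                 second_a = c(9, 2, NA),
                 third_a  = c(10, 10, 10),
                 first_b  = c(NA, 5, NA),
                 second_b = c(7, 8, NA))
res <- coalesce_by_pattern(df, c("^first_", "^second_", "^third_"))
```

Here `res$a` is built from `first_a`, `second_a`, `third_a` in that order, and `res$b` from `first_b`, `second_b`. Returning a plain list keeps the sketch independent of how it would be wired into the j-slot.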
@smingerson this kind of interface can be easily and reliably handled on the user side.
your_pattern_fun = function() c("second_a","third_a")
jj = c(as.name("coalesce"), as.name("first_a"), lapply(your_pattern_fun(), as.name))
jj = as.call(jj)
print(jj)
#coalesce(first_a, second_a, third_a)
dt[, "first_a" := eval(jj)][]
# first_a second_a third_a first_b
# 1: -0.0004156106 -0.0004156106 10 NA
# 2: 1.0000000000 -1.7100788510 10 NA
# 3: 1.0000000000 -0.7910541140 10 NA
# 4: 1.0000000000 -0.1093094194 10 NA
# 5: 1.0000000000 -1.1020017136 10 NA
# 6: 1.0196378767 1.0196378767 10 NA
# 7: 0.9189197761 0.9189197761 10 NA
# 8: 1.0000000000 1.7176207637 10 NA
# 9: 1.0000000000 0.7649291191 10 NA
#10: 10.0000000000 NA 10 NA
I generally agree it would be useful to provide an alternative interface to pass cc(F)
set.seed(702)
dt <- data.table(first_a = sample(c(1, NA), size = 10, replace = TRUE),
second_a = c(rnorm(9), NA),
third_a = 10,
first_b = NA_real_)
your_pattern_fun = function() c("second_a","third_a")
dt[, "first_a" := coalesce(first_a, .dots=.SD), .SDcols=your_pattern_fun()][]
and because you can use |
Please note https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL which contains "The interpreted program, to the interpreter, is just data.". See also my comments about that GPL FAQ at the top of PR #2456 which changed data.table's license from GPL to MPL. |
Did a bunch of coalesce-ing today and was sorely missing an efficient version.
@HughParsonage, would you be happy to add hutils::coalesce to data.table? I have the discussion in #2677 in mind...
I tinkered with hutils::coalesce to come up with:
The main difference is to skip running anyNA on every iteration and instead focus on "whittling down" the is.na(x) vector.
Benchmarked against hmisc::coalesce and it's hit or miss... maybe it needs more replications (function evaluation takes at most around 2 seconds, so this is doable)? Or there's some extra optimization I'm missing...
Anyway, the same logic would probably be faster in C...
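For readers following along, the "whittling down" idea might look something like the sketch below. It is not the code originally posted in this issue, nor the hutils implementation; `coalesce_whittle` is a hypothetical name, and the handling of length-1 replacements is an assumption made for convenience.

```r
# Hypothetical sketch: track only the positions that are still NA and
# shrink that index after each candidate, instead of re-scanning x.
coalesce_whittle <- function(x, ...) {
  idx <- which(is.na(x))            # positions still missing
  for (v in list(...)) {
    if (length(idx) == 0L) break    # nothing left to fill
    if (length(v) == 1L) {
      # Assumption: a scalar replacement fills every remaining gap.
      if (!is.na(v)) {
        x[idx] <- v
        idx <- integer(0)
      }
      next
    }
    stopifnot(length(v) == length(x))
    fill <- !is.na(v[idx])          # which remaining gaps v can fill
    x[idx[fill]] <- v[idx[fill]]
    idx <- idx[!fill]               # whittle down the missing set
  }
  x
}

x <- c(1, NA, NA, NA)
y <- c(NA, 2, NA, NA)
out <- coalesce_whittle(x, y, 10)
```

Each pass only inspects the positions still missing, so later candidates do less work as the index shrinks. Whether this beats a per-iteration `anyNA` check in practice would need benchmarking.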