In a join, x[i, v := i.v], if multiple rows of i match to a single row of x, the assignment takes the last one (?). It would be nice to get an error or maybe a warning when this behavior is triggered.
library(data.table)
a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))
b[a, on=.(id), x := i.x, verbose = TRUE]
# Calculated ad hoc index in 0 secs
# Starting bmerge ...done in 0 secs
# Detected that j uses these columns: x,i.x
# Assigning to 3 row subset of 2 rows
I'm not sure if the condition in the title (n > m) is necessary and sufficient for this behavior, though.
My workaround for now would involve looking at the opposite join:
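One way such a check might look (a sketch, not necessarily the code originally posted here): join in the opposite direction with by = .EACHI and count how many rows of a match each row of b. Any count above 1 means the update join had several candidate values to choose from. The names match_counts and ambiguous are illustrative only.

```r
library(data.table)

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# Opposite join: for each row of b, count the matching rows of a.
# An unnamed .N in j is returned as a column called N.
match_counts <- a[b, on = .(id), .N, by = .EACHI]

# Rows of b that receive more than one candidate value in b[a, x := i.x].
ambiguous <- match_counts[N > 1L]
print(ambiguous)
```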
That seems pretty cumbersome. Maybe there's some way for me to capture and grep the verbose output (but then again, maybe not).
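A rough sketch of the capture-and-grep idea, assuming the verbose text is written to standard output and keeps the wording shown above (neither is guaranteed across data.table versions). parse_assign_msg is a hypothetical helper, not part of data.table.

```r
library(data.table)

# Hypothetical helper: extract n and m from a verbose line of the form
# "Assigning to n row subset of m rows". Returns NULL if no such line exists.
parse_assign_msg <- function(lines) {
  pattern <- "Assigning to ([0-9]+) row subset of ([0-9]+) rows"
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0L) return(NULL)
  parts <- regmatches(hit[1L], regexec(pattern, hit[1L]))[[1L]]
  c(n = as.integer(parts[2L]), m = as.integer(parts[3L]))
}

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# Work on a copy so that b itself is not modified by reference.
out <- capture.output(copy(b)[a, on = .(id), x := i.x, verbose = TRUE])
counts <- parse_assign_msg(out)

if (!is.null(counts) && counts[["n"]] > counts[["m"]]) {
  message("update join assigned to more rows than the table has")
}
```

Note that n > m guarantees that some row was assigned more than once, but duplicates can also occur when n <= m, so this check alone would miss cases.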
Just an idea: A more general approach could involve returning an object containing diagnostics from the join and assignment. Of course, the object cannot be the return value of [.data.table, but maybe it could be dropped in some locked-binding global, .datatable.diagnostic, similar to .Last.value. Alternatively, maybe that sort of object would fit well into @jangorecki's dtq package.
I'm thinking along these lines as I write tutorial materials to convert Stata users to R. In Stata, all joins cat a nice-ish table to the console.
SO post from a Stata user interested in uniqueness of matching of each row of i in x etc: https://stackoverflow.com/questions/49541330/r-data-table-merge-vs-stata-merge
Update: Re the verbose message text, the n is recorded thanks to #3460 and the m is just the number of rows in the table (which I guess I didn't realize at the time I posted this, thinking it was instead m = uniqueN(irows, na.rm = TRUE)... which unfortunately is not computed, and there is no way to detect whether the update join was 1:1, etc., per the SO link above).
So anyway, I'll leave this open since it seems to highlight a point of difficulty (judging by emoji-votes) even if my suggestion does not fix it.
jangorecki changed the title from [Request] Option to error when "Assigning to n row subset of m rows" with n > m to Option to error when "Assigning to n row subset of m rows" with n > m on Apr 17, 2020
jangorecki added the joins label on Apr 17, 2020
Where does this stand in the priority list? I think this would be really useful for non-equi joins. I typically end up doing things like X[Y, on = .(common_id, time < time), next_value := i.value], and this is not working.
Fortunately, I keep going back to section 3.5.4, "Updating in a join", of https://franknarf1.github.io/r-tutorial/_book/tables.html
I think the proper way to address this request, when update-on-join is detected, is:
if mult was missing, switch to mult="last" for backward compatibility
if mult was non-missing, proceed
Once mult="error" is supported, it would raise an error in case of multiple matches. AFAIK we would need to swap x and i when calling bmerge for update-on-join (so that mult is checked on the proper side of the join), which would require changing quite a lot of code.
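Until something like that exists, the check can be approximated in user code before the update. The following is only a sketch: assert_unique_update is a hypothetical helper, not a data.table function, and it uses which = TRUE to collect the rows of x that each row of i matches.

```r
library(data.table)

# Hypothetical helper, not part of data.table: stop if any row of x is
# matched by more than one row of i under the join columns in on_cols.
assert_unique_update <- function(x, i, on_cols) {
  # which = TRUE returns the matched row numbers of x, one per match;
  # nomatch = 0L drops rows of i that match nothing.
  matched <- x[i, on = on_cols, which = TRUE, nomatch = 0L]
  dup <- unique(matched[duplicated(matched)])
  if (length(dup) > 0L) {
    stop("multiple rows of i match row(s) ", toString(dup), " of x")
  }
  invisible(TRUE)
}

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# Two rows of a match row 1 of b, so the check fails here.
res <- tryCatch(
  assert_unique_update(b, a, on_cols = "id"),
  error = function(e) conditionMessage(e)
)
print(res)
```

Because the check runs a separate join, it costs an extra pass over the data; it is meant as a guard to call before b[a, on = .(id), x := i.x], not as a replacement for built-in support.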