Option to error when "Assigning to n row subset of m rows" with n > m #2022

Open
franknarf1 opened this issue Feb 13, 2017 · 2 comments
Labels
joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins

Comments

@franknarf1
Contributor

franknarf1 commented Feb 13, 2017

In a join, x[i, v := i.v], if multiple rows of i match to a single row of x, the assignment takes the last one (?). It would be nice to get an error or maybe a warning when this behavior is triggered.

library(data.table)
a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))
b[a, on=.(id), x := i.x, verbose = TRUE]
# Calculated ad hoc index in 0 secs
# Starting bmerge ...done in 0 secs
# Detected that j uses these columns: x,i.x 
# Assigning to 3 row subset of 2 rows

I'm not sure if the condition in the title (n > m) is necessary and sufficient for this behavior, though.
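A minimal sketch of what the example above leaves behind, plus a second case suggesting that n > m is not a necessary condition. The tables `a2` and `b2` are made up for illustration, and the "last match wins" outcome is what I observe rather than a documented guarantee.

```r
library(data.table)

# Same tables as above: two rows of a match the single id == 1 row of b.
a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))
b[a, on = .(id), x := i.x]
b$x   # id 1 ends up with 12 (the later matching row of a), id 2 with 13

# Illustrative counter-case: n (2 assigned rows) <= m (5 rows in b2),
# yet both rows of a2 still hit the same row of b2.
a2 <- data.table(id = c(1L, 1L), x = 11:12)
b2 <- data.table(id = 1:5, y = -(1:5))
b2[a2, on = .(id), x := i.x]
b2$x  # only the first row is filled; the value 11 is silently overwritten by 12
```

If n counts assigned rows without de-duplication, then n > m forces at least one repeated target row, so the condition would be sufficient but, as the second case shows, not necessary.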

My workaround for now would involve looking at the opposite join:

a[b, on=.(id), .N, by=.EACHI][, range(N)]
# [1] 1 2
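The opposite-join check could be wrapped in a small guard that is called before the update. This is only a sketch: `assert_single_match()` and its argument names `tgt` and `src` are hypothetical, not part of data.table.

```r
library(data.table)

# Hypothetical helper: stop if any row of `tgt` is matched by more than one row of `src`.
assert_single_match <- function(tgt, src, on) {
  # Opposite join: one result row per row of tgt, N = number of matching rows in src.
  counts <- src[tgt, on = on, .N, by = .EACHI]
  if (any(counts$N > 1L)) {
    stop("update join is ambiguous: ", sum(counts$N > 1L),
         " target row(s) matched by more than one source row")
  }
  invisible(TRUE)
}

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# Capture the error message instead of aborting, just for demonstration.
res <- tryCatch(assert_single_match(b, a, on = "id"),
                error = function(e) conditionMessage(e))
res
```

The guard costs a second join, so it is probably best kept for interactive work or tests rather than hot paths.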

That seems pretty cumbersome. Maybe there's some way for me to capture and grep the verbose output (but then again, maybe not).
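For what it's worth, here is a rough sketch of the capture-and-grep idea. It assumes the verbose line is written to standard output and keeps exactly the wording shown above; both are assumptions that could break between data.table versions, and `parse_assign_line()` is a hypothetical helper.

```r
library(data.table)

# Hypothetical helper: extract n and m from the "Assigning to n row subset of m rows" line.
parse_assign_line <- function(lines) {
  hit <- grep("Assigning to [0-9]+ row subset of [0-9]+ rows", lines, value = TRUE)
  if (!length(hit)) return(NULL)  # wording changed or nothing was captured
  nums <- as.integer(regmatches(hit[1L], gregexpr("[0-9]+", hit[1L]))[[1L]])
  list(n = nums[1L], m = nums[2L])
}

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# capture.output() only sees the verbose text if it goes to stdout.
out <- capture.output(b[a, on = .(id), x := i.x, verbose = TRUE])
parse_assign_line(out)
```

Even when the parse succeeds, n and m only describe row counts, so this cannot tell a 1:1 update from a many:1 update in general.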


Just an idea: A more general approach could involve returning an object containing diagnostics from the join and assignment. Of course, the object cannot be the return value of [.data.table, but maybe it could be dropped in some locked-binding global, .datatable.diagnostic similar to .Last.value. Alternately, maybe that sort of object would fit well into @jangorecki 's dtq package.

I'm thinking along these lines as I write tutorial materials to convert Stata users to R. In Stata, all joins cat a nice-ish table to the console.

SO post from a Stata user interested in uniqueness of matching of each row of i in x etc: https://stackoverflow.com/questions/49541330/r-data-table-merge-vs-stata-merge


Update: Regarding the verbose message text, n is recorded thanks to #3460, and m is just the number of rows in the table. I didn't realize that when I posted this; I thought it was m = uniqueN(irows, na.rm = TRUE), which unfortunately is not computed, so there is no way to detect whether the update join was 1:1, etc., per the SO link above.

So anyway, I'll leave this open since it seems to highlight a point of difficulty (judging by emoji-votes) even if my suggestion does not fix it.

@jangorecki jangorecki self-assigned this Apr 17, 2020
@jangorecki jangorecki changed the title [Request] Option to error when "Assigning to n row subset of m rows" with n > m Option to error when "Assigning to n row subset of m rows" with n > m Apr 17, 2020
@jangorecki jangorecki added the joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins label Apr 17, 2020
@statquant

Where does this stand in the priority list? I think this would be really useful for non-equi joins. I typically end up doing things like
X[Y, on = .(common_id, time < time), next_value := i.value] and this is not working.
Fortunately, I keep going back to section 3.5.4, "Updating in a join", of https://franknarf1.github.io/r-tutorial/_book/tables.html
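A sketch of how the same opposite-join check might look for this non-equi case. The toy tables below are invented for illustration; only the column names come from the call above. Note that the inequality is flipped because the roles of the two tables are swapped.

```r
library(data.table)

# Toy data (made up): one id, three rows in X, two candidate rows in Y.
X <- data.table(common_id = 1L, time = c(1, 2, 3))
Y <- data.table(common_id = 1L, time = c(2, 3), value = c(20, 30))

# X[Y, on = .(common_id, time < time)] matches X rows whose time is below Y's time.
# Counting from X's side therefore needs Y's time > X's time.
counts <- Y[X, on = .(common_id, time > time), .N, by = .EACHI]

# Rows of X that more than one row of Y would try to update.
ambiguous <- counts[N > 1L]
ambiguous
```

Here the X row with time 1 is matched by both Y rows, so an update join would silently keep just one of the two values for it.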

@jangorecki
Member

jangorecki commented May 6, 2020

I think the proper way to address this request, when update-on-join is detected, is to:

  • if mult was missing, switch to mult="last" for backward compatibility
  • if mult was non-missing, proceed

With mult="error" supported, an error would be raised in case of multiple matches. AFAIK, for update-on-join we would need to swap x and i when calling bmerge (so that mult is checked on the proper side of the join), which would require changing quite a lot of code.
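To illustrate what "checking mult on the proper side" means, the update can be emulated today by running the swapped join explicitly with the existing mult values "first" and "last". This is only a sketch using the tables from the original report; it does not rely on mult="error", which is treated here as a proposal.

```r
library(data.table)

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# Swapped join: for each row of b, pick one matching row of a and return its x.
picked_last  <- a[b, on = .(id), mult = "last", x]
picked_first <- a[b, on = .(id), mult = "first", x]

# Assigning the "last" pick gives the same result as b[a, on = .(id), x := i.x].
b[, x := picked_last]
b
```

With the join written this way, the choice between first and last match is explicit instead of being a side effect of assignment order.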

@jangorecki jangorecki removed their assignment Dec 10, 2023