Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-Forge #2301] Add "merge" or "join" argument to X[Y,...] #614

Closed
arunsrinivasan opened this issue Jun 8, 2014 · 8 comments
Closed

[R-Forge #2301] Add "merge" or "join" argument to X[Y,...] #614

arunsrinivasan opened this issue Jun 8, 2014 · 8 comments

Comments

@arunsrinivasan
Copy link
Member

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

As highlighted here :
http://stackoverflow.com/questions/12773822/why-does-xy-join-of-data-tables-not-allow-a-full-outer-join-or-a-left-join

Not just for syntax but for the inspection of j advantage of X[Y,j,merge=TRUE].

@aryoda
Copy link

aryoda commented Nov 8, 2015

Is a "full outer join" (like this FR asks for) already implemented in another way than this FR asks for?

If not I would like to "vote" for this feature to be implemented.
I could imagine naming the parameter "fullouter" (boolean) which is more intuitive than "merge".
Or stick to the merge parameters (all.x, all.y, all)...

@ethanbsmith
Copy link
Contributor

ethanbsmith commented Mar 29, 2017

I think there is another issue to consider on this topic. Specifically, while we can switch X[Y] to Y[X] in order alter the direction of the outer join, in both cases, only the first data.table can be updated. i.e. X[Y] only allows := style updates to X, while Y[X] only supports updates to Y. So, in this sense, simply reversing the order of X and Y is not always a solution

@franknarf1
Copy link
Contributor

@ebs238 Fwiw, I typically do Y[, v := X[.SD, on=cols, j, by=.EACHI]$V1] for that case.

@ethanbsmith
Copy link
Contributor

ethanbsmith commented Mar 29, 2017

you can also do a merge to get the join right, then subset/aggregate the result of that merge and write it back. However, the assertion (in the SO threads that spawned this) that X[Y] and Y[X] are substitutable to accommodate outer join direction is not accurate. They are substitutable from a join direction perspective, but not from an overall DT functionality perspective. Conceptually, the direction of the outer join, and the choice of which DT to update are independent decisions.

Not suggesting this is a critical issue, as it can be worked around, Just suggesting this aspect be considered as part of this issue

@sritchie73
Copy link
Contributor

If we wanted to implemented left / full joins, perhaps the most "data.table" way would be to add a nomatch.x argument (default = 0L). Then we would have:

X right join Y / X subset on Y (default):
X[Y] # nomatch = NA, nomatch.x = 0L

X inner join Y / X subset on Y, ignoring rows in Y with no match in X:
X[Y, nomatch = 0L] # nomatch.x = 0L

X full join Y:
X[Y, nomatch.x = NA] # nomatch = NA by default

X left join Y:
X[Y, nomatch = 0L, nomatch.x = NA]

@jangorecki

This comment has been minimized.

@MichaelChirico

This comment has been minimized.

@jangorecki
Copy link
Member

jangorecki commented Apr 10, 2020

After reading linked SO in details it seems that requested feature here is an argument that specify join type (inner/full/left/right) using all, all.x, etc., so different types of join can be supported inside [.data.table. Then the issue is basically the same as #3946. The latter one seems to be more nicely defined, will close this one as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants