Documentation cleanup: [.data.table still mentions key requirement #1488

Closed
MichaelChirico opened this issue Jan 5, 2016 · 0 comments


The documentation (in ?data.table) still (erroneously) claims that x must be keyed when i is a data.table (missing the on exemption):

When i is a data.table, x must have a key. i is joined to x using x's key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key; i.e., column 1 of i is matched to the 1st column of x's key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If i has fewer columns than x's key then not all of x's key columns will be joined to (a common use case) and many rows of x will (ordinarily) match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns (column 1 of i's key is joined to column 1 of x's key, column 2 of i's key to column 2 of x's key, and so on for as long as the shorter key) and a binary merge of the two tables is carried out. In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order, either from column 1 onwards of i when i is unkeyed, or from column 1 onwards of i's key. In code, the number of join columns is determined by min(length(key(x)),if (haskey(i)) length(key(i)) else ncol(i)).
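For reference, the keyed behaviour described in that passage can be sketched as follows. This is a minimal illustration, not taken from the documentation; the table contents and the column names id, v and k are made up for the example.

```r
library(data.table)

# x is keyed on "id"; i is unkeyed and its only column has a different name.
x <- data.table(id = c("a", "b", "c"), v = 1:3, key = "id")
i <- data.table(k = c("b", "c"))

# Number of join columns, using the expression quoted from ?data.table.
n_join <- min(length(key(x)), if (haskey(i)) length(key(i)) else ncol(i))

# Column 1 of i is matched to column 1 of x's key; the names need not agree.
res <- x[i]
```

Here n_join is 1, and res holds the rows of x whose id is "b" or "c".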

This should presumably be updated, perhaps something like:

If i is a data.table, either x must be keyed or the join columns must be specified in on (see on below). In the case that x is keyed and on is not used, [repeat original wording from "i is joined to x..."]
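A minimal sketch of the case the proposed wording is meant to cover, assuming a data.table version that supports the on argument (1.9.6 or later); the tables and column names are hypothetical.

```r
library(data.table)

# Neither table has a key.
x <- data.table(id = c("a", "b", "c"), v = 1:3)
i <- data.table(id = c("b", "c"))

# The join columns are given explicitly via on, so no key is required.
res <- x[i, on = "id"]
```

The result contains the rows of x with id "b" and "c", even though haskey(x) is FALSE.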

By the way, my instinct says that on overrides keyed joins (i.e., if we specify on, the keys of either table don't matter). Is that correct? If so, perhaps that should be documented as well. Does using on override the key of x, of i, or both?
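A quick way to check this is to key x on one column and join on another. The sketch below uses made-up tables; as far as I can tell, the join follows the columns named in on and leaves the key of x untouched.

```r
library(data.table)

# x is keyed on "a", but the join below is on "b".
x <- data.table(a = c(1L, 2L, 3L), b = c("p", "q", "r"), key = "a")
i <- data.table(b = c("r", "p"))

# on names the join column explicitly; x's key on "a" plays no part in the match.
res <- x[i, on = "b"]

# x itself is not modified by the join, so its key is still "a".
key_after <- key(x)
```

The rows come back in the order of i (b = "r", then b = "p"), which would not be possible if the join had used x's key column a.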

arunsrinivasan added a commit that referenced this issue Jan 16, 2016
Closes #1488 -- incorporates 'on' argument usage to description of i in [.data.table