-
Notifications
You must be signed in to change notification settings - Fork 992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cbindlist, mergelist #4370
Open
jangorecki
wants to merge
121
commits into
master
Choose a base branch
from
cbind-merge-list
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+2,203
−41
Open
cbindlist, mergelist #4370
Changes from 101 commits
Commits
Show all changes
121 commits
Select commit
Hold shift + click to select a range
3e72c8d
cbindlist
jangorecki a915832
add cbind by reference, timing
jangorecki 05dd562
R prototype of mergelist
jangorecki cba5bc1
wording
jangorecki 1edf4d3
use lower overhead funs
jangorecki 36bbd25
stick to int32 for now, correct R_alloc
jangorecki 7d51dd6
bmerge C refactor for codecov and one loop for speed
jangorecki 0437da5
address revealed codecov gaps
jangorecki e287213
refactor vecseq for codecov
jangorecki 5dc07bd
seqexp helper, some alloccol export on C
jangorecki a4d124e
bmerge codecov, types handled in R bmerge already
jangorecki 40d3bfe
better comment seqexp
jangorecki beffe39
bmerge mult=error #655
jangorecki 4e211a1
multiple new C utils
jangorecki fbddcd6
swap if branches
jangorecki 01b2f9d
explain new C utils
jangorecki c8e070b
comments mostly
jangorecki 3004748
reduce conflicts to PR #4386
jangorecki cf73fcf
comment C code
jangorecki b64c0c3
address multiple matches during update-on-join #3747
jangorecki 348d5b7
Revert "address multiple matches during update-on-join #3747"
jangorecki df0c11a
merge.dt has temporarily mult arg, for testing
jangorecki 5793508
minor changes to cbindlist c
jangorecki 6017eac
dev mergelist, for single pair now
jangorecki f88e0de
add quiet option to cc()
jangorecki 2387f09
mergelist tests
jangorecki 5ae7d4d
add check for names to perhaps.dt
jangorecki d0b2af8
rm mult from merge.dt method
jangorecki 7e51189
rework, clean, polish multer, fix righ and full joins
jangorecki ea77bce
make full join symmetric
jangorecki 06a1ae8
mergepair inner function to loop on
jangorecki a942940
extra check for symmetric
jangorecki dc5f263
mergelist manual
jangorecki bc17057
ensure no df-dt passed where list expected
jangorecki db30e44
comments and manual
jangorecki 0dd82c3
handle 0 cols tables
jangorecki 9fe7f55
more tests
jangorecki 113f688
more tests and debugging
jangorecki 9bcb814
move more logic closer to bmerge, simplify mergepair
jangorecki a7f39c9
more tests
jangorecki b1f39a6
revert not used changes
jangorecki 29bd438
reduce not needed checks, cleanup
jangorecki ca0d76a
copy arg behavior, manual, no tests yet
jangorecki 9ac7a89
cbindlist manual, export both
jangorecki 384396b
cleanup processing bmerge to dtmatch
jangorecki 11974f0
test function match order for easier preview
jangorecki de48d2d
vecseq gets short-circuit
jangorecki 66e7d53
batch test allow browser
jangorecki 25d0633
big cleanup
jangorecki fee063b
remmove unneeded stuff, reduce diff
jangorecki 84d7146
more cleanup, minor manual fixes
jangorecki d78d136
add proper test scripts
jangorecki 2b1795b
Merge branch 'master' into cbind-merge-list
jangorecki dabb55c
comment out not used code for coverage
jangorecki 3ca7d4a
more tests, some nocopy opts
jangorecki e4b14e6
rename sql test script, should fix codecov
jangorecki b1fce17
simplify dtmatch inner branch
jangorecki 50f9e89
more precise copy, now copy only T or F
jangorecki d43be04
unused arg not yet in api, wording
jangorecki 4580dd4
comments and refer issues
jangorecki 5d0e991
codecov
jangorecki 03aa427
hasindex coverage
jangorecki b15ab93
codecov gap
jangorecki 17d2fa8
tests for join using key, cols argument
jangorecki 492d3b5
fix missing import forderv
jangorecki a5c4a26
more tests, improve missing on handling
jangorecki 426e187
more tests for order of inner and full join for long keys
jangorecki c8ded9c
new allow.cartesian option, #4383, #914
jangorecki 674bff8
reduce diff, improve codecov
jangorecki 0a483c2
reduce diff, comments
jangorecki db3249a
need more DT, not lists, mergelist 3+ tbls
jangorecki a573286
proper escape heavy check
jangorecki 78123b0
unit tests
jangorecki 9273212
more tests, address overalloc failure
jangorecki c5df010
mergelist and cbindlist retain index
jangorecki 64d4f5e
manual, examples
jangorecki 211da09
fix manual
jangorecki 4275e8c
minor clarify in manual
jangorecki 102de68
retain keys, right outer join for snowflake schema joins
jangorecki 4487360
duplicates in cbindlist
jangorecki 2923719
recycling in cbindlist
jangorecki 658410b
escape 0 input in copyCols
jangorecki b708507
empty input handling
jangorecki 1b9f913
closing cbindlist
jangorecki d70cd41
vectorized _on_ and _join.many_ arg
jangorecki a5179b7
rename dtmatch to dtmerge
jangorecki c86c9ad
vectorized args: how, mult
jangorecki b89b6f8
full join, reduce overhead for mult=error
jangorecki 249e09a
mult default value dynamic
jangorecki 1fa9b40
fix manual
jangorecki f889b0e
add "see details" to Rd
MichaelChirico f35a555
mention shared on in arg description
MichaelChirico 52d4f9f
amend feedback from Michael
jangorecki 2884b29
semi and anti joins will not reorder x columns
jangorecki 3f3f9de
Merge branch 'master' into cbind-merge-list
jangorecki 6df88bc
spelling, thx to @jan-glx
jangorecki 060610b
check all new funs used and add comments
jangorecki db58b6c
bugfix, sort=T needed for now
jangorecki 53b9b0d
Merge branch 'master' into cbind-merge-list
MichaelChirico c6add42
Update NEWS.md
MichaelChirico ec1973f
Merge branch 'master' into cbind-merge-list
MichaelChirico d26265f
Merge branch 'master' into cbind-merge-list
MichaelChirico 115b1eb
NEWS placement
MichaelChirico e2ae4d0
numbering
MichaelChirico 9a1d7db
ascArg->order
MichaelChirico 3ead046
Merge remote-tracking branch 'origin/cbind-merge-list' into cbind-mer…
MichaelChirico d579af4
attempt to restore from master
MichaelChirico 9a51230
Update to stopf() error style
MichaelChirico 1b363ad
Need isFrame for now
MichaelChirico e9387d2
More quality checks: any(!x)->!all(x); use vapply_1{b,c,i}
MichaelChirico b30437b
really restore from master
MichaelChirico 6b9aa6c
try to PROTECT() before duplicate()
MichaelChirico 71bb8b1
update error message in test
MichaelChirico 40191d7
appease the rchk gods
MichaelChirico 3758316
extraneous space
MichaelChirico e4e5d8c
missing ';'
MichaelChirico 338711a
use catf
MichaelChirico 008abef
simplify perhapsDataTableR
MichaelChirico 854d35e
move sqlite.Rraw.manual into other.Rraw
MichaelChirico c975c14
simplify for loop
MichaelChirico 5952dd8
Merge remote-tracking branch 'origin/cbind-merge-list' into cbind-mer…
MichaelChirico File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -206,7 +206,7 @@ replace_dot_alias = function(e) { | |
} | ||
return(x) | ||
} | ||
if (!mult %chin% c("first","last","all")) stopf("mult argument can only be 'first', 'last' or 'all'") | ||
if (!mult %chin% c("first","last","all","error")) stop("mult argument can only be 'first', 'last', 'all' or 'error'") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can this be moved to an antecedent PR as well? |
||
missingroll = missing(roll) | ||
if (length(roll)!=1L || is.na(roll)) stopf("roll must be a single TRUE, FALSE, positive/negative integer/double including +Inf and -Inf or 'nearest'") | ||
if (is.character(roll)) { | ||
|
@@ -505,6 +505,7 @@ replace_dot_alias = function(e) { | |
} | ||
i = .shallow(i, retain.key = TRUE) | ||
ans = bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, ops, verbose=verbose) | ||
if (mult=="error") mult="all" | ||
xo = ans$xo ## to make it available for further use. | ||
# temp fix for issue spotted by Jan, test #1653.1. TODO: avoid this | ||
# 'setorder', as there's another 'setorder' in generating 'irows' below... | ||
|
@@ -526,13 +527,22 @@ replace_dot_alias = function(e) { | |
if (!byjoin || nqbyjoin) { | ||
# Really, `anyDuplicated` in base is AWESOME! | ||
# allow.cartesian shouldn't error if a) not-join, b) 'i' has no duplicates | ||
if (verbose) {last.started.at=proc.time();catf("Constructing irows for '!byjoin || nqbyjoin' ... ");flush.console()} | ||
irows = if (allLen1) f__ else vecseq(f__,len__, | ||
if (allow.cartesian || | ||
notjoin || # #698. When notjoin=TRUE, ignore allow.cartesian. Rows in answer will never be > nrow(x). | ||
!anyDuplicated(f__, incomparables = c(0L, NA_integer_))) { | ||
NULL # #742. If 'i' has no duplicates, ignore | ||
} else as.double(nrow(x)+nrow(i))) # rows in i might not match to x so old max(nrow(x),nrow(i)) wasn't enough. But this limit now only applies when there are duplicates present so the reason now for nrow(x)+nrow(i) is just to nail it down and be bigger than max(nrow(x),nrow(i)). | ||
if (verbose) {last.started.at=proc.time();cat("Constructing irows for '!byjoin || nqbyjoin' ... ");flush.console()} | ||
irows = if (allLen1) f__ else { | ||
join.many = getOption("datatable.join.many") # #914, default TRUE for backward compatibility | ||
anyDups = if (!join.many && length(f__)==1L && len__==nrow(x)) { | ||
NULL # special case of scalar i match to const duplicated x, not handled by anyDuplicate: data.table(x=c(1L,1L))[data.table(x=1L), on="x"] | ||
} else if (!notjoin && ( # #698. When notjoin=TRUE, ignore allow.cartesian. Rows in answer will never be > nrow(x). | ||
!allow.cartesian || | ||
!join.many)) | ||
as.logical(anyDuplicated(f__, incomparables = c(0L, NA_integer_))) | ||
limit = if (!is.null(anyDups) && anyDups) { # #742. If 'i' has no duplicates, ignore | ||
if (!join.many) stop("Joining resulted in many-to-many join. Perform quality check on your data, use mult!='all', or set 'datatable.join.many' option to TRUE to allow rows explosion.") | ||
else if (!allow.cartesian && !notjoin) as.double(nrow(x)+nrow(i)) | ||
else stop("internal error: checking allow.cartesian and join.many, unexpected else branch reached, please report to issue tracker") # nocov | ||
} | ||
vecseq(f__, len__, limit) | ||
} # rows in i might not match to x so old max(nrow(x),nrow(i)) wasn't enough. But this limit now only applies when there are duplicates present so the reason now for nrow(x)+nrow(i) is just to nail it down and be bigger than max(nrow(x),nrow(i)). | ||
if (verbose) {cat(timetaken(last.started.at),"\n"); flush.console()} | ||
# Fix for #1092 and #1074 | ||
# TODO: implement better version of "any"/"all"/"which" to avoid | ||
|
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-citation here?
also please address.