Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved key handling when assigning with ':=' #2389

Merged
merged 5 commits into from
Sep 28, 2017

Conversation

MarkusBonsch
Copy link
Contributor

@MarkusBonsch MarkusBonsch commented Sep 27, 2017

This closes #2372.

Now, when updating columns with :=, existing keys and indices are retained up to the first modified column (exclusive).
Example: key on x1, x2, x3 --> assignment on x2 --> key on x1.

The tests 1400 ff had to be modified since they were testing for the old behaviour.
Additional tests have been added to make sure, all cases are handled, also on empty data.tables.

I tested for C memory allocation problems with R -d valgrind and found no issues.
There is an issue with a segmentataion fault on unloading the package but I discovered that this is a problem of the master (issue [#2388])

A small benchmark (code at the end of this post) shows that there is no speed penalty associated with the new implementation:

master PR
0.25 ms 0.27 ms
library(data.table)
library(microbenchmark)

nrow <- 1
ncol <- 20
nindex <- 20
nassigncol <- 10

DT <- data.table(x1 = rnorm(nrow))
setindex(DT, x1)
index <- c("x1")
for(col in seq_len(ncol-1)){
  DT[, paste0("x", col + 1) := rnorm(nrow)]
  if(col < nindex){
    index <- c(index, paste0("x", col+1))
    setindexv(DT, index)
  }
}

test <- microbenchmark(assign = copy(DT)[, paste0("x", (ncol - nassigncol):ncol) := 1], 
                       times = 100, 
                       unit = "ms")

@codecov-io
Copy link

codecov-io commented Sep 27, 2017

Codecov Report

Merging #2389 into master will increase coverage by 0.04%.
The diff coverage is 94.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2389      +/-   ##
==========================================
+ Coverage   91.24%   91.29%   +0.04%     
==========================================
  Files          62       62              
  Lines       11978    12023      +45     
==========================================
+ Hits        10929    10976      +47     
+ Misses       1049     1047       -2
Impacted Files Coverage Δ
src/assign.c 94.12% <94.93%> (-0.24%) ⬇️
src/forder.c 94.47% <0%> (+0.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f422f5e...5badddf. Read the comment docs.

I added a few more comments in the file at the top explaining the thinking here.
@mattdowle
Copy link
Member

Sorry - I guess I broke it by removing shift alias. I didn't think that was needed as shift is exported. I'll fix.

I've added you as a project member so that you can create branches inside the main repo if you'd like. This will mean I can pull, test and push to your branch, rather than me using the online editor as I'm doing now.

@MarkusBonsch
Copy link
Contributor Author

Thank you very much for the invitation! I am happy to join.

And tidied up that section at the top with hopefully clearer comments.
Currently, we have to skip test numbers ending zero because, for example, if 1419.10 fails, it prints as 1419.1 the same as 1419.1 and we can't distinguish.  Haven't thought of a solution yet.  For now, I just think of it as a bit of culture/personality in the code,  like missing unlucky number 13.
@mattdowle mattdowle added this to the v1.10.6 milestone Sep 28, 2017
@mattdowle mattdowle merged commit 6968a0e into Rdatatable:master Sep 28, 2017
@MarkusBonsch
Copy link
Contributor Author

Thank you for cleaning up. I am still learning :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Request] assigning with := should retain keys better
3 participants