-
Notifications
You must be signed in to change notification settings - Fork 993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dcast only computes default fill if necessary #5549
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5549 +/- ##
==========================================
- Coverage 97.49% 97.47% -0.02%
==========================================
Files 80 80
Lines 14861 14873 +12
==========================================
+ Hits 14488 14498 +10
- Misses 373 375 +2 ☔ View full report in Codecov by Sentry. |
src/fcast.c
Outdated
for (int j=0; j<ncols; ++j) { | ||
SET_VECTOR_ELT(ans, nlhs+j+i*ncols, target=allocVector(thistype, nrows) ); | ||
int *itarget = INTEGER(target); | ||
copyMostAttrib(thiscol, target); | ||
for (int k=0; k<nrows; ++k) { | ||
int thisidx = idx[k*ncols + j]; | ||
itarget[k] = (thisidx == NA_INTEGER) ? ithisfill[0] : ithiscol[thisidx-1]; | ||
itarget[k] = (thisidx == NA_INTEGER) ? INTEGER(thisfill)[0] : INTEGER(thiscol)[thisidx-1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just use the same code for INTSXP
, LGLSXP
, REALSXP
and CPLXSXP
with coerceAs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my first time hacking on this code, so I'm not sure, but I also had the feeling that it would be desirable to avoid the repeated logic in these switch cases. About usage of coerceAs, would that introduce unwanted overhead / performance penalty? I was thinking of solving that via a C macro. Anyway I would suggest saving that for another PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value added from coerceAs is handling attributes and therefore classes like int64, not sure if relevant here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It looks good to me.
Co-authored-by: Xianying Tan <shrektan@126.com>
Co-authored-by: Xianying Tan <shrektan@126.com>
From #4586, let's include this as a test case & make sure it doesn't warn: test(xxxxxxx, dcast(data.table(a = 1, b = 2, c = 3), a ~ b, value.var = 'c', fill = '2'),
data.table(a=1, `2`=3, key='a')) |
I added that test case alongside a similar one. |
@tdhock is this PR ready for review? I'm wary of the conflict |
Basically ready to go, some more minor feedback this round. Thanks! |
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your efforts!
you are welcome! thanks very much for your feedback, too!
|
I think the codecov thing is spurious, merging. |
} | ||
dat = dat[, eval(fun.call), by=c(varnames)] | ||
dat = dat[, maybe_err(eval(fun.call)), by=c(varnames)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could possibly affect the code path in [.
We check for eval
in j
to handle that specially, not sure if we are checking for nested eval
.
I would go with new env
argument, or eventually modify fun.call
by prefixing it with maybe_err
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking for efficiency? Otherwise passing tests ensures your concern is moot right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly for efficiency but just touching the edge like that raises some concerns
some_fill = anyNA(idx) | ||
fill.default = if (run_agg_funs && is.null(fill) && some_fill) dat_for_default_fill[, maybe_err(eval(fun.call))] | ||
if (run_agg_funs && is.null(fill) && some_fill) { | ||
fill.default = dat_for_default_fill[0L][, maybe_err(eval(fun.call))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Closes #5512
Closes #5390
In current master,
dcast(fill=NULL)
always computes a default fill value, even when there are no missing cells. For example this is the result of a new test case using current masterIn the code above it is normal to compute the fill value (
fill_value=as.integer(min(integer()))
which is NA) because it is used three times.However the code below gives the following result using current master, indicating that a default fill value is computed, even though it is not used:
Using this branch we get the output below (no warning), indicating that no default fill value was computed, because it is not necessary: