-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memrecycle no snprintf overhead #5463
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5463 +/- ##
=======================================
Coverage 99.51% 99.51%
=======================================
Files 80 80
Lines 14771 14774 +3
=======================================
+ Hits 14699 14702 +3
Misses 72 72
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Jan. Simple yet effective approach to rollback the performance issues.
Why do we not need targetDesc in the Lines 867 to 881 in ade6e57
or is created in every possible branch before? |
12 lines above is code which is still valid for the macro. Macro doesn't have own scope, it just substitutes code in existing scope, so the code 12 lines above is still used there. |
src/assign.c
Outdated
@@ -845,6 +851,8 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con | |||
warning(_("Coercing 'list' RHS to '%s' to match the type of %s."), type2char(TYPEOF(target)), targetDesc); | |||
source = PROTECT(coerceVector(source, TYPEOF(target))); protecti++; | |||
} else if ((TYPEOF(target)!=TYPEOF(source) || targetIsI64!=sourceIsI64) && !isNewList(target)) { | |||
static char targetDesc[501]; | |||
snprintf(targetDesc, 500, colnum==0 ? _("target vector") : _("column %d named '%s'"), colnum, colname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this one be declared inside the GetVerbose()>=3
branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think macro below needs it as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the macro only needs it inside its if(COND)
. So isn't this snprintf()
indeed wasteful? This snprintf
could be inside this GetVerbose()>=3
and then appear inside the CHECK_RANGE macro as well inside its if(COND)
. When I didn't see the snprintf
inside the CHECK_RANGE
that's why I wrote #5463 (review). So it would read better if targetDesc
was populated right there in CHECK_RANGE
next to where it is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. I moved it then inside the if
branch.
Hey Jan, IIUC the issue is we are declaring a string buffer inside Is that right? It would be helpful to summarize your fix in the PR description, this will especially make it easier when referring back to the PR in the future. |
src/assign.c
Outdated
@@ -730,6 +728,8 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con | |||
for (int i=0; i<slen; ++i) { | |||
const int val = sd[i+soff]; | |||
if ((val<1 && val!=NA_INTEGER) || val>nlevel) { | |||
static char targetDesc[501]; // from 1.14.1 coerceAs reuses memrecycle for a target vector, PR#4491 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe wrap this into a helper function to save some duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be, but I am fine either way
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great that you identified this, all the testing and benchmarks fit with the fix.
However, regarding the thought that it's the allocation?
Is that right?
Yes, your are correct.
No that's wrong. The allocation is declared static
. Please look up what that is. We do that quite a lot in data.table and needs to be understood. To copy that down and repeat conveys to readers of the code that the static
keyword isn't understood. So that needs to be put back at the top just once. The slow down will all be in the snprintf
call itself.
Personally I could even remove this declaration all together by keeping verbose message more obscure.
Yes remove the declaration and snprintf
, agree, but not make the message more obscure. Could repeat the ternary in each of the messages, or wrap that ternary in a macro. With the ternary I think I was trying to match previous messages maybe. Leaving implementation details aside do we agree that the column %d named '%s'
part is better and more useful to provide to user when we can? It would be a shame to lose that part.
Yes, static wasn't understood. As for removing snprintf(targetDesc, 500, colnum==0 ? _("target vector") : _("column %d named '%s'"), colnum, colname);
error(_("Assigning factor numbers to %s. But %d is outside the level range [1,%d]"), targetDesc, val, nlevel); If I started to removing
As the performance regression is fixed, maybe we can leave it as is? declaration of static char TargetDesc has been taken out from memrecycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ This comment is supposed to be attached to the targetDesc
inside the CHECK_RANGE definition. GH won't give me "+" to comment on that line because it's in a block that hasn't changed it seems? ]
How does this targetDesc
get populated now? If I'm right that it doesn't, does that mean we're missing a test or the test is insufficient?
snprintf calls are still there to populate it. |
Btw, the commit message |
I see now how that |
Assuming I'm right this time that that |
I pushed improvement for now to address your comment about |
Thanks Jan. I did the |
GitHub Action + atime test to observe the performance regression introduced by PR Rdatatable#4491 and fixed by PR Rdatatable#5463
closes #5424, closes #5366, closes #5371
code: