Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gshift as gforce optimized shift #5205

Merged
merged 39 commits into from
Oct 20, 2021
Merged

gshift as gforce optimized shift #5205

merged 39 commits into from
Oct 20, 2021

Conversation

ben-schwen
Copy link
Member

@ben-schwen ben-schwen commented Oct 9, 2021

Closes #1534

  • R (optimization "dispatch")
  • C lag/lead
  • NSE evaluation for fill arguments e.g. fill=x[.N]
  • C cyclic
  • tests
  • benchmarks
  • NEWS

Internal optimization

Currently shift is only GForce optimized when its fill argument is not a call, to avoid cases where fill=x[1], fill=x[.N], etc. since those would need evaluation within their group.

General Remark

The gforce optimization of shift would be the first function to always return vectors of the same length as by. The first step in this direction was made in #5089 by allowing ghead to return multiple values per group. Before that every gforce function returned exactly 1 value per group.

The whole datatable.optimize optimization code branch has grown pretty fat and might need to be rewritten. This is also related to #1414 #5032 #3815 #735 #523.

@@ -1162,3 +1167,83 @@ SEXP gprod(SEXP x, SEXP narmArg) {
return(ans);
}

SEXP gshift(SEXP x, SEXP nArg, SEXP fillArg, SEXP typeArg) {
Copy link
Member

@MichaelChirico MichaelChirico Oct 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we prefer to re-use the code in shift.c here (to the extent possible)? otherwise it'll be a pain to always update code in both places going forward.

It might require some refactoring/modularizing of the original shift.c.

Copy link
Member Author

@ben-schwen ben-schwen Oct 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to recycle as much as possible but there are some different things to consider about both implementations.

Basic shift gshift
works on data.tables/vectors/lists operates on vectors
main workhouse is memmove used memory may not be blocked
no subset on i concurrent subsetting on irows

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe basic shift could be redirected to call gshift then? It would have to call gforce prep function first, perhaps in a dummy way to trick gshift into working. Then it would be common code and all existing tests of shift would cover the new gforce gshift too.
Perhaps not now. Happy to merge this PR as is, subject to integer64. A task could be created to redirect shift to gshift in future.

@codecov
Copy link

codecov bot commented Oct 10, 2021

Codecov Report

Merging #5205 (5f3a10a) into master (fa76197) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 5f3a10a differs from pull request most recent head 5eec8f2. Consider uploading reports for the commit 5eec8f2 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master    #5205   +/-   ##
=======================================
  Coverage   99.50%   99.50%           
=======================================
  Files          77       77           
  Lines       14536    14590   +54     
=======================================
+ Hits        14464    14518   +54     
  Misses         72       72           
Impacted Files Coverage Δ
src/init.c 100.00% <ø> (ø)
R/data.table.R 99.89% <100.00%> (+<0.01%) ⬆️
src/gsumm.c 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa76197...5eec8f2. Read the comment docs.

@ben-schwen
Copy link
Member Author

ben-schwen commented Oct 14, 2021

Benchmarks

library(data.table)
library(microbenchmark)
basic_shift = shift
bench = function(DT) print(microbenchmark(
  DT[, shift(x, 1, type="lag"), y],
  DT[, basic_shift(x, 1, type="lag"), y],
  DT[, c(NA, head(x,-1)), y],
  times = 10L, unit = "s"), signif=4)
N = 1e6
DT = data.table(x = sample(N), y = sample(1e2,N,TRUE))
bench(DT)
# Unit: seconds
#                                      expr     min     lq    mean  median      uq     max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.01342 0.0146 0.02848 0.02031 0.05030 0.05487    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 0.01129 0.0116 0.01416 0.01383 0.01560 0.01855    10
#               DT[, c(NA, head(x, -1)), y] 0.01205 0.0169 0.01766 0.01830 0.01901 0.02111    10

DT = data.table(x = sample(N), y = sample(1e3,N,TRUE))
# Unit: seconds
#                                      expr     min      lq    mean  median      uq     max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.01092 0.01152 0.01242 0.01207 0.01339 0.01448    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 0.02676 0.02710 0.02873 0.02796 0.03022 0.03277    10
#               DT[, c(NA, head(x, -1)), y] 0.02064 0.02085 0.02186 0.02115 0.02273 0.02466    10

DT = data.table(x = sample(N), y = sample(1e4,N,TRUE))
bench(DT)
# Unit: seconds
#                                      expr     min      lq   mean  median      uq     max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.01328 0.01653 0.0183 0.01722 0.02066 0.02501    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 0.22480 0.23000 0.2617 0.23990 0.28300 0.37830    10
#               DT[, c(NA, head(x, -1)), y] 0.10440 0.11080 0.1234 0.11710 0.13030 0.15360    10

DT = data.table(x = sample(N), y = sample(1e5,N,TRUE))
bench(DT)
# Unit: seconds
#                                      expr     min      lq    mean  median      uq    max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.02344 0.02761 0.03975 0.03406 0.04527 0.0726    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 1.99000 2.13600 2.17000 2.17700 2.18500 2.3810    10
#               DT[, c(NA, head(x, -1)), y] 0.89570 0.92040 0.93580 0.94220 0.95250 0.9613    10
N = 1e7
DT = data.table(x = sample(N), y = sample(1e2,N,TRUE))
bench(DT)
# Unit: seconds
#                                      expr    min    lq   mean median     uq     max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.2044 0.244 0.2705 0.2527 0.2710  0.4261    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 0.1231 0.140 0.1793 0.1447 0.2030  0.3918    10
#               DT[, c(NA, head(x, -1)), y] 0.1618 0.172 0.2093 0.1931 0.2416  0.2932    10

DT = data.table(x = sample(N), y = sample(1e3,N,TRUE))
# Unit: seconds
#                                      expr    min     lq   mean median     uq    max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.1984 0.2206 0.2307 0.2243 0.2464 0.2722    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 0.1278 0.1481 0.1746 0.1684 0.1943 0.2404    10
#               DT[, c(NA, head(x, -1)), y] 0.1460 0.1605 0.1850 0.1727 0.2154 0.2414    10

DT = data.table(x = sample(N), y = sample(1e4,N,TRUE))
bench(DT)
# Unit: seconds
#                                      expr    min     lq   mean median     uq     max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.2686 0.2795 0.2972 0.2984 0.3148  0.3322    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 0.4850 0.4913 0.5132 0.5072 0.5424  0.5528    10
#               DT[, c(NA, head(x, -1)), y] 0.3696 0.3830 0.4146 0.4233 0.4440  0.4521    10

DT = data.table(x = sample(N), y = sample(1e5,N,TRUE))
bench(DT)
# Unit: seconds
#                                      expr    min     lq   mean median     uq     max neval
#        DT[, shift(x, 1, type = "lag"), y] 0.3696 0.3902 0.4027 0.3939 0.4288  0.4543    10
#  DT[, basic_shift(x, 1, type = "lag"), y] 2.5130 2.5400 2.5810 2.5630 2.6230  2.7160    10
#               DT[, c(NA, head(x, -1)), y] 1.2890 1.3280 1.3650 1.3700 1.4100  1.4230    10

@ben-schwen ben-schwen marked this pull request as ready for review October 14, 2021 17:53
… call (unusual) is distracting; even fill (a call or not) wasn't mentioned in the issue. The main thing is that shift is optimized which is awesome
switch(TYPEOF(x)) {
case LGLSXP: { int *ansd=LOGICAL(tmp); SHIFT(int, LOGICAL, ansd[ansi++]=val); } break;
case INTSXP: { int *ansd=INTEGER(tmp); SHIFT(int, INTEGER, ansd[ansi++]=val); } break;
case REALSXP: { double *ansd=REAL(tmp); SHIFT(double, REAL, ansd[ansi++]=val); } break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just about to merge (looks awesome!) and then just saw integer64 missed here (and test_bit64 added to tests). Bug fix 45 starts shift(xInt64, fill=0) so now that shift is optimized I wonder if the shift problems of int64 would return. Maybe existing tests of shift int64 don't test it by group then.
Can't think of anything else.

Copy link
Member Author

@ben-schwen ben-schwen Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added integer64 support. Interestingly we don't do anything special about integer64 anymore in shift.c or gshift except using coerceAs for coercing and copyMostAttrib for copying class integer64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The coerceAs is for fill but fill isn't currently tested by the new gforce test 2224. A loop of differently typed fill needs adding to the test then, and I suspect a specific case here for integer64 will then be needed to pass that? Can I leave that to you, and the C99 warning, hence WIP again.

@ben-schwen
Copy link
Member Author

A sample(1e6, N, TRUE) (where N is 1e7 as you have) would be good for the news item; i.e. 1 million groups of 10 rows. You only went to group size 100 so far. The example on S.O. has just 5 rows in each group after all. You can just cut to the chase and test group size 5. Or in the news item just use the test verbatim as reported on S.O. and link to it. That way nobody can say we made up an unrealistic test if we used the test reported from the wild.

Added now both, but either one of them should do.

inst/tests/tests.Rraw Outdated Show resolved Hide resolved
src/gsumm.c Outdated
SET_VECTOR_ELT(ans, g, tmp=allocVector(TYPEOF(x), nx));
#define SHIFT(CTYPE, RTYPE, ASSIGN) { \
const CTYPE *xd = (const CTYPE *)RTYPE(x); \
const CTYPE fill = (const CTYPE)RTYPE(thisfill)[0]; \
Copy link
Member

@mattdowle mattdowle Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally I'm seeing this warning (from gcc with -std=c99 I think but it might be due to gcc 10.3.0) :

gsumm.c: In function ‘gshift’:
gsumm.c:1207:26: warning: ISO C forbids casting nonscalar to the same type [-Wpedantic]
 1207 |       const CTYPE fill = (const CTYPE)RTYPE(thisfill)[0];                                                         \
      |                          ^
gsumm.c:1241:59: note: in expansion of macro ‘SHIFT’
 1241 |       case CPLXSXP: { Rcomplex *ansd=COMPLEX(tmp);        SHIFT(Rcomplex, COMPLEX,  ansd[ansi++]=val); } break;
      |                                                           ^~~~~

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIC it should also work without the cast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the casting of fill between NA_REAL and NA_INTEGER64 though?

@mattdowle mattdowle added the WIP label Oct 15, 2021
@ben-schwen
Copy link
Member Author

I extended the tests that we introduced in #5189 to also check the coercion between types for gshift. There we also test the casting of fill between NA_REAL and NA_INTEGER64 and it works.

In my current understanding it works since under the hood integer64 is just a "shifted" REALSXP with a class attribute. If we directly write these values into the answer then no problem should occur and the output of numeric and integer64 is depending whether the answer has the class label of integer64.

library(bit64)
.Internal(inspect(lim.integer64()))
#> @55a458cea988 14 REALSXP g0c2 [OBJ,ATT] (len=2, tl=0) -4.94066e-324,nan
#> ATTRIB:
#>   @55a457f3c630 02 LISTSXP g0c0 [REF(1)] 
#>     TAG: @55a455431140 01 SYMSXP g1c0 [MARK,REF(34387),LCK,gp=0x6000] "class" (has value)
#>     @55a457c6f040 16 STRSXP g0c1 [REF(65535)] (len=1, tl=0)
#>       @55a458186ff8 09 CHARSXP g1c2 [MARK,REF(1114),gp=0x61] [ASCII] [cached] "integer64"

@ben-schwen ben-schwen removed the WIP label Oct 16, 2021
@ben-schwen ben-schwen requested a review from mattdowle October 16, 2021 19:02
@mattdowle
Copy link
Member

Thanks, I see now. Tweaked the comment there, and all good.

@mattdowle mattdowle merged commit e88826e into master Oct 20, 2021
@mattdowle mattdowle deleted the gforce_shift branch October 20, 2021 19:03
@jangorecki
Copy link
Member

jangorecki commented Nov 29, 2022

I suspect this is causing hard to spot issue: #5547

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shift() in data.table v1.9.6 is slow for many groups
4 participants