test and confirm new parallel subset performance #3175

jangorecki · 2018-12-01T04:48:17Z

Matt commented :

Lines 27 to 30 in 1847500

    
    // For small n such as 2,3,4 etc we hope OpenMP will be sensible inside it and not create a team with each thread doing just one item. Otherwise,
    // call overhead would be too high for highly iterated calls on very small subsets. TODO: test and confirm
    // Further, we desire (currently at least) to stress-test the threaded code (especially in latest R-devel) on small data to reduce chance that bugs
    // arise only over a threshold of n.

jangorecki · 2019-01-24T08:00:02Z

The following script tests subsetting by integer row ids. It also measures the timing of the !anyNA branch. That should be enough for testing OpenMP overhead.

vim dt-parallel-subset.R

args = as.integer(commandArgs(TRUE))
th = args[1L]
N = args[2L]
K = 100L

get_i = function(n.out, n.in) {
  n.out = as.integer(n.out)
  n.in = as.integer(n.in)
  set.seed(n.out)
  sample(n.in, n.out)
}

library(data.table)
cat(sprintf("# datagen %s rows\n", N))
set.seed(108)
DT = data.table(
  id1 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id2 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
  id4 = sample(K, N, TRUE),                          # large groups (int)
  id5 = sample(K, N, TRUE),                          # large groups (int)
  id6 = sample(N/K, N, TRUE),                        # small groups (int)
  v1 =  sample(5, N, TRUE),                          # int in range [1,5]
  v2 =  sample(5, N, TRUE),                          # int in range [1,5]
  v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 23.5749
)

cat(sprintf("# setDTthreads(%s)\n", th))
setDTthreads(th)

cat("# 0 row (first `[`` call overhead):\n")
system.time(ans<-DT[0L])

cat("# 1 row:\n")
i = get_i(1L, nrow(DT))
system.time(ans<-DT[i])

cat("# 2 rows:\n")
i = get_i(2L, nrow(DT))
system.time(ans<-DT[i])

cat("# 5 rows:\n")
i = get_i(5L, nrow(DT))
system.time(ans<-DT[i])

cat("# 10% of rows:\n")
i = get_i(nrow(DT)*0.1, nrow(DT))
system.time(ans<-DT[i])

q("no")

Rscript dt-parallel-subset.R 1 1e6

timings coming soon

jangorecki · 2019-01-24T08:06:39Z

1 thread, 1e7 rows

> Rscript dt-parallel-subset.R 1 1e7
# datagen 10000000 rows
# setDTthreads(1)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.005   0.000   0.005 
# 1 row:
   user  system elapsed 
      0       0       0 
# 2 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 5 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 10% of rows:
   user  system elapsed 
  0.153   0.012   0.165

20 threads, 1e7 rows

> Rscript dt-parallel-subset.R 20 1e7
# datagen 10000000 rows
# setDTthreads(20)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.033   0.000   0.007 
# 1 row:
   user  system elapsed 
      0       0       0 
# 2 rows:
   user  system elapsed 
  0.001   0.000   0.000 
# 5 rows:
   user  system elapsed 
      0       0       0 
# 10% of rows:
   user  system elapsed 
  0.440   0.039   0.103

1 thread, 1e8 rows

> Rscript dt-parallel-subset.R 1 1e8
# datagen 100000000 rows
# setDTthreads(1)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.006   0.000   0.005 
# 1 row:
   user  system elapsed 
  0.001   0.000   0.000 
# 2 rows:
   user  system elapsed 
      0       0       0 
# 5 rows:
   user  system elapsed 
  0.001   0.000   0.000 
# 10% of rows:
   user  system elapsed 
  2.393   0.132   2.524

20 threads, 1e8 rows

> Rscript dt-parallel-subset.R 20 1e8
# datagen 100000000 rows
# setDTthreads(20)
# 0 row (first `[`` call overhead):
   user  system elapsed 
  0.054   0.004   0.010 
# 1 row:
   user  system elapsed 
  0.000   0.000   0.001 
# 2 rows:
   user  system elapsed 
  0.001   0.000   0.000 
# 5 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 10% of rows:
   user  system elapsed 
  4.218   0.284   1.265

1 thread, 1e9 rows

> Rscript dt-parallel-subset.R 1 1e9
# datagen 1000000000 rows
# setDTthreads(1)
# 0 row (first `[`` call overhead):
   user  system elapsed
  0.005   0.000   0.006
# 1 row:
   user  system elapsed
  0.001   0.000   0.000
# 2 rows:
   user  system elapsed 
  0.000   0.000   0.001 
# 5 rows:
   user  system elapsed 
      0       0       0 
# 10% of rows:
   user  system elapsed 
 33.478   1.460  34.938

20 threads, 1e9 rows

> Rscript dt-parallel-subset.R 20 1e9
# datagen 1000000000 rows
# setDTthreads(20)
# 0 row (first `[`` call overhead):
   user  system elapsed
  0.057   0.000   0.009
# 1 row:
   user  system elapsed
  0.001   0.000   0.001
# 2 rows:
   user  system elapsed
      0       0       0
# 5 rows:
   user  system elapsed
      0       0       0
# 10% of rows:
   user  system elapsed 
 58.295   2.454  20.285

jangorecki · 2019-01-24T08:32:22Z

During the timings above I observed that a team of threads was started even for 1, 2, and 5 rows. Still, it did not result in noticeable overhead: all subsets of 1, 2, and 5 rows took 0.000-0.001 seconds elapsed.
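To put the large-subset results side by side, here is a small sketch that only re-enters the elapsed values for the "10% of rows" case reported in the runs above and computes the ratio. The data frame and its column names are made up for illustration; no new measurements are involved.

```r
# Elapsed seconds for the "10% of rows" subset, copied from the runs above.
timings = data.frame(
  rows       = c(1e7, 1e8, 1e9),
  elapsed_1  = c(0.165, 2.524, 34.938),  # setDTthreads(1)
  elapsed_20 = c(0.103, 1.265, 20.285)   # setDTthreads(20)
)

# Ratio of single-threaded to 20-thread elapsed time.
timings$speedup = round(timings$elapsed_1 / timings$elapsed_20, 2)
print(timings)
```

On these numbers 20 threads come out roughly 1.6x to 2x faster in elapsed time, while the user (CPU) time reported above is higher in every 20-thread run.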

jangorecki · 2019-01-29T11:19:09Z

The checks above used a single subset operation. I see a noticeable difference when I loop over the subset operation.

library(data.table)
m = matrix(1L, nrow=1e8, ncol=10)
DT = as.data.table(m)
setDTthreads(20)
system.time(for (i in 1:1000) DT[i,])
#   user  system elapsed 
#  4.210   0.000   0.229 
setDTthreads(1)
system.time(for (i in 1:1000) DT[i,])
#   user  system elapsed 
#  0.107   0.007   0.114

@mattdowle does it qualify for reopening?
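For scale, the loop timings in this comment can be converted to a per-call cost. The sketch below only re-uses the numbers printed above; the variable names are illustrative.

```r
# Timings of 1000 single-row subsets, copied from the output above (seconds).
reps    = 1000L
elapsed = c(threads_20 = 0.229, threads_1 = 0.114)
user    = c(threads_20 = 4.210, threads_1 = 0.107)

# Elapsed time per call in milliseconds, and the extra cost with 20 threads.
per_call_ms = 1000 * elapsed / reps
extra_ms    = per_call_ms[["threads_20"]] - per_call_ms[["threads_1"]]

# How much more CPU time the 20-thread run consumed.
cpu_ratio = user[["threads_20"]] / user[["threads_1"]]

cat(sprintf("extra elapsed per call: %.3f ms\n", extra_ms))
cat(sprintf("user time ratio (20 vs 1 threads): %.1f\n", cpu_ratio))
```

The extra elapsed cost is on the order of 0.1 ms per call, which is below the millisecond granularity of the single-call timings earlier in the thread. That would explain why it only became visible once the subset was repeated in a loop.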

mattdowle · 2020-06-18T06:30:10Z

PR #4484 closes this one.

Running v1.12.8 to confirm Jan's result:

> m = matrix(1L, nrow=1e8, ncol=10)
> DT = as.data.table(m)
> setDTthreads(0)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  1.512   0.000   0.143
> setDTthreads(1)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  0.083   0.000   0.083

With #4484:

> setDTthreads(0)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  0.071   0.000   0.071 
> setDTthreads(1)
> system.time(for (i in 1:1000) DT[i,])
   user  system elapsed 
  0.072   0.000   0.072
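A minimal sketch for repeating this comparison on another machine or data.table version. The helper `time_loop` is hypothetical (it is not part of data.table), and the table is smaller than the 1e8 rows used above so it runs quickly; absolute timings will differ by hardware.

```r
library(data.table)

# Hypothetical helper: time `reps` single-row subsets under a thread setting,
# then restore the previous setting.
time_loop = function(DT, threads, reps = 1000L) {
  old = setDTthreads(threads)   # setDTthreads() returns the previous value
  on.exit(setDTthreads(old))
  system.time(for (i in seq_len(reps)) DT[i, ])[["elapsed"]]
}

DT = as.data.table(matrix(1L, nrow = 1e6, ncol = 10))

res = c(
  all_threads = time_loop(DT, 0L),  # 0 means use all logical CPUs
  one_thread  = time_loop(DT, 1L)
)
print(res)
```

If the two elapsed values are close, as in the "With #4484" output above, the iterated small subsets are no longer paying a visible threading overhead.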

jangorecki added a commit that referenced this issue Jan 24, 2019

update comment on par subset perf check, closes #3175

47a7670

jangorecki mentioned this issue Jan 24, 2019

update comment on par subset perf check #3311

Merged

jangorecki self-assigned this Jan 24, 2019

mattdowle added this to the 1.12.2 milestone Jan 26, 2019

mattdowle closed this as completed in #3311 Jan 26, 2019

mattdowle reopened this Jan 29, 2019

jangorecki removed their assignment Feb 4, 2019

mattdowle modified the milestones: 1.12.2, 1.12.4 Mar 1, 2019

jangorecki mentioned this issue Jul 31, 2019

Selecting from data.table by row is very slow #3735

Open

jangorecki modified the milestones: 1.12.4, 1.13.0 Sep 17, 2019

mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019

nhirschey mentioned this issue Mar 9, 2020

Calculations with many groups are much slower by default than with setDTthreads(1) #4294

Open

jangorecki added openmp performance labels Apr 5, 2020

mattdowle modified the milestones: 1.12.11, 1.12.9 Jun 18, 2020

jangorecki mentioned this issue Jun 18, 2020

throttle threads for iterated small data tasks #4484

Merged

mattdowle closed this as completed in #4484 Jun 18, 2020
