Column subset GForce optimization not applied #4014

Closed
kdkavanagh opened this issue Oct 30, 2019 · 1 comment
Labels: duplicate, GForce (issues relating to optimized grouping calculations)


@kdkavanagh

According to issue #523, I'd expect subsetting a single column to its last row, `col[.N]`, to be GForce optimized. However, it does not appear to be, and this has a material impact on performance. Using `tail` to achieve the same result does get the GForce optimization. Note that this only seems to happen with `col[.N]`, not with `col[1]`. data.table version is 1.12.2.

```
> grps=updates[,list(
+     count=.N,
+     endTime=timestamp[.N]
+     
+ ),by=list(Symbol, grp)]
Detected that j uses these columns: timestamp
Finding groups using forderv ... 1.942s elapsed (14.5s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.106s elapsed (0.092s cpu) 
Getting back original order ... 0.770s elapsed (1.823s cpu) 
lapply optimization is on, j unchanged as 'list(.N, timestamp[.N])'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
```

```
> grps=updates[,list(
+     count=.N,
+     endTime=tail(timestamp,1)
+     
+ ),by=list(Symbol, grp)]
Detected that j uses these columns: timestamp
Finding groups using forderv ... 1.589s elapsed (11.2s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.158s elapsed (0.118s cpu) 
Getting back original order ... 1.491s elapsed (3.390s cpu) 
lapply optimization is on, j unchanged as 'list(.N, tail(timestamp, 1))'
GForce optimized j to 'list(.N, gtail(timestamp, 1))'
Making each group and running j (GForce TRUE) ... 2.532s elapsed (5.011s cpu) 
```
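A minimal, self-contained reproduction may help when checking this against other data.table versions. It is a sketch under stated assumptions: the `updates` table below is hypothetical synthetic data (random `Symbol` and `grp` values and an increasing numeric `timestamp`), not the reporter's data, so timings will differ. Passing `verbose = TRUE` to `[.data.table` prints the same kind of diagnostics shown above; the line starting with "GForce" indicates whether `j` was optimized.

```r
library(data.table)

# Hypothetical synthetic stand-in for the reporter's `updates` table.
# Column names match the issue; sizes and values are made up.
set.seed(42)
n <- 1e6
updates <- data.table(
  Symbol    = sample(sprintf("SYM%03d", 1:100), n, replace = TRUE),
  grp       = sample(1:50, n, replace = TRUE),
  timestamp = as.numeric(seq_len(n))  # increasing numeric timestamps
)

# Form 1: last row of each group via col[.N].
# In the reporter's 1.12.2 log this form was not GForce optimized.
a <- updates[, list(count = .N, endTime = timestamp[.N]),
             by = list(Symbol, grp), verbose = TRUE]

# Form 2: last row of each group via tail(col, 1).
# In the reporter's 1.12.2 log this was rewritten to gtail(timestamp, 1).
b <- updates[, list(count = .N, endTime = tail(timestamp, 1)),
             by = list(Symbol, grp), verbose = TRUE]

# Both forms should agree; only the execution path differs.
stopifnot(identical(a$count, b$count), identical(a$endTime, b$endTime))
```

Both expressions select the last row of each group in its original row order, so the results should match. The difference reported here is only whether `j` is evaluated through GForce or group by group.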
@ben-schwen
Member

Although this issue was filed first, it seems to be more or less a duplicate of #4809, where a broader discussion took place.

Please feel free to add support on that main issue, or reopen this one if I'm in error in assigning the duplicate label.

@ben-schwen added the duplicate and GForce labels on Oct 14, 2021