-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(functions): converting all aggregates to use arrow arrays #532
Conversation
a656db9
to
fa05d78
Compare
@jsternberg Can you make sure that Flux tests exist for each one of the aggregates this fixes? As well as confirm that the spec is up to date? Since we have to touch every transformation the plan is to take the time to make sure they all have tests and are documentation. |
functions/transformations/sum.go
Outdated
func (a *SumIntAgg) DoInt(vs []int64) { | ||
for _, v := range vs { | ||
func (a *SumIntAgg) DoInt(vs *array.Int64) { | ||
for _, v := range vs.Int64Values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arrow arrays have special methods for computing the sum. Can you use those methods here? Those methods are written using SIMD instructions.
6ecb419
to
11a06a8
Compare
@nathanielc there is at least one test for each of the aggregates that are touched and each of the tests uses all of the possible data types for full test coverage. At the moment, using an incompatible data type fails the entire query rather than a specific table so I skipped invalid data types. In the future, we should test those too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with one small change to Mean to also use the Sum functions.
Also each of the aggregates already have benchmarks written. Can you post the results to the PR before merging so we have a record of the improvement?
for _, v := range vs { | ||
func (a *MeanAgg) DoInt(vs *array.Int64) { | ||
a.count += float64(vs.Len()) | ||
for _, v := range vs.Int64Values() { | ||
//TODO handle overflow | ||
a.sum += float64(v) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also use the math.Sum functions.
11a06a8
to
b323c67
Compare
All aggregates will now use the arrow interface. They do not support nil at the moment.
I might be doing something wrong (or my hardware can't be consistent enough), but I'm not seeing a huge difference.
|
b323c67
to
ba7c765
Compare
I pushed the updated benchmark code and fixed an issue with |
@jsternberg This biggest perf advantage is when we use the special math.* functions. Only the Sum and Mean aggregates do that currently. I do see a strong improvement for Although I do not see benchmarks for BenchmarkSum or BenchmarkMean which do exist. Can you add those benchmarks? Also |
The reason why |
|
The speedup is now there and the PR is approved so I'm going to merge. |
All aggregates will now use the arrow interface. They do not support nil
at the moment.
Fixes #491, #525, #511, #520, and #522.
This also implements part of #513. The other part uses the selector interface which is not included in this change.