topdown/sets_bench_test: Add intersection builtin benchmarks.
#5000
fdfc258 to 5f7f55b
Local benchmarks are not showing much, if any, improvement over doing set intersection in the naive pairwise way. I suspect this is due to the intersection algorithm guaranteeing equal or smaller sets as iteration progresses, eliminating most of the win potential observed for [...]. I'll play around with one or two more ideas in this space before dropping this PR though. 😃
5f7f55b to fb8cb7a
Some nitpicks -- sorry for reviewing a draft PR, I couldn't resist :D
topdown/sets.go
Outdated
		return nil, err
	}
	presentInAll = make(map[*ast.Term]struct{}, first.Len())
	first.Iter(func(x *ast.Term) error {
[nit] `Iter(func(*ast.Term) error)` -> `Foreach(func(*ast.Term))`
topdown/sets.go
Outdated
	// Add any surviving Terms to the output set.
	for k := range presentInAll {
		result.Add(k)
	}
	return result, err
💭 `err`? Let's put an `if err != nil { ... }` right after the `err =` and `return result, nil` here.
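For reference, the two nitpicks above could look roughly like the sketch below. It is not the PR's code: `stringSet`, `asSet`, and `intersectAll` are hypothetical stand-ins for `ast.Set`, the operand check, and the builtin body, and elements are plain strings instead of `*ast.Term`. It only illustrates the shape: a `Foreach`-style callback that cannot fail, the error checked right where it is assigned, and a plain `return result, nil` at the end.

```go
package main

import (
	"errors"
	"fmt"
)

// stringSet is a hypothetical stand-in for ast.Set (assumption for this sketch).
type stringSet map[string]struct{}

// Foreach calls f for every element. Unlike an Iter-style method,
// the callback has no error return, so there is nothing to propagate.
func (s stringSet) Foreach(f func(string)) {
	for k := range s {
		f(k)
	}
}

var errNotSet = errors.New("operand is not a set")

// asSet is a hypothetical operand check that can fail.
func asSet(v any) (stringSet, error) {
	s, ok := v.(stringSet)
	if !ok {
		return nil, errNotSet
	}
	return s, nil
}

// intersectAll keeps a single map of surviving keys instead of
// building one intermediate set per input.
func intersectAll(operands []any) (stringSet, error) {
	result := stringSet{}
	if len(operands) == 0 {
		return result, nil
	}
	first, err := asSet(operands[0])
	if err != nil { // checked immediately after the assignment
		return nil, err
	}
	presentInAll := make(map[string]struct{}, len(first))
	first.Foreach(func(x string) {
		presentInAll[x] = struct{}{}
	})
	for _, op := range operands[1:] {
		s, err := asSet(op)
		if err != nil {
			return nil, err
		}
		for k := range presentInAll {
			if _, ok := s[k]; !ok {
				delete(presentInAll, k) // deleting while ranging over a map is allowed in Go
			}
		}
	}
	// Add any surviving keys to the output set.
	for k := range presentInAll {
		result[k] = struct{}{}
	}
	return result, nil // every error path has already returned
}

func main() {
	got, err := intersectAll([]any{
		stringSet{"a": {}, "b": {}},
		stringSet{"b": {}, "c": {}},
	})
	fmt.Println(len(got), err)
}
```

With every error path returning early, the final `return` can never carry a stale `err`, which is the point of the second nit.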
fb8cb7a to c43119c
I've tinkered around with a few different approaches for trying to do set intersection more efficiently, and I can't beat the original implementation, at least with the benchmark that I have (which generates N identical sets, the worst case). The best case (no keys match across all sets), and the average case (a few keys match across all sets) are not benchmarked currently, and those might be worth exploring. A solution that dramatically improves average-case performance with only a minor worst-case penalty might be valuable.
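To make the three cases concrete, here is a rough sketch of how inputs for them might be generated. It is not the generator from this PR: `keySet`, `genSets`, and `commonKeys` are hypothetical names, and plain string keys stand in for `*ast.Term`. Setting `shared` equal to `size` gives N identical sets (worst case), `shared = 0` gives fully disjoint sets (best case), and anything in between approximates the average case.

```go
package main

import "fmt"

// keySet is a hypothetical stand-in for ast.Set (assumption for this sketch).
type keySet map[string]struct{}

// genSets builds n sets of `size` keys each. The first `shared` keys are
// identical across all sets; the remaining keys are unique to each set.
func genSets(n, size, shared int) []keySet {
	out := make([]keySet, n)
	for i := range out {
		s := make(keySet, size)
		for j := 0; j < size; j++ {
			if j < shared {
				s[fmt.Sprintf("shared%d", j)] = struct{}{}
			} else {
				s[fmt.Sprintf("s%d_k%d", i, j)] = struct{}{}
			}
		}
		out[i] = s
	}
	return out
}

// commonKeys counts the keys that appear in every set, i.e. the size
// of the intersection the builtin would have to produce.
func commonKeys(sets []keySet) int {
	if len(sets) == 0 {
		return 0
	}
	count := 0
	for k := range sets[0] {
		inAll := true
		for _, s := range sets[1:] {
			if _, ok := s[k]; !ok {
				inAll = false
				break
			}
		}
		if inAll {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("worst case:  ", commonKeys(genSets(10, 100, 100)))
	fmt.Println("average case:", commonKeys(genSets(10, 100, 5)))
	fmt.Println("best case:   ", commonKeys(genSets(10, 100, 0)))
}
```

A single knob for the overlap keeps the benchmark cases comparable, since set count and set size stay fixed while only the intersection size varies.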
This commit adds tests for the `intersection` Set builtin, and cleans up the existing tests with a new data generator function.

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
c531d15 to 438e5df
I couldn't get a meaningful speedup, even with relatively pathological input sets-of-sets. I'm throwing in the towel on this for now, and have changed the PR title to reflect a massive cutting-back in scope. This PR is now limited to adding two new benchmarks and cleaning up how data is generated for both the [...]
Thanks for wrapping this up. Keeping the benchmarks is a good idea!
This PR is an experiment to try to improve performance for the set `intersection` builtin, inspired by work done in #4980 on the set `union` builtin.

The original logic for the builtin did pairwise `Set.Intersection` calls between the input sets, theoretically resulting in some wasted intermediate sets. Ideally, we'd like to minimize wasted allocations, and get a faster and more efficient solution.
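As a rough illustration of the pairwise approach described above (not OPA's actual implementation), the sketch below folds a two-set intersection across the inputs. `Set`, `NewSet`, `Intersection`, and `PairwiseIntersection` are hypothetical stand-ins built on Go maps, with string keys in place of `*ast.Term`. Each step allocates a fresh intermediate set, but that set can never be larger than the previous one.

```go
package main

import "fmt"

// Set is a hypothetical stand-in for ast.Set (assumption for this sketch).
type Set map[string]struct{}

// NewSet builds a Set from the given keys.
func NewSet(keys ...string) Set {
	s := make(Set, len(keys))
	for _, k := range keys {
		s[k] = struct{}{}
	}
	return s
}

// Intersection allocates a new Set holding the keys present in both a and b.
func Intersection(a, b Set) Set {
	// Iterate over the smaller set so fewer lookups are needed.
	if len(b) < len(a) {
		a, b = b, a
	}
	out := make(Set)
	for k := range a {
		if _, ok := b[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// PairwiseIntersection folds Intersection over all inputs. Every step after
// the first produces an intermediate Set that the next step discards.
// With a single input, that input is returned as-is (no copy).
func PairwiseIntersection(sets ...Set) Set {
	if len(sets) == 0 {
		return NewSet()
	}
	result := sets[0]
	for _, s := range sets[1:] {
		result = Intersection(result, s)
	}
	return result
}

func main() {
	r := PairwiseIntersection(
		NewSet("a", "b", "c"),
		NewSet("b", "c", "d"),
		NewSet("c", "d", "e"),
	)
	fmt.Println(len(r))
}
```

Because the running result only shrinks, the intermediate allocations get cheaper as the fold proceeds, which may be part of why the pairwise version is hard to beat.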