
topdown/sets_bench_test: Add intersection builtin benchmarks. #5000

Merged


@philipaconrad (Contributor) commented Aug 10, 2022

This PR is an experiment to improve the performance of the set intersection builtin, inspired by the work done on the set union builtin in #4980.

The original logic for the builtin made pairwise `Set.Intersection` calls across the input sets, which in theory allocates intermediate sets that are thrown away as soon as the next step runs. Ideally, we'd minimize those wasted allocations and get a faster, more efficient solution.
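To make the contrast concrete, here is a minimal sketch in plain Go. It assumes sets of strings represented as a hypothetical `set` type (`map[string]struct{}`) rather than OPA's `ast.Set`, and it is not the code from this PR. `intersectPairwise` mirrors the pairwise fold described above and allocates a new intermediate set at each step; `intersectOnePass` is one hypothetical alternative that copies the smallest input once and filters it in place against every other input.

```go
package main

import "fmt"

// set is a hypothetical stand-in for OPA's ast.Set, keyed by string.
type set map[string]struct{}

func newSet(elems ...string) set {
	s := make(set, len(elems))
	for _, e := range elems {
		s[e] = struct{}{}
	}
	return s
}

// intersect2 returns a new set with the elements present in both a and b,
// iterating over the smaller input.
func intersect2(a, b set) set {
	if len(b) < len(a) {
		a, b = b, a
	}
	out := make(set, len(a))
	for k := range a {
		if _, ok := b[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// intersectPairwise folds intersect2 over the inputs, allocating a fresh
// intermediate set at every step.
func intersectPairwise(sets []set) set {
	if len(sets) == 0 {
		return set{}
	}
	result := intersect2(sets[0], sets[0]) // copy of the first input
	for _, s := range sets[1:] {
		result = intersect2(result, s)
	}
	return result
}

// intersectOnePass copies the smallest input once, then deletes every
// candidate that is missing from some other input. Deleting the current
// key while ranging over a map is permitted in Go.
func intersectOnePass(sets []set) set {
	if len(sets) == 0 {
		return set{}
	}
	smallest := 0
	for i, s := range sets {
		if len(s) < len(sets[smallest]) {
			smallest = i
		}
	}
	out := make(set, len(sets[smallest]))
	for k := range sets[smallest] {
		out[k] = struct{}{}
	}
	for i, s := range sets {
		if i == smallest {
			continue
		}
		for k := range out {
			if _, ok := s[k]; !ok {
				delete(out, k)
			}
		}
	}
	return out
}

func main() {
	sets := []set{
		newSet("a", "b", "c", "d"),
		newSet("b", "c", "d"),
		newSet("c", "d", "e"),
	}
	fmt.Println(len(intersectPairwise(sets)), len(intersectOnePass(sets)))
}
```

The one-pass variant avoids the intermediate allocations, but each intermediate in the pairwise fold is already no larger than the smallest input folded in so far, so the savings are bounded.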

@philipaconrad self-assigned this Aug 10, 2022
@philipaconrad force-pushed the set-intersection-logic-hoist branch 2 times, most recently from fdfc258 to 5f7f55b August 12, 2022 17:29
@philipaconrad (Contributor Author)

Local benchmarks show little, if any, improvement over doing set intersection the naive pairwise way. I suspect this is because each pairwise intersection step is guaranteed to produce a set no larger than its inputs, which eliminates most of the potential win observed for union.
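The reasoning above can be checked with a small sketch, again using a hypothetical `set` type (`map[string]struct{}`) instead of `ast.Set`. It folds the inputs pairwise and records the size of each intermediate result. For intersection the sizes can only stay the same or shrink; for union they can only stay the same or grow, so a pairwise union that allocates a fresh set per step copies ever-larger intermediates, while a pairwise intersection does not.

```go
package main

import "fmt"

// set is a hypothetical stand-in for OPA's ast.Set, keyed by string.
type set map[string]struct{}

func newSet(elems ...string) set {
	s := make(set, len(elems))
	for _, e := range elems {
		s[e] = struct{}{}
	}
	return s
}

// foldSizes applies op pairwise from left to right and returns the size of
// each intermediate result, starting with the first input.
func foldSizes(sets []set, op func(a, b set) set) []int {
	if len(sets) == 0 {
		return nil
	}
	acc := sets[0]
	sizes := []int{len(acc)}
	for _, s := range sets[1:] {
		acc = op(acc, s)
		sizes = append(sizes, len(acc))
	}
	return sizes
}

// intersect allocates a new set holding the elements present in both inputs.
func intersect(a, b set) set {
	out := set{}
	for k := range a {
		if _, ok := b[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// union allocates a new set holding the elements present in either input.
func union(a, b set) set {
	out := make(set, len(a)+len(b))
	for k := range a {
		out[k] = struct{}{}
	}
	for k := range b {
		out[k] = struct{}{}
	}
	return out
}

func main() {
	sets := []set{
		newSet("a", "b", "c", "d"),
		newSet("b", "c", "d", "e"),
		newSet("c", "d", "e", "f"),
	}
	fmt.Println("intersection:", foldSizes(sets, intersect))
	fmt.Println("union:", foldSizes(sets, union))
}
```

For the three sets in `main`, the intermediate intersection sizes are 4, 3, 2, while the union sizes are 4, 5, 6.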

I'll play around with one or two more ideas in this space before dropping this PR though. 😃

@philipaconrad force-pushed the set-intersection-logic-hoist branch from 5f7f55b to fb8cb7a August 17, 2022 18:06

@srenatus (Contributor) left a comment

Some nitpicks -- sorry for reviewing a draft PR, I couldn't resist :D

topdown/sets.go (outdated)

```go
	return nil, err
}
presentInAll = make(map[*ast.Term]struct{}, first.Len())
first.Iter(func(x *ast.Term) error {
```

[nit] `Iter(func(*ast.Term) error)` -> `Foreach(func(*ast.Term))`

topdown/sets.go (outdated)

```go
// Add any surviving Terms to the output set.
for k := range presentInAll {
	result.Add(k)
}
return result, err
```

💭 `err`? Let's put an `if err != nil { ... }` right after the `err =` assignment, and `return result, nil` here.

@philipaconrad force-pushed the set-intersection-logic-hoist branch from fb8cb7a to c43119c August 23, 2022 21:29

@philipaconrad (Contributor Author) commented Aug 23, 2022

I've tinkered with a few different approaches to doing set intersection more efficiently, and I can't beat the original implementation, at least on the benchmark I have, which generates N identical sets (the worst case).

The best case (no keys match across all sets) and the average case (a few keys match across all sets) are not currently benchmarked, and those might be worth exploring. A solution that dramatically improves average-case performance at the cost of only a minor worst-case penalty might be valuable.
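One way to cover all three cases is a data generator with an overlap knob. The sketch below is hypothetical (`genSets` and its parameters are not from the PR): it builds `n` sets of `size` string elements, where the first `shared` elements are common to every set and the rest are unique to their own set. With `n >= 2`, `shared == size` reproduces the N-identical-sets worst case, `shared == 0` gives the no-match best case, and values in between approximate the average case.

```go
package main

import (
	"fmt"
	"strconv"
)

// set is a hypothetical stand-in for OPA's ast.Set, keyed by string.
type set map[string]struct{}

// genSets builds n sets with size elements each. The first shared elements
// ("s0", "s1", ...) appear in every set; the remaining size-shared elements
// ("u<set>-<index>") appear only in their own set.
func genSets(n, size, shared int) []set {
	if shared < 0 {
		shared = 0
	}
	if shared > size {
		shared = size
	}
	sets := make([]set, n)
	for i := range sets {
		s := make(set, size)
		for j := 0; j < shared; j++ {
			s["s"+strconv.Itoa(j)] = struct{}{}
		}
		for j := shared; j < size; j++ {
			s["u"+strconv.Itoa(i)+"-"+strconv.Itoa(j)] = struct{}{}
		}
		sets[i] = s
	}
	return sets
}

// intersectionSize counts the elements of sets[0] present in every other set.
func intersectionSize(sets []set) int {
	if len(sets) == 0 {
		return 0
	}
	count := 0
	for k := range sets[0] {
		inAll := true
		for _, s := range sets[1:] {
			if _, ok := s[k]; !ok {
				inAll = false
				break
			}
		}
		if inAll {
			count++
		}
	}
	return count
}

func main() {
	// Worst, average-ish, and best cases for 8 sets of 100 elements.
	for _, shared := range []int{100, 10, 0} {
		sets := genSets(8, 100, shared)
		fmt.Printf("shared=%d intersection=%d\n", shared, intersectionSize(sets))
	}
}
```

In a `_test.go` file, each case could then feed its own `b.Run` sub-benchmark, so the worst, average, and best cases are reported side by side.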

This commit adds tests for the `intersection` Set builtin, and cleans up
the existing tests with a new data generator function.

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
@philipaconrad force-pushed the set-intersection-logic-hoist branch from c531d15 to 438e5df September 9, 2022 22:23
@philipaconrad changed the title from "builtins: Speed up set intersections" to "topdown/sets_bench_test: Add intersection builtin benchmarks." Sep 9, 2022

@philipaconrad (Contributor Author)

I couldn't get a meaningful speedup, even with relatively pathological input sets-of-sets. I'm throwing in the towel on this for now, and have changed the PR title to reflect a much narrower scope.

This PR is now limited to adding two new benchmarks and cleaning up how data is generated for both the intersection and union benchmarks.

@philipaconrad marked this pull request as ready for review September 9, 2022 22:26

@srenatus (Contributor) left a comment

Thanks for wrapping this up. Keeping the benchmarks is a good idea!

@srenatus merged commit cb4cf0d into open-policy-agent:main Sep 10, 2022
@philipaconrad deleted the set-intersection-logic-hoist branch September 14, 2022 20:25
Labels
None yet
Projects
Status: Done

2 participants