Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added polynomials benchmark #17695
base: branch-25.02
Are you sure you want to change the base?
Added polynomials benchmark #17695
Changes from 9 commits
866979e
0de2903
cb5ac97
a77ecba
bc1528d
578a56f
349e8d3
d7a7d17
f84cb54
bb2abb9
9ebe20b
8ecf616
548d0a5
bc5fa9e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think it's a legitimate benchmark if we hardcode these constants? I think it would be more fair to the other AST/binaryops if we provided an array of device scalar pointers that must be dereferenced for each multiplication.
A different way to think about this: do we want to JIT a new kernel for every possible set of constants and/or every possible order of polynomial? If we compute multiple polynomials, does that JIT overhead pay for itself, or do we need to assume that we will amortize the JIT overhead across multiple polynomials?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transform API is pretty limited presently, We don't support multiple input columns/scalars in the transform API. As we discussed with Spark-Rapids, we would need to support that to meet their needs, but I don't think it would make much of a performance/throughput difference for this benchmark.
No, it wouldn't be okay to JIT a new kernel for each constant, but it would be reasonable to JIT a new kernel for each polynomial order.
I don't think it would pay for itself across multiple polynomials, Although there's a program cache, I don't have enough insight into its internals, I'll investigate and get back to you.