-
Notifications
You must be signed in to change notification settings - Fork 918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added polynomials benchmark #17695
base: branch-25.02
Are you sure you want to change the base?
Added polynomials benchmark #17695
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Benchmark Resultsast_polynomials_float32[0] NVIDIA RTX A6000
ast_polynomials_float64[0] NVIDIA RTX A6000
binaryop_polynomials_float32[0] NVIDIA RTX A6000
binaryop_polynomials_float64[0] NVIDIA RTX A6000
transform_polynomials_float32[0] NVIDIA RTX A6000
transform_polynomials_float64[0] NVIDIA RTX A6000
order 1order 2order 4order 8order 16order 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me with only a couple suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, except for the UDF part as I am not too familiar with its semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work so far. I have a few considerations. See comments.
std::string expr = std::to_string(constants[0]); | ||
|
||
for (cudf::size_type i = 0; i < order; i++) { | ||
expr = "( " + expr + " ) * x + " + std::to_string(constants[i + 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think it's a legitimate benchmark if we hardcode these constants? I think it would be more fair to the other AST/binaryops if we provided an array of device scalar pointers that must be dereferenced for each multiplication.
A different way to think about this: do we want to JIT a new kernel for every possible set of constants and/or every possible order of polynomial? If we compute multiple polynomials, does that JIT overhead pay for itself, or do we need to assume that we will amortize the JIT overhead across multiple polynomials?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transform API is pretty limited presently, We don't support multiple input columns/scalars in the transform API. As we discussed with Spark-Rapids, we would need to support that to meet their needs, but I don't think it would make much of a performance/throughput difference for this benchmark.
No, it wouldn't be okay to JIT a new kernel for each constant, but it would be reasonable to JIT a new kernel for each polynomial order.
If we compute multiple polynomials, does that JIT overhead pay for itself, or do we need to assume that we will amortize the JIT overhead across multiple polynomials?
I don't think it would pay for itself across multiple polynomials, Although there's a program cache, I don't have enough insight into its internals, I'll investigate and get back to you.
Description
This merge request implements benchmarks for comparing the AST, UDF Transform, and BINARY_OP methods by computing a polynomial.
Closes #17561
Checklist