-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23407][SQL] add a config to try to inline all mutable states during codegen #20599
Conversation
do we have any test about the performance regression introduced by the change? I mean, can we quantify it? |
.doc("When adding mutable states during code generation, whether or not we should try to " + | ||
"inline all the states. If this config is false, we only try to inline primitive stats, " + | ||
"so that primitive states are more likely to be inlined. Set this config to true to make " + | ||
"the behavior same as Spark 2.2.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it only behaves the same before we hit the threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, let me improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also watch out for a typo s/stats/states/
Test build #87391 has finished for PR 20599 at commit
|
@@ -1461,20 +1465,19 @@ object CodeGenerator extends Logging { | |||
CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length) | |||
try { | |||
val cf = new ClassFile(new ByteArrayInputStream(classBytes)) | |||
val stats = cf.methodInfos.asScala.flatMap { method => | |||
cf.methodInfos.asScala.flatMap { method => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not, but a small clean up.
This PR will be hold until 2.3 is released. |
@mgaido91 As I said in the PR description, no regression is found so far, just providing a config to be super safe. Actually this PR has a problem: the codegen usually happens at executor side, so we can't use SQLConf directy. I'll figure this out after my vacation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that in some cases codegen happens on the executors, so we can't use SQLConf directly.
In the case of whole-stage codegen, codegen happens on the driver side so that part is okay...
.doc("When adding mutable states during code generation, whether or not we should try to " + | ||
"inline all the states. If this config is false, we only try to inline primitive stats, " + | ||
"so that primitive states are more likely to be inlined. Set this config to true to make " + | ||
"the behavior same as Spark 2.2.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also watch out for a typo s/stats/states/
retest this please |
Test build #87793 has finished for PR 20599 at commit
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This is a followup of #19811 .
In #19811, we picked a sub-optimal solution that always compact non-primitive mutable states to arrays, to make primitive mutable states more likely to get inlined.
This PR introduces a new config to not treat primitive states specially and try to inline all states, to avoid any potential perf regression in Spark 2.3. By default it's false.
In the future, we can remove this config, and dynamically decide which states to inline. For example, we can use placeholders during codegen, and analysis all the mutable states at the end and replace the placeholders.
Note that there are no known regression cases, so this is not a blocker for Spark 2.3
How was this patch tested?
a new test.