Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23407][SQL] add a config to try to inline all mutable states during codegen #20599

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup of #19811 .

In #19811, we picked a sub-optimal solution that always compact non-primitive mutable states to arrays, to make primitive mutable states more likely to get inlined.

This PR introduces a new config to not treat primitive states specially and try to inline all states, to avoid any potential perf regression in Spark 2.3. By default it's false.

In the future, we can remove this config, and dynamically decide which states to inline. For example, we can use placeholders during codegen, and analysis all the mutable states at the end and replace the placeholders.

Note that there are no known regression cases, so this is not a blocker for Spark 2.3

How was this patch tested?

a new test.

@cloud-fan
Copy link
Contributor Author

@mgaido91
Copy link
Contributor

mgaido91 commented Feb 13, 2018

do we have any test about the performance regression introduced by the change? I mean, can we quantify it?

.doc("When adding mutable states during code generation, whether or not we should try to " +
"inline all the states. If this config is false, we only try to inline primitive stats, " +
"so that primitive states are more likely to be inlined. Set this config to true to make " +
"the behavior same as Spark 2.2.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it only behaves the same before we hit the threshold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let me improve it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also watch out for a typo s/stats/states/

@SparkQA
Copy link

SparkQA commented Feb 13, 2018

Test build #87391 has finished for PR 20599 at commit 013c02f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1461,20 +1465,19 @@ object CodeGenerator extends Logging {
CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length)
try {
val cf = new ClassFile(new ByteArrayInputStream(classBytes))
val stats = cf.methodInfos.asScala.flatMap { method =>
cf.methodInfos.asScala.flatMap { method =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes related to this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not, but a small clean up.

@gatorsmile
Copy link
Member

This PR will be hold until 2.3 is released.

@cloud-fan
Copy link
Contributor Author

@mgaido91 As I said in the PR description, no regression is found so far, just providing a config to be super safe.

Actually this PR has a problem: the codegen usually happens at executor side, so we can't use SQLConf directy. I'll figure this out after my vacation.

Copy link
Contributor

@rednaxelafx rednaxelafx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that in some cases codegen happens on the executors, so we can't use SQLConf directly.
In the case of whole-stage codegen, codegen happens on the driver side so that part is okay...

.doc("When adding mutable states during code generation, whether or not we should try to " +
"inline all the states. If this config is false, we only try to inline primitive stats, " +
"so that primitive states are more likely to be inlined. Set this config to true to make " +
"the behavior same as Spark 2.2.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also watch out for a typo s/stats/states/

@kiszk
Copy link
Member

kiszk commented Feb 28, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87793 has finished for PR 20599 at commit 013c02f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 13, 2020
@cloud-fan cloud-fan closed this Jan 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants