[SPARK-23032][SQL][FOLLOW-UP] Add codegenStageId in comment #20419
|
ping @rednaxelafx |
| s"""Codegend pipeline for stage (id=$codegenStageId) | ||
| |${this.treeString.trim}""".stripMargin)} | ||
| final class $className extends ${classOf[BufferedRowIterator].getName} { | ||
| ${ctx.registerComment(s"codegenStageId=$codegenStageId", true)} |
Instead of putting this comment inside the class, how about putting it on top of the class? Like:
/* 002 */ return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */ private Object[] references;
...
sure, done
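With the comment sitting directly above the class declaration, a tool could recover the stage id from the generated source with a single regular expression. The following is a minimal sketch of that idea, not code from this PR; `StageIdExtractor` and `extract` are hypothetical names, and the pattern assumes the layout shown in the suggestion above.

```scala
// Hypothetical helper, not part of Spark: pulls (stage id, class name) pairs
// out of generated source laid out as in the suggestion above.
object StageIdExtractor {
  // Matches a "// codegenStageId=N" comment line that is immediately followed
  // by a line containing a "final class <Name>" declaration.
  private val StageComment =
    """(?m)^.*//[ \t]*codegenStageId=(\d+)[ \t]*\r?\n.*\bfinal class (\w+)""".r

  /** Returns every (stage id, class name) pair found in the generated source. */
  def extract(generated: String): List[(Int, String)] =
    StageComment.findAllMatchIn(generated)
      .map(m => (m.group(1).toInt, m.group(2)))
      .toList
}
```

Because the pattern requires the two lines to be adjacent, a class declaration without the comment right above it is simply not reported.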
|
Test build #86745 has finished for PR 20419 at commit
|
|
LGTM, and +1 on @viirya 's idea. I like it better for the comment to be on top of the class declaration instead of inside it; but I'm okay either way if others have strong opinion otherwise. As long as the comment line is right next to the class declaration line, that's good enough for easy pattern matching. BTW, to save a few characters from the generated code, maybe we don't need to generate this comment when the |
|
I prefer to leave the comment regardless of the |
|
@kiszk SGTM and LGTM. Let's ship it! One more question on the side: with the Could you please post an example of either the generated code or the metrics, e.g. a |
|
Test build #86774 has finished for PR 20419 at commit
|
| * Register a comment and return the corresponding place holder | ||
| */ | ||
| def registerComment(text: => String): String = { | ||
| def registerComment(text: => String, forceComment: Boolean = false): String = { |
Nit: rename it to force?
Also adding the text?
@param force whether to force registering the comments
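To make the effect of the `force` flag concrete, here is a simplified, self-contained sketch. `CommentRegistry` and its `commentsEnabled` constructor argument are hypothetical stand-ins for the codegen context and the `spark.sql.codegen.comments` setting; the real method does more than this.

```scala
import scala.collection.mutable

// Simplified stand-in for the codegen context; NOT Spark's actual class.
// `commentsEnabled` plays the role of the spark.sql.codegen.comments setting.
final class CommentRegistry(commentsEnabled: Boolean) {
  private val placeholderToComment = mutable.LinkedHashMap.empty[String, String]
  private var counter = 0

  // Produces a new, unused name such as "c_1", "c_2", ...
  private def freshName(prefix: String): String = {
    counter += 1
    s"${prefix}_$counter"
  }

  /**
   * Register a comment and return the corresponding placeholder.
   *
   * @param text  the comment text; by-name, so it is only evaluated when registered
   * @param force whether to register the comment even when comments are disabled
   */
  def registerComment(text: => String, force: Boolean = false): String = {
    if (force || commentsEnabled) {
      val name = freshName("c")
      placeholderToComment(name) = s"// $text"
      s"/*$name*/"
    } else {
      ""
    }
  }

  /** Snapshot of placeholder name to comment text. */
  def comments: Map[String, String] = placeholderToComment.toMap
}
```

In this sketch a forced comment is registered even when comments are disabled, while an unforced one returns an empty placeholder and never evaluates its (possibly expensive) text.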
|
@rednaxelafx Does the following code address your concern? |
|
@gatorsmile Not directly. The |
|
Then, we should add a test case to ensure it will not be broken. |
|
Test build #86882 has finished for PR 20419 at commit
|
|
I did the experiment that I asked about at #20419 (comment), and verified that under the current implementation, this PR will not affect the codegen cache hit behavior. But the way it's working is a bit brittle. The
/*wholestagecodegen_c*/
final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
Notice the The way it's used right now, because from But let's assume if someone doesn't know about this, and then accidentally added code in I'd say functional wise this PR is good to go; maybe we want to add a comment somewhere in |
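One reading of the cache concern can be modeled in a few lines. The sketch below assumes that compiled code is cached under a key whose equality depends only on the code body (placeholders included), with the comment text kept in a side map; `CodeWithComments` and `CompileCache` are hypothetical names, and this is a toy model rather than Spark's compilation path.

```scala
import scala.collection.mutable

// Hypothetical model: the key's equality looks at the body only,
// so differing comment text does not change the key.
final class CodeWithComments(val body: String, val comments: Map[String, String]) {
  override def equals(other: Any): Boolean = other match {
    case that: CodeWithComments => that.body == body
    case _ => false
  }
  override def hashCode(): Int = body.hashCode
}

// Toy cache that counts how many times "compilation" actually happens.
object CompileCache {
  private val cache = mutable.HashMap.empty[CodeWithComments, String]
  var compilations = 0

  def compile(code: CodeWithComments): String =
    cache.getOrElseUpdate(code, { compilations += 1; s"compiled#$compilations" })
}
```

Under this model, two stages whose bodies both contain `/*wholestagecodegen_c*/` share one cache entry even if their comment text differs, whereas a body containing a different placeholder name (for example, because an extra fresh name was requested earlier) is a different key and is compiled again. That sensitivity to the placeholder name is the brittle part.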
|
I gave it a bit more thought. Here's an alternative proposal: instead of using a "force comment" mechanism in the current form (which still gets a In this case, instead of using @kiszk WDYT? |
|
@rednaxelafx I understand your concern when If we call I am sorry that I will be slow to respond over the next few days due to a short vacation. |
|
@kiszk For this specific kind of usage, I don't think using a hardcoded stable ID will be a problem. For safety, we can add a runtime check. Let's assume this new method is called
Have fun on your vacation! |
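The "stable ID plus runtime check" idea could look roughly like the sketch below. `PlaceholderRegistry` and the id `wsc_codegenStageId` used in the note that follows are hypothetical choices for illustration, and the sketch leaves out the enable/force check entirely.

```scala
import scala.collection.mutable

// Hypothetical sketch, not Spark's implementation: a caller may supply a
// stable placeholder id; otherwise a fresh one is generated.
final class PlaceholderRegistry {
  private val placeholderToComment = mutable.LinkedHashMap.empty[String, String]
  private var counter = 0

  def registerComment(text: => String, placeholderId: String = ""): String = {
    val name =
      if (placeholderId.nonEmpty) {
        // Runtime check: a caller-chosen id must be unique in this compilation unit.
        require(!placeholderToComment.contains(placeholderId),
          s"duplicate placeholder id: $placeholderId")
        placeholderId
      } else {
        counter += 1
        s"c_$counter"
      }
    placeholderToComment(name) = s"// $text"
    s"/*$name*/"
  }

  /** Snapshot of placeholder name to comment text. */
  def comments: Map[String, String] = placeholderToComment.toMap
}
```

Registering `codegenStageId=1` under the id `wsc_codegenStageId` always yields the placeholder `/*wsc_codegenStageId*/`, no matter how many fresh names were handed out before, and a second registration under the same id fails fast instead of silently overwriting the first.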
|
Test build #87076 has finished for PR 20419 at commit
|
|
Retest this please |
|
Test build #87084 has finished for PR 20419 at commit
|
|
ping @rednaxelafx |
rednaxelafx
left a comment
LGTM with minor nit. Thanks @kiszk !
Sorry for the delay in reviewing this PR. I got caught up in some other projects that fully consumed my bandwidth.
| /** | ||
| * Register a comment and return the corresponding place holder | ||
| * | ||
| * @param placeholderId a string for a place holder |
Nit: can we rephrase this ScalaDoc a bit, maybe like:
/**
* ...
* @param placeholderId an optionally specified identifier for the comment's placeholder. The caller should make sure this identifier is unique within the compilation unit. If this argument is not specified, a fresh identifier will be automatically created and used as the placeholder.
* ...
*/
| if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) { | ||
| val name = freshName("c") | ||
| if (force || | ||
| SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) { |
unrelated, but should this be a SQL conf?
Now, the question is whether we can use SQLConf.get in this case. If not, we might need to keep it like this
IIUC, the answer seems to be No. See this discussion.
|
@kiszk This cannot be merged to 2.3. Could you open a new JIRA for this? |
|
I see. I will open a new JIRA tomorrow. |
|
Test build #87405 has finished for PR 20419 at commit
|
What changes were proposed in this pull request?
This PR always adds codegenStageId in a comment of the generated class.
How was this patch tested?
Existing tests