[SPARK-21196] Split codegen info of query plan into sequence #18409
}

/**
 * Find WholeStageCodegenExec subtrees in query plan and do codegen for each of them
Outputs the WholeStageCodegenExec subtrees and the codegen in a query plan
test("debugCodegenStringSeq") {
  val res = codegenStringSeq(spark.range(10).groupBy("id").count().queryExecution.executedPlan)
  assert(res.length == 2)
  assert(res.seq.forall{case (subtree, code) =>
forall{case -> forall { case
}

test("debugCodegenStringSeq") {
  val res = codegenStringSeq(spark.range(10).groupBy("id").count().queryExecution.executedPlan)
Could you please post the return value here?
val (_, source) = s.doCodeGen()
output += s"${CodeFormatter.format(source)}\n"
codegenSubtrees.toSeq.map {
  subtree =>
Please collapse the above two lines into a single line
// scalastyle:on println
}

/** Generate codegen debug info */
Please add a function description here, like the one you wrote for codegenStringSeq.
// scalastyle:on println
}

/** @return Sequence of WholeStageCodegen subtrees and corresponding codegen */
The same here
Test build #78541 has finished for PR 18409 at commit
Example: codegenStringSeq(sql("select 1").queryExecution.executedPlan)
The example will return Seq[(String, String)] of length 1, containing the subtree as string and the corresponding generated code.
The subtree as string:
(*Project [1 AS 1#0]
+- Scan OneRowRelation[]
The generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
/* 007 */ private scala.collection.Iterator[] inputs;
/* 008 */ private scala.collection.Iterator inputadapter_input;
/* 009 */ private UnsafeRow project_result;
/* 010 */ private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 011 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 012 */
/* 013 */ public GeneratedIterator(Object[] references) {
/* 014 */ this.references = references;
/* 015 */ }
/* 016 */
/* 017 */ public void init(int index, scala.collection.Iterator[] inputs) {
/* 018 */ partitionIndex = index;
/* 019 */ this.inputs = inputs;
/* 020 */ inputadapter_input = inputs[0];
/* 021 */ project_result = new UnsafeRow(1);
/* 022 */ project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 023 */ project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 024 */
/* 025 */ }
/* 026 */
/* 027 */ protected void processNext() throws java.io.IOException {
/* 028 */ while (inputadapter_input.hasNext() && !stopEarly()) {
/* 029 */ InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 030 */ project_rowWriter.write(0, 1);
/* 031 */ append(project_result);
/* 032 */ if (shouldStop()) return;
/* 033 */ }
/* 034 */ }
/* 035 */
/* 036 */ }

@gatorsmile Thanks, I have revised the code.

Test build #78548 has finished for PR 18409 at commit

Test build #78550 has finished for PR 18409 at commit

Test build #78551 has finished for PR 18409 at commit

retest this please

cc @cloud-fan

Test build #78557 has finished for PR 18409 at commit

retest this please

Test build #78641 has finished for PR 18409 at commit

LGTM, merging to master!
The codegen info of a query plan can be very long.
In a debugging console or web page, it is more readable if the subtrees and their corresponding generated code are split into a sequence.
Example:
```scala
codegenStringSeq(sql("select 1").queryExecution.executedPlan)
```
The example returns a `Seq[(String, String)]` of length 1, whose single element pairs the subtree (as a string) with the corresponding generated code.
The subtree as string:
> (*Project [1 AS 1#0]
> +- Scan OneRowRelation[]
The generated code:
```java
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
/* 007 */ private scala.collection.Iterator[] inputs;
/* 008 */ private scala.collection.Iterator inputadapter_input;
/* 009 */ private UnsafeRow project_result;
/* 010 */ private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 011 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 012 */
/* 013 */ public GeneratedIterator(Object[] references) {
/* 014 */ this.references = references;
/* 015 */ }
/* 016 */
/* 017 */ public void init(int index, scala.collection.Iterator[] inputs) {
/* 018 */ partitionIndex = index;
/* 019 */ this.inputs = inputs;
/* 020 */ inputadapter_input = inputs[0];
/* 021 */ project_result = new UnsafeRow(1);
/* 022 */ project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 023 */ project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 024 */
/* 025 */ }
/* 026 */
/* 027 */ protected void processNext() throws java.io.IOException {
/* 028 */ while (inputadapter_input.hasNext() && !stopEarly()) {
/* 029 */ InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 030 */ project_rowWriter.write(0, 1);
/* 031 */ append(project_result);
/* 032 */ if (shouldStop()) return;
/* 033 */ }
/* 034 */ }
/* 035 */
/* 036 */ }
```
## What changes were proposed in this pull request?
Add method `codegenStringSeq`, which splits the codegen info of a query plan into a sequence of (subtree, generated code) pairs.
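
To illustrate why a sequence is easier to work with than one long string, here is a minimal sketch of how a caller might consume the `Seq[(String, String)]` described above. The object `CodegenSeqSketch` and its `render` helper are hypothetical names introduced for this example and are not part of the PR. The input is stand-in data shaped like the example, so the snippet runs without Spark.

```scala
object CodegenSeqSketch {
  // Hypothetical helper (not part of the PR): renders (subtree, code) pairs
  // into a single report string, with one numbered section per subtree.
  def render(pairs: Seq[(String, String)]): String = {
    val sb = new StringBuilder
    sb.append(s"Found ${pairs.size} WholeStageCodegen subtrees.\n")
    pairs.zipWithIndex.foreach { case ((subtree, code), i) =>
      sb.append(s"== Subtree ${i + 1} / ${pairs.size} ==\n")
      sb.append(subtree).append("\n")
      sb.append("Generated code:\n")
      sb.append(code).append("\n")
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    // Stand-in data shaped like the example in this PR. With Spark available,
    // the pairs would instead come from codegenStringSeq(plan), assuming the
    // Seq[(String, String)] return type described above.
    val pairs = Seq(
      ("(*Project [1 AS 1#0]\n+- Scan OneRowRelation[]",
       "/* 001 */ public Object generate(Object[] references) {"))
    print(render(pairs))
  }
}
```

Because each subtree arrives as its own element, a console or web page could just as easily show one section at a time or filter by plan text instead of concatenating everything.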
## How was this patch tested?
Unit test (`debugCodegenStringSeq`).
cloud-fan gatorsmile
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Wang Gengliang <ltnwgl@gmail.com>
Closes apache#18409 from gengliangwang/codegen.