[SPARK-21196] Split codegen info of query plan into sequence #18409

gengliangwang · 2017-06-23T18:41:33Z

The codegen info of a query plan can be very long.
In a debugging console or web page, it is more readable if the subtrees and their corresponding codegen are split into a sequence.

Example:

codegenStringSeq(sql("select 1").queryExecution.executedPlan)

The example returns a Seq[(String, String)] of length 1. Its single element pairs the subtree, rendered as a string, with the corresponding generated code.

The subtree as string:

(*Project [1 AS 1#0]
+- Scan OneRowRelation[]

The generated code:

/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private scala.collection.Iterator inputadapter_input;
/* 009 */   private UnsafeRow project_result;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 011 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 012 */
/* 013 */   public GeneratedIterator(Object[] references) {
/* 014 */     this.references = references;
/* 015 */   }
/* 016 */
/* 017 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 018 */     partitionIndex = index;
/* 019 */     this.inputs = inputs;
/* 020 */     inputadapter_input = inputs[0];
/* 021 */     project_result = new UnsafeRow(1);
/* 022 */     project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 023 */     project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 024 */
/* 025 */   }
/* 026 */
/* 027 */   protected void processNext() throws java.io.IOException {
/* 028 */     while (inputadapter_input.hasNext() && !stopEarly()) {
/* 029 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 030 */       project_rowWriter.write(0, 1);
/* 031 */       append(project_result);
/* 032 */       if (shouldStop()) return;
/* 033 */     }
/* 034 */   }
/* 035 */
/* 036 */ }
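
For illustration, one way a caller might present the returned pairs in a console is sketched below. The `CodegenSeqUsage` object and its `render` helper are hypothetical names, not part of this patch, and the sample pair is stand-in data shaped like the result described above rather than real output from a Spark session.

```scala
object CodegenSeqUsage {
  // Hypothetical helper: prints each (subtree, code) pair under its own header
  // so that long codegen output can be read one subtree at a time.
  def render(pairs: Seq[(String, String)]): String =
    pairs.zipWithIndex.map { case ((subtree, code), i) =>
      s"== Subtree ${i + 1} / ${pairs.length} ==\n$subtree\n\nGenerated code:\n$code"
    }.mkString("\n\n")

  def main(args: Array[String]): Unit = {
    // Stand-in data; in a Spark session the pairs would come from
    // codegenStringSeq(...) as in the example above.
    val pairs = Seq(
      ("*Project [1 AS 1#0]\n+- Scan OneRowRelation[]",
        "/* 001 */ public Object generate(Object[] references) {")
    )
    println(render(pairs))
  }
}
```

Because each element is an independent pair, a web page could equally render one collapsible section per element instead of a single long string.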

What changes were proposed in this pull request?

Add the method codegenToSeq, which splits the codegen info of a query plan into a sequence.
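
The idea of splitting can be sketched on a toy plan tree. This is a minimal sketch under stated assumptions, not the code in this patch: `Node`, `Op`, `CodegenStage`, `stages`, and `codegenSeq` are hypothetical names, `CodegenStage` only loosely stands in for WholeStageCodegenExec, and `generate()` returns a placeholder string instead of real generated code.

```scala
object SplitCodegenSketch {
  // Toy stand-in for a physical plan node; this is NOT Spark's SparkPlan.
  sealed trait Node {
    def label: String
    def children: Seq[Node]
    // Renders the subtree rooted at this node, one node per line.
    def treeString(indent: Int = 0): String =
      (("  " * indent + label) +: children.map(_.treeString(indent + 1))).mkString("\n")
  }

  final case class Op(label: String, children: Seq[Node] = Nil) extends Node

  // Marks the root of a subtree that is compiled as one unit (assumption:
  // analogous in spirit to WholeStageCodegenExec).
  final case class CodegenStage(child: Node) extends Node {
    def label: String = "CodegenStage"
    def children: Seq[Node] = Seq(child)
    // Placeholder for real code generation.
    def generate(): String = s"// code for: ${child.label}"
  }

  // Collects every CodegenStage in the tree, in pre-order.
  def stages(plan: Node): Seq[CodegenStage] = {
    val here = plan match {
      case s: CodegenStage => Seq(s)
      case _ => Seq.empty
    }
    here ++ plan.children.flatMap(stages)
  }

  // One (subtree as string, generated code) pair per codegen stage.
  def codegenSeq(plan: Node): Seq[(String, String)] =
    stages(plan).map { stage => (stage.treeString(), stage.generate()) }

  def main(args: Array[String]): Unit = {
    // Two stages separated by an exchange, as a made-up aggregation plan.
    val plan = CodegenStage(Op("AggFinal", Seq(Op("Exchange",
      Seq(CodegenStage(Op("AggPartial", Seq(Op("Range")))))))))
    codegenSeq(plan).foreach { case (subtree, code) =>
      println(subtree)
      println(code)
      println()
    }
  }
}
```

In this toy model a plan with two stages yields a sequence of length 2, so each stage's tree and code can be inspected on its own.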

How was this patch tested?

unit test

@cloud-fan @gatorsmile
Please review http://spark.apache.org/contributing.html before opening a pull request.

gatorsmile · 2017-06-23T18:55:49Z

sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala

+  }
+
+  /**
+   * Find WholeStageCodegenExec subtrees in query plan and do codegen for each of them


Outputs the WholeStageCodegenExec subtrees and the codegen in a query plan

gatorsmile · 2017-06-23T19:01:05Z

sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala

+  test("debugCodegenStringSeq") {
+    val res = codegenStringSeq(spark.range(10).groupBy("id").count().queryExecution.executedPlan)
+    assert(res.length == 2)
+    assert(res.seq.forall{case (subtree, code) =>


forall{case -> forall { case

gatorsmile · 2017-06-23T19:02:34Z

sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala

  }
+
+  test("debugCodegenStringSeq") {
+    val res = codegenStringSeq(spark.range(10).groupBy("id").count().queryExecution.executedPlan)


Could you please post the return value here?

gatorsmile · 2017-06-23T19:03:12Z

sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala

-      val (_, source) = s.doCodeGen()
-      output += s"${CodeFormatter.format(source)}\n"
+    codegenSubtrees.toSeq.map {
+      subtree =>


Please collapse the above two lines into a single line
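
As a small illustration of the requested layout, the lambda parameter and arrow sit on the same line as the opening brace. The `LambdaStyle` object and `describe` function are hypothetical and only demonstrate the formatting; they are not the code under review.

```scala
object LambdaStyle {
  def describe(subtrees: Seq[String]): Seq[(String, Int)] =
    // The parameter name and arrow share the line with the opening brace.
    subtrees.map { subtree =>
      (subtree, subtree.length)
    }
}
```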

gatorsmile · 2017-06-23T19:03:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala

    // scalastyle:on println
  }

+  /** Generate codegen debug info */


Please add the same function description as you did for codegenStringSeq

gatorsmile · 2017-06-23T19:04:05Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

      // scalastyle:on println
    }
+
+    /** @return Sequence of WholeStageCodegen subtrees and corresponding codegen */


The same here

SparkQA · 2017-06-23T20:16:10Z

Test build #78541 has finished for PR 18409 at commit 73ef241.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2017-06-23T22:18:50Z

Example:

codegenStringSeq(sql("select 1").queryExecution.executedPlan)

The example returns a Seq[(String, String)] of length 1. Its single element pairs the subtree, rendered as a string, with the corresponding generated code.

The subtree as string:

(*Project [1 AS 1#0]
+- Scan OneRowRelation[]

The generated code:

/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private scala.collection.Iterator inputadapter_input;
/* 009 */   private UnsafeRow project_result;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 011 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 012 */
/* 013 */   public GeneratedIterator(Object[] references) {
/* 014 */     this.references = references;
/* 015 */   }
/* 016 */
/* 017 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 018 */     partitionIndex = index;
/* 019 */     this.inputs = inputs;
/* 020 */     inputadapter_input = inputs[0];
/* 021 */     project_result = new UnsafeRow(1);
/* 022 */     project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 023 */     project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 024 */
/* 025 */   }
/* 026 */
/* 027 */   protected void processNext() throws java.io.IOException {
/* 028 */     while (inputadapter_input.hasNext() && !stopEarly()) {
/* 029 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 030 */       project_rowWriter.write(0, 1);
/* 031 */       append(project_result);
/* 032 */       if (shouldStop()) return;
/* 033 */     }
/* 034 */   }
/* 035 */
/* 036 */ }

gengliangwang · 2017-06-23T22:25:09Z

@gatorsmile Thanks, I have revised the code

SparkQA · 2017-06-24T00:04:02Z

Test build #78548 has finished for PR 18409 at commit 42238b8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-06-24T00:26:33Z

Test build #78550 has finished for PR 18409 at commit a349962.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-06-24T00:28:40Z

Test build #78551 has finished for PR 18409 at commit 7878c49.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-06-24T03:40:45Z

retest this please

gatorsmile · 2017-06-24T03:40:51Z

cc @cloud-fan

SparkQA · 2017-06-24T05:15:34Z

Test build #78557 has finished for PR 18409 at commit 7878c49.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2017-06-26T17:02:19Z

retest this please

SparkQA · 2017-06-26T19:21:43Z

Test build #78641 has finished for PR 18409 at commit 7878c49.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-06-27T09:25:06Z

LGTM, merging to master!

Author: Wang Gengliang <ltnwgl@gmail.com>

Closes apache#18409 from gengliangwang/codegen.

gengliangwang added 2 commits June 23, 2017 10:39

add codegenToSeq method: split codegen info into sequence

0bed96e

add test cases and comments

73ef241

gatorsmile reviewed Jun 23, 2017

revise code style and comments

42238b8

gengliangwang added 2 commits June 23, 2017 15:37

revise comment

a349962

revise indent

7878c49

asfgit closed this in 3cb3ccc Jun 27, 2017
