Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Nov 23, 2017

What changes were proposed in this pull request?

When I played with codegen in developing another PR, I found the value of CodegenContext.INPUT_ROW is not reliable. Under wholestage codegen, it is assigned to null first and then suddenly changed to i.

The reason is GenerateOrdering changes CodegenContext.INPUT_ROW but doesn't restore it back.

How was this patch tested?

Added test.

@viirya
Copy link
Member Author

viirya commented Nov 23, 2017

cc @cloud-fan @kiszk

}
}

test("SPARK-22591: GenerateOrdering shouldn't change ctx.INPUT_ROW") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it cause any bugs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working to support wholestage codegen when generating expression codes safe from 64k limit. When there is not INPUT_ROW in context but we wrongly set a INPUT_ROW value, a non-existing InternalRow i will be added into function parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we use INPUT_ROW a lot in codegen framework, it is risky to change its value without restoring it.

def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = {
val oldInputRow = ctx.INPUT_ROW
val comparisons = ordering.map { order =>
val oldCurrentVars = ctx.currentVars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we can also move this our of the loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor

@cloud-fan cloud-fan Nov 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

val oldInputRow = ...
val oldCurrentVars = ...
val inputRow = "i"
ctx.INPUT_ROW = inputRow
ctx.currentVars = null

val comparisons = ...
...
ctx.INPUT_ROW = oldInputRow
ctx.currentVars = oldCurrentVars
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Looks good.

@SparkQA
Copy link

SparkQA commented Nov 23, 2017

Test build #84128 has finished for PR 19800 at commit 121d309.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Nov 23, 2017

This test org.apache.spark.sql.catalyst.expressions.CastSuite.SPARK-22500: cast for struct should not generate codes beyond 64KB seems flaky? I saw it fails with java.lang.OutOfMemoryError: GC overhead limit exceeded in other PR too.

@SparkQA
Copy link

SparkQA commented Nov 23, 2017

Test build #84131 has finished for PR 19800 at commit 27145ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Restore original currentVars and INPUT_ROW.
ctx.currentVars = oldCurrentVars
ctx.INPUT_ROW = oldInputRow
finalCode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.currentVars = oldCurrentVars
ctx.INPUT_ROW = oldInputRow
s"""
  |InternalRow $inputRow = null;
  |$code
 """.stripMargin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks.

@cloud-fan
Copy link
Contributor

I created https://issues.apache.org/jira/browse/SPARK-22595 to track the flaky test

@SparkQA
Copy link

SparkQA commented Nov 24, 2017

Test build #84144 has finished for PR 19800 at commit 7462a55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.2!

@asfgit asfgit closed this in 62a826f Nov 24, 2017
@viirya viirya deleted the SPARK-22591 branch December 27, 2023 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants