-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-22705][SQL] Case, Coalesce, and In use less global variables #19901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #84497 has finished for PR 19901 at commit
|
|
Test build #84504 has finished for PR 19901 at commit
|
|
Test build #84505 has finished for PR 19901 at commit
|
|
@cloud-fan @viirya @mgaido91 could you please review this? |
| val conditionMet = ctx.freshName("caseWhenConditionMet") | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| ctx.addMutableState(ctx.javaType(dataType), ev.value) | ||
| val value = ctx.freshName("value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caseWhenTmpResult?
| // It is initialized to `false` and it is set to `true` when the first condition which | ||
| // evaluates to `true` is met and therefore is not needed to go on anymore on the computation | ||
| // This variable represents whether the condition is met with true/false or not met. | ||
| // It is initialized to -1 and it is set to 0 or 1 when the condition which evaluates to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable represents whether the evaluated result is null or not. It's a byte value instead
of boolean because it carries an extra information about if the case-when condition is met or not.
It is initialized to `-1`, which means the condition is not met yet and the result is unknown.
When the first condition is met, it is set to `1` if result is null, or `0` if result is not null.
We won't go on anymore on the computation if it's set to `1` or `0`.
| // It is initialized to -1 and it is set to 0 or 1 when the condition which evaluates to | ||
| // `false` or `true` is met and therefore is not needed to go on anymore on the computation | ||
| // of the following conditions. | ||
| val conditionMet = ctx.freshName("caseWhenConditionMet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: resultIsNull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| """ | ||
| }, | ||
| foldFunctions = { funcCalls => | ||
| funcCalls.map { funcCall => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you keep the previous code style?
| | $func | ||
| |} while (false); | ||
| |return $conditionMet; | ||
| """.stripMargin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this multiline style is now the preferred one.
| override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| ctx.addMutableState(ctx.javaType(dataType), ev.value) | ||
| val isNull = ctx.freshName("isNull") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coalesceTmpIsNull
| ${ev.isNull} = true; | ||
| ${ev.value} = ${ctx.defaultValue(dataType)}; | ||
| ${ctx.JAVA_BOOLEAN} $conditionMet = false; | ||
| ${ctx.JAVA_BYTE} $conditionMet = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why turn it from boolean to byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm.
| """.stripMargin | ||
| }.mkString) | ||
|
|
||
| extraArguments = (ctx.javaType(dataType), ev.value) :: Nil, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to pass it as a parameter, just create a local variable at the beginning of the generated method:
${ctx.javaType(dataType)} ${ev.value} = default...
do {
$func
} while (false);
return ${ev.value};
| """.stripMargin | ||
| }, | ||
| foldFunctions = { funcCalls => | ||
| funcCalls.map { funcCall => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, please keep the previous code style.
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.value) | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| // inValue -1:isNull, 0:false, 1:true | ||
| val inValue = ctx.freshName("value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inTmpResult?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// inValue -1 indicates at lease one expr in list is evaluated to null. 0 means no matches found. 1 means the expr in list matches the given value expr.
| $codes | ||
| } while (false);""") | ||
| } while (false); | ||
| boolean ${ev.isNull} = ($conditionMet != 0); // TRUE if -1 or 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks weird that we have two possible values for TRUE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. nvm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe // TRUE if any condition is met and the result is not null, or no any condition is met.
| |if (!${ev.isNull}) { | ||
| |byte $inValue = (byte)(${valueGen.isNull} ? -1 : 0); // isNull or FALSE | ||
| |if ($inValue != -1) { | ||
| | $javaDataType $valueArg = ${valueGen.value}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just pass ${valueGen.value}? Why we need another local variable for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
${valueGen.value} may have an constant 1.0D. It is not easy to pass it to an argument by using spilitExpressions().
|
Test build #84534 has finished for PR 19901 at commit
|
|
retest this please. |
| """.stripMargin | ||
| }, | ||
| foldFunctions = { funcCalls => | ||
| funcCalls.map { funcCall => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you keep the previous code style? i.e. _.map { funcCall =>
| """.stripMargin | ||
| }.mkString) | ||
| returnType = ctx.JAVA_BYTE, | ||
| makeSplitFunction = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, makeSplitFunction = func =>
| |${ev.value} = false; | ||
| |${ev.isNull} = ${valueGen.isNull}; | ||
| |if (!${ev.isNull}) { | ||
| |// TRUE if any condition is met and the result is not null, or no any condition is met. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggested comment I think is for CaseWhen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh , sorry for stupid mistake.
| | $codes | ||
| |} while (false); | ||
| |boolean ${ev.isNull} = ($resultIsNull != 0); // TRUE if -1 or 1 | ||
| |${ctx.javaType(dataType)} ${ev.value} = $tmpResult; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final
| override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| ctx.addMutableState(ctx.javaType(dataType), ev.value) | ||
| val coalesceTmpIsNull = ctx.freshName("coalesceTmpIsNull") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: val tmpIsNull = ctx.freshName("coalesceTmpIsNull"), to be consistent with https://github.com/apache/spark/pull/19901/files#diff-a966ee88604a834221e82916ec051d7dR190
|
|
||
| returnType = ctx.javaType(dataType), | ||
| makeSplitFunction = { | ||
| func => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, keep previous code style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it about makeSplitFunction = func =>. I thought about multline style. Sorry for my misunderstanding.
| |do { | ||
| | $codes | ||
| |} while (false); | ||
| |boolean ${ev.isNull} = $coalesceTmpIsNull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| // inTmpResult -1 indicates at lease one expr in list is evaluated to null. | ||
| // 0 means no matches found. 1 means the expr in list matches the given value expr. | ||
| val inTmpResult = ctx.freshName("inTmpResult") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: val tmpResult = ctx.freshName("inTmpResult")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here I think there is no need for this and we can use the same approach used in coalesce.
| val conditionMet = ctx.freshName("caseWhenConditionMet") | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| ctx.addMutableState(ctx.javaType(dataType), ev.value) | ||
| // This variable represents whether the evaluated result is null or not. It's a byte value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also update the comment:
This variable holds the state of the result, and has 3 possible values:
-1 means the condition is not met yet and the result is unknown.
0 means the condition is met and result is not null.
1 means the condition is met and result is null.
It is initialized to `-1`, and if it's set to `1` or `0`, We won't go on anymore on the computation.
| // This generates code like: | ||
| // conditionMet = caseWhen_1(i); | ||
| // if(conditionMet) { | ||
| // caseWhenResultIsNull = caseWhen_1(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these need to be updated
| val listGen = list.map(_.genCode(ctx)) | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.value) | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| // inTmpResult -1 indicates at lease one expr in list is evaluated to null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be more clear:
inTmpResult has 3 possible values:
-1 means no matches found and there is at least one value in the list evaluated to null
0 means no matches found and all values in the list are not null
1 means one value in the list is matched, and the value is saved to `tmpResult`.
| // The evaluation of variables can be stopped when we find a matching value. | ||
| val listCode = listGen.map(x => | ||
| s""" | ||
| |$tmpResult = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, this against what we've discussed before...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is my mistake for debugging.
| |${ev.isNull} = ${valueGen.isNull}; | ||
| |if (!${ev.isNull}) { | ||
| |byte $tmpResult = (byte)(${valueGen.isNull} ? -1 : 0); | ||
| |if ($tmpResult != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's more readable to write
if (!{ev.isNull}) {
byte $tmpResult = 0;
...
}
|
LGTM except some minor comments |
|
LGTM too |
|
Test build #84557 has finished for PR 19901 at commit
|
|
Test build #84561 has finished for PR 19901 at commit
|
| // inTmpResult has 3 possible values: | ||
| // -1 means no matches found and there is at least one value in the list evaluated to null | ||
| // 0 means no matches found and all values in the list are not null | ||
| // 1 means one value in the list is matched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we define some constants with meaningful names in doGenCode?
// -1 means no matches found and there is at least one value in the list evaluated
final val HAS_NULL = -1
// 0 means no matches found and all values in the list are not null
final val NOT_MATCHED = 0
// 1 means one value in the list is matched
final val MATCHED = 1| // This variable holds the state of the result: | ||
| // -1 means the condition is not met yet and the result is unknown. | ||
| // 0 means the condition is met and result is not null. | ||
| // 1 means the condition is met and result is null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here.
| |${x.code} | ||
| |if (${x.isNull}) { | ||
| | ${ev.isNull} = true; | ||
| | $tmpResult = -1; // isNull = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about updating the comments to
// ${ev.isNull} = true;
| |} else if (${ctx.genEqual(value.dataType, valueArg, x.value)}) { | ||
| | ${ev.isNull} = false; | ||
| | ${ev.value} = true; | ||
| | $tmpResult = 1; // value = TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about updating the comments to
// ${ev.isNull} = false; ${ev.value} = true;
|
LGTM except a few comments about code readability |
| val HAS_NONNULL = 0 | ||
| // 1 means the condition is met and result is null. | ||
| val HAS_NULL = 1 | ||
| // It is initialized to `-1`, and if it's set to `1` or `0`, We won't go on anymore on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since you define the constants above, why not use them in this comment?
It is initialized to `NOT_MATCHED`, and if it's set to `HAS_NULL` or `HAS_NONNULL`...
| |do { | ||
| | $codes | ||
| |} while (false); | ||
| |// TRUE if any condition is met and the result is not null, or no any condition is met. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems wrong here. TRUE if any condition is met and the result is null...
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.value) | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| // inTmpResult has 3 possible values: | ||
| // -1 means no matches found and there is at least one value in the list evaluated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... evaluated to null
| |${x.code} | ||
| |if (${x.isNull}) { | ||
| | ${ev.isNull} = true; | ||
| | $tmpResult = $HAS_NULL; // ${ev.isNull} = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ${ev.isNull} = true; looks not necessary comment for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, seems suggested by other reviewer. Then fine for me.
| |} else if (${ctx.genEqual(value.dataType, valueArg, x.value)}) { | ||
| | ${ev.isNull} = false; | ||
| | ${ev.value} = true; | ||
| | $tmpResult = $MATCHED; // ${ev.isNull} = false; ${ev.value} = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
|
Few minor comments for code comment, LGTM. |
|
Test build #84584 has finished for PR 19901 at commit
|
|
Test build #84588 has finished for PR 19901 at commit
|
| val conditionMet = ctx.freshName("caseWhenConditionMet") | ||
| ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) | ||
| ctx.addMutableState(ctx.javaType(dataType), ev.value) | ||
| // This variable holds the state of the result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resultState holds the state of the result, which has 3 possible values:
|
thanks, merging to master! you can address the minor comment in your next PR. |
What changes were proposed in this pull request?
This PR accomplishes the following two items.
CaseandCoalesceand remove global variables for generated code ofIn.Item 1. reduces # of constant pool entries in a Java class. Item 2. ensures that an variable is not passed to arguments in a method split by
CodegenContext.splitExpressions(), which is addressed by #19865.How was this patch tested?
Added new tests into
PredicateSuite,NullExpressionsSuite, andConditionalExpressionSuite.