Skip to content

Conversation

@koertkuipers
Copy link
Contributor

What changes were proposed in this pull request?

RowEncoder now respects nullability for struct fields when creating extractor expressions in the extractorsFor method.

Note that to get the correct value for nullable for the returned expression i chose to drop the If statement checking for nulls if the field has nullable=false. If this is undesired because we should defensively be checking for nulls anyhow with the If statement then that can be achieved as well, by modifying the If class, however to me that solution seems less clear/elegant.

How was this patch tested?

Added new unit test in DataFrameSuite for the bug described in the jira issue.

@SparkQA
Copy link

SparkQA commented Mar 26, 2016

Test build #54264 has finished for PR 11980 at commit b500c8b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 26, 2016

Test build #54265 has finished for PR 11980 at commit 3e32b6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Mar 27, 2016

cc @cloud-fan

dataType: DataType,
arguments: Seq[Expression] = Nil) extends Expression with NonSQLExpression {
arguments: Seq[Expression] = Nil,
val nullable: Boolean = true) extends Expression with NonSQLExpression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks tricky to me, invoking a method on an object should be always nullable, as we have no idea what that method does. To fix the nullability lose issue, how about we create a new expression called GetExternalRowField? We can setup corrected nullability there and generate simpler code for this case.

@koertkuipers
Copy link
Contributor Author

@cloud-fan i tried to do that, but i don't think i am familiar enough with the code gen, because it breaks other unit tests. it seems to me i am messing up with internal vs external types.
you want me to push my broken code?

@koertkuipers
Copy link
Contributor Author

@cloud-fan i pushed at attempt at this, but i am having trouble with RowEncoderSuite encode/decode: Product
this test uses a Product value with a StructType schema, and RowEncoder is supposed to handle this, but i can not figure out how this worked before.

@SparkQA
Copy link

SparkQA commented Mar 29, 2016

Test build #54457 has finished for PR 11980 at commit 50c0005.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

f.dataType
)
if (f.nullable) {
If(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do the null check inside GetExternalRowField

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had that before (basically without the If, since GetExternalRowField already does null checks inside), but the issue is that extractorsFor code path then also runs for nulls , and that causes some weird runtime errors. i will try again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do the null check inside GetExternalRowField then the code for serializerFor also needs to be pushed into it (to be inside the null check), and i can not figure out how to do that...

@SparkQA
Copy link

SparkQA commented Apr 3, 2016

Test build #54809 has finished for PR 11980 at commit e9a9a30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

(This is trivial but might be better if the title follows [SPARK-XXXXX][SQL] format as described in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.)

asfgit pushed a commit that referenced this pull request May 5, 2016
## What changes were proposed in this pull request?

The problem is: In `RowEncoder`, we use `Invoke` to get the field of an external row, which lose the nullability information. This PR creates a `GetExternalRowField` expression, so that we can preserve the nullability info.

TODO: simplify the null handling logic in `RowEncoder`, to remove so many if branches, in follow-up PR.

## How was this patch tested?

new tests in `RowEncoderSuite`

Note that, This PR takes over #11980, with a little simplification, so all credits should go to koertkuipers

Author: Wenchen Fan <wenchen@databricks.com>
Author: Koert Kuipers <koert@tresata.com>

Closes #12364 from cloud-fan/nullable.

(cherry picked from commit 55cc1c9)
Signed-off-by: Cheng Lian <lian@databricks.com>
ghost pushed a commit to dbtsai/spark that referenced this pull request May 5, 2016
## What changes were proposed in this pull request?

The problem is: In `RowEncoder`, we use `Invoke` to get the field of an external row, which lose the nullability information. This PR creates a `GetExternalRowField` expression, so that we can preserve the nullability info.

TODO: simplify the null handling logic in `RowEncoder`, to remove so many if branches, in follow-up PR.

## How was this patch tested?

new tests in `RowEncoderSuite`

Note that, This PR takes over apache#11980, with a little simplification, so all credits should go to koertkuipers

Author: Wenchen Fan <wenchen@databricks.com>
Author: Koert Kuipers <koert@tresata.com>

Closes apache#12364 from cloud-fan/nullable.
@rxin
Copy link
Contributor

rxin commented May 20, 2016

@koertkuipers this can be closed now right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants