-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11593][SQL] Replace catalyst converter with RowEncoder in ScalaUDF #9565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved here because they should not be included in scalastyle:off section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need manually created accessors? All of the arguments to a case class should have public methods already created for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Let me try to use that.
|
Test build #45358 has finished for PR 9565 at commit
|
|
Test build #45382 has finished for PR 9565 at commit
|
|
Test build #45390 has finished for PR 9565 at commit
|
|
retest this please. |
|
Test build #45449 has finished for PR 9565 at commit
|
|
Test build #45475 has finished for PR 9565 at commit
|
|
cc @davies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputEncoder should be created outside of eval, or it will be too slow.
|
@viirya Thanks for work on this. I think it's more important to generate the code for converter in generated ScalaUDF. BTW, the RowEncoder is new in 1.6 (experimental feature), so I'd like to only merge this into master. |
|
@davies Thanks for reviewing. I will work on generated version later. |
|
Test build #45517 has finished for PR 9565 at commit
|
|
retest this please. |
|
Test build #45520 has finished for PR 9565 at commit
|
|
Test build #45616 has finished for PR 9565 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use schemaFor to get a catalyst DataType for udf's return type. For Product type, we return a StructType now. That causes a problem in RowEncoder because RowEncoder will try to get a Row not a Product for a field of StructType. You will get a casting exception if your udf returns something like (1, 2).
The problem is a field of StructType in a Row can be a Product or a Row. I modified the getStruct method in Row to turn a Row for a Product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to update the javadoc of Row to say that Product is also a valid value type of StructType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
|
Test build #45619 has finished for PR 9565 at commit
|
|
retest this please. |
|
Test build #55218 has finished for PR 9565 at commit
|
|
retest this please. |
|
Test build #55219 has started for PR 9565 at commit |
|
Test build #55226 has finished for PR 9565 at commit
|
|
retest this please. |
Conflicts: project/MimaExcludes.scala
|
Test build #55266 has finished for PR 9565 at commit
|
|
Test build #55300 has finished for PR 9565 at commit
|
|
retest this please. |
|
Test build #55301 has finished for PR 9565 at commit
|
|
Test build #55325 has finished for PR 9565 at commit
|
|
Finally...tests passed. |
Conflicts: project/MimaExcludes.scala
|
Test build #55501 has finished for PR 9565 at commit
|
|
Test build #55502 has finished for PR 9565 at commit
|
|
hmm. We can't remove non code-generated version of ScalaUDF as any InterpretedProjection with udf will fail... |
|
Too replace catalyst converter with RowEncoder in non code-generated ScalaUDF seems not doable due to runtime mirror limitation. ping @rxin Is it good that I revert the changes of non code-generated ScalaUDF here and just merge code-generated version? |
|
Close this now. Maybe revisit this in the future. |
|
What's the problem with runtime mirror? |
|
When using member method as udf., for example, Otherwise, it works well. BTW, I can't reproduce that exception locally. Maybe java version matters. |
|
i think this would be very helpful. the difference in behaviour of scala udfs and scala functions used in dataset transformations is a constant source of confusion for my users. for example the lack of support for Option to declare nullable input types, and the need to use untyped Row objects in UDFs for structs are inconsistent with how things are done when Encoders are used. |
JIRA: https://issues.apache.org/jira/browse/SPARK-11593
We use catalyst converters to transfer catalyst type to and from scala type now. We should use
RowEncoderto replace it.