@viirya
Member

viirya commented Nov 9, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-11593

We currently use Catalyst converters to convert between Catalyst types and Scala types. We should replace them with RowEncoder.

Member Author

Moved here because they should not be included in the scalastyle:off section.

Contributor

Why do we need manually created accessors? All of the arguments to a case class should have public methods already created for them.
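
To illustrate the point about accessors, here is a minimal sketch. The case class `UdfSpec` and its fields are hypothetical and are not taken from this patch. It shows that every constructor argument of a case class already has a public accessor, and that the fields are also reachable by position through `Product`.

```scala
// Hypothetical case class, used only for illustration; not code from this PR.
case class UdfSpec(name: String, arity: Int)

object AccessorDemo {
  def main(args: Array[String]): Unit = {
    val spec = UdfSpec("plusOne", 1)

    // The compiler generates a public accessor for each constructor argument,
    // so no hand-written getters are needed.
    println(spec.name)
    println(spec.arity)

    // Case classes also extend Product, so fields can be read by position.
    println(spec.productArity)
    println(spec.productElement(0))
  }
}
```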

Member Author

Ok. Let me try to use that.

@SparkQA

SparkQA commented Nov 9, 2015

Test build #45358 has finished for PR 9565 at commit 942dad7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 9, 2015

Test build #45382 has finished for PR 9565 at commit 39f6c26.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 9, 2015

Test build #45390 has finished for PR 9565 at commit 75ffaeb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Nov 10, 2015

retest this please.

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45449 has finished for PR 9565 at commit 1e13ff9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45475 has finished for PR 9565 at commit 07ff97a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Nov 10, 2015

cc @davies

Contributor

outputEncoder should be created outside of eval, or it will be too slow.
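
A minimal sketch of the difference, under stated assumptions: `CostlyEncoder`, `EncoderStats`, `PerCallUdf` and `HoistedUdf` are hypothetical stand-ins, not Spark classes, and the counter only exists to make the construction cost visible. The idea is that building the encoder inside `eval` repeats the work for every row, while holding it in a field does the work once per expression instance.

```scala
// Hypothetical counter that records how many encoders have been built.
object EncoderStats { var constructions = 0 }

// Hypothetical stand-in for an encoder that is expensive to construct.
// It is not Spark's RowEncoder.
final class CostlyEncoder {
  EncoderStats.constructions += 1
  def encode(value: Any): String = value.toString
}

// Slow shape: a new encoder is built on every call to eval.
final class PerCallUdf(f: Int => Int) {
  def eval(input: Int): String = {
    val outputEncoder = new CostlyEncoder
    outputEncoder.encode(f(input))
  }
}

// Preferred shape: the encoder is a field, built once and reused by eval.
final class HoistedUdf(f: Int => Int) {
  private val outputEncoder = new CostlyEncoder
  def eval(input: Int): String = outputEncoder.encode(f(input))
}
```

With these stand-ins, evaluating `PerCallUdf` over N inputs builds N encoders, while `HoistedUdf` builds one regardless of N.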

@davies
Contributor

davies commented Nov 10, 2015

@viirya Thanks for working on this. I think it's more important to generate the converter code in the code-generated ScalaUDF.

BTW, the RowEncoder is new in 1.6 (experimental feature), so I'd like to only merge this into master.

@viirya
Member Author

viirya commented Nov 10, 2015

@davies Thanks for reviewing. I will work on the code-generated version later.

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45517 has finished for PR 9565 at commit ecf01bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Nov 10, 2015

retest this please.

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45520 has finished for PR 9565 at commit ecf01bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2015

Test build #45616 has finished for PR 9565 at commit 39c0b7a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * sealed abstract class State[S]
      * sealed abstract class StateSpec[KeyType, ValueType, StateType, EmittedType] extends Serializable
      * case class StateSpecImpl[K, V, S, T](
      * sealed abstract class TrackStateDStream[KeyType, ValueType, StateType, EmittedType: ClassTag](
      * class InternalTrackStateDStream[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
      * case class StateInfo[S](
      * class LimitMarker(val num: Int) extends Serializable

Member Author

We use schemaFor to get the Catalyst DataType for a UDF's return type. For a Product type, it returns a StructType. That causes a problem in RowEncoder, because RowEncoder expects a Row, not a Product, for a field of StructType. You will get a cast exception if your UDF returns something like (1, 2).

The problem is that a field of StructType in a Row can be either a Product or a Row. I modified the getStruct method in Row to return a Row for a Product.
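
A rough sketch of the kind of normalization described above, under stated assumptions: `SimpleRow` and `StructValues.toRow` are hypothetical stand-ins and are not Spark's `Row` or the actual `getStruct` change. It only shows the idea of accepting either representation of a struct value and handing back a row.

```scala
// Hypothetical stand-in for a row; not Spark's org.apache.spark.sql.Row.
final case class SimpleRow(values: Seq[Any])

object StructValues {
  // Accept either representation of a struct value and normalize it to a row.
  // The SimpleRow case must come first, because a case class is itself a Product.
  def toRow(value: Any): SimpleRow = value match {
    case r: SimpleRow => r
    case p: Product   => SimpleRow(p.productIterator.toList)
    case other =>
      throw new ClassCastException(s"$other is neither a row nor a Product")
  }
}
```

With this shape, a UDF result such as `(1, 2)` becomes a two-field row instead of failing a cast, and a value that is already a row is passed through unchanged.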

Contributor

I think we also need to update the javadoc of Row to say that Product is also a valid value type of StructType.

Member Author

ok.

@SparkQA

SparkQA commented Nov 11, 2015

Test build #45619 has finished for PR 9565 at commit c910e6e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * sealed abstract class State[S]
      * sealed abstract class StateSpec[KeyType, ValueType, StateType, EmittedType] extends Serializable
      * case class StateSpecImpl[K, V, S, T](
      * sealed abstract class TrackStateDStream[KeyType, ValueType, StateType, EmittedType: ClassTag](
      * class InternalTrackStateDStream[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
      * case class StateInfo[S](
      * class LimitMarker(val num: Int) extends Serializable

@viirya
Member Author

viirya commented Apr 7, 2016

retest this please.

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55218 has finished for PR 9565 at commit 597c971.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Apr 7, 2016

retest this please.

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55219 has started for PR 9565 at commit 597c971.

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55226 has finished for PR 9565 at commit 405e8b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Apr 8, 2016

retest this please.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55266 has finished for PR 9565 at commit 648c7b2.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55300 has finished for PR 9565 at commit 648c7b2.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Apr 8, 2016

retest this please.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55301 has finished for PR 9565 at commit 2a0c319.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55325 has finished for PR 9565 at commit 2a0c319.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Apr 8, 2016

Finally...tests passed.

@viirya
Member Author

viirya commented Apr 8, 2016

@davies @rxin This has been sitting here for a while. I recently revisited it and fixed an earlier problem. Could you take another look? Thanks!

@SparkQA

SparkQA commented Apr 11, 2016

Test build #55501 has finished for PR 9565 at commit 884a176.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2016

Test build #55502 has finished for PR 9565 at commit 30a867e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Apr 11, 2016

Hmm. We can't remove the non-code-generated version of ScalaUDF, as any InterpretedProjection with a UDF would fail...

@viirya
Member Author

viirya commented Apr 13, 2016

Replacing the Catalyst converters with RowEncoder in the non-code-generated ScalaUDF does not seem doable due to a runtime mirror limitation.

ping @rxin Would it be OK if I revert the changes to the non-code-generated ScalaUDF here and merge only the code-generated version?

@viirya
Member Author

viirya commented Apr 19, 2016

Closing this for now. I may revisit it in the future.

viirya closed this Apr 19, 2016

@rxin
Contributor

rxin commented Apr 19, 2016

What's the problem with the runtime mirror?

@viirya
Member Author

viirya commented Apr 19, 2016

When a member method is used as a UDF, for example def createTransformFunc in org.apache.spark.ml.Transformer, the Jenkins tests always get an exception. It looks like it can't recognize the type of this kind of method.

Otherwise, it works well.

BTW, I can't reproduce that exception locally. Maybe the Java version matters.

@koertkuipers
Contributor

i think this would be very helpful. the difference in behaviour of scala udfs and scala functions used in dataset transformations is a constant source of confusion for my users.

for example the lack of support for Option to declare nullable input types, and the need to use untyped Row objects in UDFs for structs are inconsistent with how things are done when Encoders are used.

viirya deleted the rowencoder-scalaudf branch December 27, 2023 18:33