[SPARK-27457][SQL] modify bean encoder to support avro objects #24367
    method.getParameterCount == 1)
  if (a.getName == b.getName ||
    (a.getName.indexOf("get") == 0 && b.getName.indexOf("set") == 0 &&
      a.getName.substring(3) == b.getName.substring(3))) &&
It is a bit hacky here, compared with #24299
Yes, I will try to simplify.
We need to check that the getter and setter functions are either prefixed with get and set respectively, or that neither function is prefixed.
I made this modification in #24299.
I will try to refactor the code if possible.
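As a rough illustration of the rule described above, the check could be pulled into a small named helper. This is only a sketch: `AccessorNames` and `isAccessorPair` are hypothetical names that do not appear in the PR, and the helper works on plain method names rather than on `java.lang.reflect.Method` objects.

```scala
// Hypothetical helper, not code from the PR: it mirrors the name condition
// in the diff above, but operates on plain strings.
object AccessorNames {
  def isAccessorPair(getter: String, setter: String): Boolean = {
    // Bean style: getFoo paired with setFoo
    val bothPrefixed = getter.startsWith("get") && setter.startsWith("set")
    // Unprefixed style: foo() paired with foo(value), so the names are identical
    getter == setter ||
      (bothPrefixed && getter.substring(3) == setter.substring(3))
  }
}
```

With this sketch, `AccessorNames.isAccessorPair("getName", "setName")` and `AccessorNames.isAccessorPair("name", "name")` both hold, while a mixed pair such as `("name", "setName")` is rejected.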
encodeDecodeTest(Option("abc"), "option of string")
encodeDecodeTest(Option.empty[String], "empty option of string")

encodeDecodeTest(
I tried and the test case failed:
Can't compare maps!
org.apache.avro.AvroRuntimeException: Can't compare maps!
at org.apache.avro.generic.GenericData.compare(GenericData.java:984)
at org.apache.avro.specific.SpecificData.compare(SpecificData.java:333)
at org.apache.avro.generic.GenericData.compare(GenericData.java:961)
at org.apache.avro.specific.SpecificData.compare(SpecificData.java:333)
at org.apache.avro.generic.GenericData.compare(GenericData.java:946)
at org.apache.avro.specific.SpecificRecordBase.compareTo(SpecificRecordBase.java:81)
at org.apache.avro.specific.SpecificRecordBase.compareTo(SpecificRecordBase.java:30)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite.$anonfun$encodeDecodeTest$1(ExpressionEncoderSuite.scala:442)
The issue here is how the input and convertedBack objects are compared.
If we replace the check at ExpressionEncoderSuite.scala:442,
left.asInstanceOf[Comparable[Any]].compareTo(right) == 0
with
left.asInstanceOf[Comparable[Any]].equals(right)
the test for the Avro encoder passes, but unfortunately other tests fail.
Equality of objects is tricky.
The GenericData compare function
/** Comparison implementation. When equals is true, only checks for equality,
* not for order. */
@SuppressWarnings(value="unchecked")
protected int compare(Object o1, Object o2, Schema s, boolean equals) {
fails to compare maps when the parameter equals is false.
I propose the following:
replace lines 434-444 of ExpressionEncoderSuite
val isCorrect = (input, convertedBack) match {
case (b1: Array[Byte], b2: Array[Byte]) => Arrays.equals(b1, b2)
case (b1: Array[Int], b2: Array[Int]) => Arrays.equals(b1, b2)
case (b1: Array[Array[_]], b2: Array[Array[_]]) =>
Arrays.deepEquals(b1.asInstanceOf[Array[AnyRef]], b2.asInstanceOf[Array[AnyRef]])
case (b1: Array[_], b2: Array[_]) =>
Arrays.equals(b1.asInstanceOf[Array[AnyRef]], b2.asInstanceOf[Array[AnyRef]])
case (left: Comparable[_], right: Comparable[_]) =>
left.asInstanceOf[Comparable[Any]].compareTo(right) == 0
case _ => input == convertedBack
}
with
val convertedBackRow = encoder.toRow(convertedBack)
val isCorrect = row == convertedBackRow
With the proposed modification, all the tests in ExpressionEncoderSuite pass.
I think this makes sense: the check should not depend on how the foreign objects implement equality (see next comment).
We must compare input and convertedBack. The equality row == convertedBackRow does not guarantee that the encoder is correct: a buggy encoder may create a row with missing fields, such that row == convertedBackRow is true while input == convertedBack is false.
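To make this concrete, here is a toy model of such a buggy encoder. It does not use Spark's API: a "row" is just a `Seq[Any]`, and `Person`, `BuggyPersonEncoder`, and `RoundTripCheck` are hypothetical names invented for this sketch.

```scala
// Toy model, not Spark's encoder API: a "row" is simply a Seq[Any].
final case class Person(name: String, age: Int)

object BuggyPersonEncoder {
  // Bug: the age field is never written to the row.
  def toRow(p: Person): Seq[Any] = Seq(p.name)

  // Decoding fills the missing field with a default value.
  def fromRow(row: Seq[Any]): Person =
    Person(row.head.asInstanceOf[String], 0)
}

object RoundTripCheck {
  val input: Person = Person("alice", 30)
  val row: Seq[Any] = BuggyPersonEncoder.toRow(input)
  val convertedBack: Person = BuggyPersonEncoder.fromRow(row)
  val convertedBackRow: Seq[Any] = BuggyPersonEncoder.toRow(convertedBack)

  // Both rows are missing age, so they are equal despite the bug.
  val rowsAgree: Boolean = row == convertedBackRow
  // The objects differ because age was lost in the round trip.
  val objectsAgree: Boolean = input == convertedBack
}
```

In this sketch the row-level check hides the bug, while comparing the objects exposes it.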
For the Avro encoder test, input == convertedBack is true when the encoder is correct, so we propose the following modification:
val isCorrect = input == convertedBack || ((input, convertedBack) match {
case (b1: Array[Byte], b2: Array[Byte]) => Arrays.equals(b1, b2)
case (b1: Array[Int], b2: Array[Int]) => Arrays.equals(b1, b2)
case (b1: Array[Array[_]], b2: Array[Array[_]]) =>
Arrays.deepEquals(b1.asInstanceOf[Array[AnyRef]], b2.asInstanceOf[Array[AnyRef]])
case (b1: Array[_], b2: Array[_]) =>
Arrays.equals(b1.asInstanceOf[Array[AnyRef]], b2.asInstanceOf[Array[AnyRef]])
case (left: Comparable[_], right: Comparable[_]) =>
left.asInstanceOf[Comparable[Any]].compareTo(right) == 0
case _ => false
})
With this modification, all the tests pass.
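The proposed check can be tried outside the test suite with a standalone sketch like the one below. The match is the one from the comment above, wrapped in a function; `RoundTripEquality` and `Unordered` are hypothetical names, and `Unordered` is only a stand-in for a foreign type (such as an Avro record containing a map) whose `equals` works but whose `compareTo` throws.

```scala
import java.util.Arrays

object RoundTripEquality {
  // Hypothetical stand-in for a foreign type: equals works, compareTo throws.
  final class Unordered(val payload: Map[String, Int]) extends Comparable[Unordered] {
    override def compareTo(other: Unordered): Int =
      throw new UnsupportedOperationException("Can't compare maps!")
    override def equals(other: Any): Boolean = other match {
      case u: Unordered => payload == u.payload
      case _ => false
    }
    override def hashCode: Int = payload.hashCode
  }

  // Try plain equality first; fall back to the type-specific checks only
  // when it fails (arrays use reference equality, so they need the fallback).
  def isCorrect(input: Any, convertedBack: Any): Boolean =
    input == convertedBack || ((input, convertedBack) match {
      case (b1: Array[Byte], b2: Array[Byte]) => Arrays.equals(b1, b2)
      case (b1: Array[Int], b2: Array[Int]) => Arrays.equals(b1, b2)
      case (b1: Array[Array[_]], b2: Array[Array[_]]) =>
        Arrays.deepEquals(b1.asInstanceOf[Array[AnyRef]], b2.asInstanceOf[Array[AnyRef]])
      case (b1: Array[_], b2: Array[_]) =>
        Arrays.equals(b1.asInstanceOf[Array[AnyRef]], b2.asInstanceOf[Array[AnyRef]])
      case (left: Comparable[_], right: Comparable[_]) =>
        left.asInstanceOf[Comparable[Any]].compareTo(right) == 0
      case _ => false
    })
}
```

Because `==` short-circuits, two equal `Unordered` values never reach `compareTo`. Two unequal ones still would, so in this sketch a genuinely wrong round trip for such a type surfaces as an exception rather than as `false`.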
👍 it's been a long time since we wanted to use dataset [T <: SpecificRecord]
The following example shows why we prefer the solution that modifies ScalaReflection (#24299). In the example below, if we use the bean encoder, we must declare a tuple encoder as shown: if we do not declare the tuple encoder, we get an error. But with the ScalaReflection (#24299) modification, we just need to declare an encoder for Barcode:
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
What changes were proposed in this pull request?
We modified JavaTypeInference so that it can create encoders for Avro objects. There are now three candidate solutions: PR #24299, PR #22878, and this PR (fewer code changes). Which one is better?
How was this patch tested?
We added one test in ExpressionEncoderSuite and used the following program to test it locally: