Commit e85a4ff

c27kwan authored and cloud-fan committed
[SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData
### What changes were proposed in this pull request?

There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values).

In this PR, we add a `hashCode` function so that it works exactly as we expect.

### Why are the changes needed?

This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A simple unit test was added.

Closes #37807 from c27kwan/SPARK-40315-lit.

Authored-by: Carmen Kwan <carmen.kwan@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 82d4430 commit e85a4ff
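
To illustrate the failure mode the commit message describes, here is a minimal, self-contained sketch. It does not use Spark: `IdentityHashedMapData`, `LiteralStandIn`, and `HashContractDemo` are hypothetical stand-ins invented for this example. It assumes a wrapper whose `equals` compares contents while its `hashCode` falls through to a value class that never overrides `hashCode`.

```scala
// Hypothetical stand-in (not Spark's class): defines no equals/hashCode,
// so it inherits the identity-based hashCode from java.lang.Object.
final class IdentityHashedMapData(val keys: Seq[String], val values: Seq[String])

// Hypothetical stand-in for a literal: equals compares contents, but hashCode
// delegates to the wrapped value, which only has an identity-based hash.
final class LiteralStandIn(val value: IdentityHashedMapData) {
  override def equals(other: Any): Boolean = other match {
    case that: LiteralStandIn =>
      value.keys == that.value.keys && value.values == that.value.values
    case _ => false
  }

  // Identity-based: generally differs between two separately built instances.
  override def hashCode(): Int = value.hashCode()
}

object HashContractDemo {
  val a = new LiteralStandIn(new IdentityHashedMapData(Seq("key"), Seq("value1")))
  val b = new LiteralStandIn(new IdentityHashedMapData(Seq("key"), Seq("value1")))

  def main(args: Array[String]): Unit = {
    // The two wrappers are equal by content...
    println(s"equal: ${a == b}")
    // ...but their hash codes usually disagree, and the outcome can vary per run.
    println(s"same hashCode: ${a.hashCode() == b.hashCode()}")
  }
}
```

Because identity hashes are assigned by the JVM at runtime, the second line is usually `false` but is not guaranteed to be, which is consistent with the error being described as non-deterministic.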

2 files changed: +29 -0 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

Lines changed: 3 additions & 0 deletions
@@ -369,6 +369,9 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression {
     val valueHashCode = value match {
       case null => 0
       case binary: Array[Byte] => util.Arrays.hashCode(binary)
+      // SPARK-40315: Literals of ArrayBasedMapData should have deterministic hashCode.
+      case arrayBasedMapData: ArrayBasedMapData =>
+        arrayBasedMapData.keyArray.hashCode() * 37 + arrayBasedMapData.valueArray.hashCode()
       case other => other.hashCode()
     }
     31 * Objects.hashCode(dataType) + valueHashCode
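
The added case can be tried outside Spark with a small sketch. `MapDataSketch` and `LiteralHashSketch` are hypothetical names for this example, and plain `Seq[String]` values stand in for the key and value arrays on the assumption that those arrays hash by content. Only the `* 37 +` combination and the surrounding `match` shape are taken from the diff.

```scala
// Hypothetical stand-in (not Spark's ArrayBasedMapData): like the real class in
// this commit, it defines no hashCode of its own.
final class MapDataSketch(val keyArray: Seq[String], val valueArray: Seq[String])

object LiteralHashSketch {
  // Mirrors the shape of the match in the diff: null, byte arrays, and map data
  // get explicit content-based handling; everything else uses its own hashCode.
  def valueHashCode(value: Any): Int = value match {
    case null => 0
    case binary: Array[Byte] => java.util.Arrays.hashCode(binary)
    case m: MapDataSketch =>
      // Same combination as the commit: key hash times 37 plus value hash.
      m.keyArray.hashCode() * 37 + m.valueArray.hashCode()
    case other => other.hashCode()
  }

  def main(args: Array[String]): Unit = {
    val h1 = valueHashCode(new MapDataSketch(Seq("key"), Seq("value1")))
    val h2 = valueHashCode(new MapDataSketch(Seq("key"), Seq("value1")))
    // Two separately built instances with equal contents now hash identically.
    println(h1 == h2)
  }
}
```

Computing the hash in the wrapper rather than in the map-data class itself follows the constraint stated in the commit message, which rules out adding `hashCode` directly to `ArrayBasedMapData`.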

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala

Lines changed: 26 additions & 0 deletions
@@ -517,4 +517,30 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper {
 
     assert(m1.semanticEquals(m2))
   }
+
+  test("SPARK-40315: Literals of ArrayBasedMapData should have deterministic hashCode.") {
+    val keys = new Array[UTF8String](1)
+    val values1 = new Array[UTF8String](1)
+    val values2 = new Array[UTF8String](1)
+
+    keys(0) = UTF8String.fromString("key")
+    values1(0) = UTF8String.fromString("value1")
+    values2(0) = UTF8String.fromString("value2")
+
+    val d1 = new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values1))
+    val d2 = new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values1))
+    val d3 = new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values2))
+    val m1 = Literal.create(d1, MapType(StringType, StringType))
+    val m2 = Literal.create(d2, MapType(StringType, StringType))
+    val m3 = Literal.create(d3, MapType(StringType, StringType))
+
+    // If two Literals of ArrayBasedMapData have the same elements, we expect them to be equal and
+    // to have the same hashCode().
+    assert(m1 == m2)
+    assert(m1.hashCode() == m2.hashCode())
+    // If two Literals of ArrayBasedMapData have different elements, we expect them not to be equal
+    // and to have different hashCode().
+    assert(m1 != m3)
+    assert(m1.hashCode() != m3.hashCode())
+  }
 }
