Commit 63025e9

c27kwan authored and cloud-fan committed
[SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData
### What changes were proposed in this pull request?

There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values).

In this PR, we add a `hashCode` function so that it works exactly as we expect.

### Why are the changes needed?

This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A simple unit test was added.

Closes #37807 from c27kwan/SPARK-40315-lit.

Authored-by: Carmen Kwan <carmen.kwan@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e85a4ff)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
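The failure mode described above can be reproduced without Spark. The following is a minimal sketch under stated assumptions: `MapPayload` and `Lit` are hypothetical stand-ins for `ArrayBasedMapData` and `Literal`, plain Scala `Vector`s replace Spark's array types, and `Lit.equals` is assumed to compare keys and values as the commit message describes. It is not the Spark implementation.

```scala
import scala.collection.immutable.HashSet

object HashContractSketch {
  // Hypothetical stand-in for ArrayBasedMapData: no equals/hashCode override,
  // so hashCode() is the identity-based default inherited from java.lang.Object.
  final class MapPayload(val keys: Vector[String], val values: Vector[String])

  // Hypothetical stand-in for Literal. equals() is structural, and hashCode()
  // combines the key and value hashes the same way this commit does.
  final class Lit(val value: MapPayload) {
    override def equals(other: Any): Boolean = other match {
      case o: Lit => value.keys == o.value.keys && value.values == o.value.values
      case _ => false
    }
    override def hashCode(): Int =
      value.keys.hashCode() * 37 + value.values.hashCode()
  }

  val p1 = new MapPayload(Vector("key"), Vector("value1"))
  val p2 = new MapPayload(Vector("key"), Vector("value1"))
  val a = new Lit(p1)
  val b = new Lit(p2)

  def main(args: Array[String]): Unit = {
    // p1 and p2 are distinct objects; their default hash codes may differ from
    // run to run, which is what made the original behavior non-deterministic.
    println(s"payload hashes equal: ${p1.hashCode() == p2.hashCode()}")
    // With a content-based hashCode, equal wrappers hash equally and
    // hash-based collections can find one via the other.
    println(s"wrappers equal: ${a == b}")
    println(s"wrapper hashes equal: ${a.hashCode() == b.hashCode()}")
    println(s"set lookup succeeds: ${HashSet(a).contains(b)}")
  }
}
```

If `Lit.hashCode()` instead delegated to `value.hashCode()`, the two wrappers would still compare equal but could land in different hash buckets, so lookups in hash-based collections would fail intermittently.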
1 parent 6b021fe commit 63025e9

2 files changed, +29 -0 lines changed


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

Lines changed: 3 additions & 0 deletions
@@ -309,6 +309,9 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression {
     val valueHashCode = value match {
       case null => 0
       case binary: Array[Byte] => util.Arrays.hashCode(binary)
+      // SPARK-40315: Literals of ArrayBasedMapData should have deterministic hashCode.
+      case arrayBasedMapData: ArrayBasedMapData =>
+        arrayBasedMapData.keyArray.hashCode() * 37 + arrayBasedMapData.valueArray.hashCode()
       case other => other.hashCode()
     }
     31 * Objects.hashCode(dataType) + valueHashCode
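The new case follows the same idea as the existing `Array[Byte]` case just above it: hash by content rather than by object identity. As a small illustration of why the byte-array case needs `util.Arrays.hashCode`, the sketch below uses only the JDK; the object name `ArrayHashSketch` is hypothetical. Whether `keyArray.hashCode()` is content-based in Spark depends on the concrete `ArrayData` class, which this sketch does not cover.

```scala
import java.util.Arrays

object ArrayHashSketch {
  val a: Array[Byte] = Array[Byte](1, 2, 3)
  val b: Array[Byte] = Array[Byte](1, 2, 3)

  // == on JVM arrays compares references, so two separately allocated arrays
  // are not == even when their contents match.
  val referenceEqual: Boolean = a == b

  // java.util.Arrays provides content-based equality and hashing.
  val contentEqual: Boolean = Arrays.equals(a, b)
  val contentHashEqual: Boolean = Arrays.hashCode(a) == Arrays.hashCode(b)

  def main(args: Array[String]): Unit = {
    println(s"reference equal: $referenceEqual")
    println(s"content equal: $contentEqual")
    println(s"content hashes equal: $contentHashEqual")
  }
}
```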

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala

Lines changed: 26 additions & 0 deletions
@@ -526,4 +526,30 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper {

     assert(m1.semanticEquals(m2))
   }
+
+  test("SPARK-40315: Literals of ArrayBasedMapData should have deterministic hashCode.") {
+    val keys = new Array[UTF8String](1)
+    val values1 = new Array[UTF8String](1)
+    val values2 = new Array[UTF8String](1)
+
+    keys(0) = UTF8String.fromString("key")
+    values1(0) = UTF8String.fromString("value1")
+    values2(0) = UTF8String.fromString("value2")
+
+    val d1 = new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values1))
+    val d2 = new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values1))
+    val d3 = new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values2))
+    val m1 = Literal.create(d1, MapType(StringType, StringType))
+    val m2 = Literal.create(d2, MapType(StringType, StringType))
+    val m3 = Literal.create(d3, MapType(StringType, StringType))
+
+    // If two Literals of ArrayBasedMapData have the same elements, we expect them to be equal and
+    // to have the same hashCode().
+    assert(m1 == m2)
+    assert(m1.hashCode() == m2.hashCode())
+    // If two Literals of ArrayBasedMapData have different elements, we expect them not to be equal
+    // and to have different hashCode().
+    assert(m1 != m3)
+    assert(m1.hashCode() != m3.hashCode())
+  }
 }
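One caveat about the last assertion in this test: the general `hashCode` contract only requires equal objects to have equal hash codes. Unequal objects usually hash differently, as they do for the inputs used here, but that is not guaranteed for every input. The sketch below shows a collision using the same `* 37 +` combination over plain Scala `Vector[String]`s. The `combine` helper and the object name are hypothetical, and Spark's `UTF8String` hashes differently from `java.lang.String`, so these particular strings are only an illustration.

```scala
object HashCollisionSketch {
  // Same combination as in the commit, applied to stand-in collections.
  def combine(keys: Vector[String], values: Vector[String]): Int =
    keys.hashCode() * 37 + values.hashCode()

  // "Aa" and "BB" are a well-known java.lang.String collision:
  // both have hashCode 2112.
  val h1: Int = combine(Vector("key"), Vector("Aa"))
  val h2: Int = combine(Vector("key"), Vector("BB"))

  val sameContent: Boolean = Vector("Aa") == Vector("BB")
  val sameHash: Boolean = h1 == h2

  def main(args: Array[String]): Unit = {
    println(s"same content: $sameContent")
    println(s"same hash: $sameHash")
  }
}
```

The values differ while the combined hashes match, which is permitted by the contract; only the reverse (equal values, different hashes) is a bug.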
