
[Incompatibility] Document arrays_overlap null handling differences #3175

@andygrove


Summary

arrays_overlap is marked as Incompatible in Comet, but the specific incompatibility is not documented. This issue tracks documenting and potentially fixing the behavior difference.

Spark Specification

Per Spark's documented semantics, arrays_overlap(a1, a2):

  • Returns true if at least one non-null element exists in both arrays
  • Returns false if no common elements are found AND no null elements exist, or if either array is empty
  • Returns null if no common elements are found, both arrays are non-empty, AND at least one of them contains a null element (three-valued logic)
  • Returns null if either input array is itself null
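
The rules above can be captured in a small reference model, which may be useful as an oracle when writing Spark-vs-Comet comparison tests. This is a minimal Rust sketch assuming integer elements; `spark_arrays_overlap` is a hypothetical name, not a Comet, Spark, or DataFusion API. An outer `None` models a NULL array and an inner `None` models a NULL element. The model covers the empty-array and NULL-array cases too, and uses a nested loop for clarity rather than efficiency.

```rust
/// Hypothetical reference model of Spark's arrays_overlap for integer arrays.
/// Outer `None` = NULL array, inner `None` = NULL element, `None` result = SQL NULL.
fn spark_arrays_overlap(a: Option<&[Option<i32>]>, b: Option<&[Option<i32>]>) -> Option<bool> {
    // A NULL input array yields a NULL result.
    let (a, b) = (a?, b?);
    // If either array is empty there is no overlap, and the result is false
    // even when the other array contains NULL elements.
    if a.is_empty() || b.is_empty() {
        return Some(false);
    }
    // NULL elements never match anything, so compare only non-null values.
    let found = a.iter().flatten().any(|x| b.iter().flatten().any(|y| x == y));
    if found {
        return Some(true);
    }
    // No common non-null element: the answer is unknown if either side has a NULL.
    let has_null = a.iter().any(Option::is_none) || b.iter().any(Option::is_none);
    if has_null { None } else { Some(false) }
}

fn main() {
    // The four SQL examples from this issue, expressed as slices.
    let cases: [(&[Option<i32>], &[Option<i32>]); 4] = [
        (&[Some(1), Some(2), Some(3)], &[Some(3), Some(4), Some(5)]),
        (&[Some(1), Some(2)], &[Some(3), Some(4)]),
        (&[Some(1), None, Some(3)], &[Some(4), Some(5)]),
        (&[Some(1), None, Some(3)], &[Some(1), Some(4)]),
    ];
    // prints Some(true), Some(false), None, Some(true) on separate lines
    for &(a, b) in cases.iter() {
        println!("{:?}", spark_arrays_overlap(Some(a), Some(b)));
    }
}
```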

Examples:

```sql
SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
-- Spark returns: true

SELECT arrays_overlap(array(1, 2), array(3, 4));
-- Spark returns: false

SELECT arrays_overlap(array(1, null, 3), array(4, 5));
-- Spark returns: null (because null element exists, result is indeterminate)

SELECT arrays_overlap(array(1, null, 3), array(1, 4));
-- Spark returns: true (found common element 1)
```

Current Comet Behavior

Comet uses DataFusion's array_has_any function, whose null handling may differ from Spark's:

  • DataFusion may return false instead of null when no overlap is found but one of the arrays contains a null element

Current Tests

CometArrayExpressionSuite.scala currently includes this test:

```scala
checkSparkAnswerAndOperator(sql(
  "SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null"))
```

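This case has no common non-null element and a null on both sides, so Spark returns null for it. Tests exist, but because the expression is marked as Incompatible, they require allow_incompatible=true to run.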

Possible Solutions

  1. Verify the actual behavior difference: run targeted test cases comparing Spark and Comet results
  2. Custom Rust implementation, if DataFusion doesn't match Spark's three-valued null logic
  3. Post-processing: wrap the result to check for null elements and convert false to null
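
Option 3 could look roughly like the following per-row Rust sketch. `overlap_ignoring_nulls` is a hypothetical stand-in for a kernel that only ever returns true or false, modeling the behavior this issue suspects of array_has_any; it is not DataFusion's implementation. `to_spark_result` (also hypothetical) converts a false result to null when both arrays are non-empty and either contains a null element. The sketch assumes NULL input arrays are handled separately (they simply propagate as null) and that the underlying kernel returns true whenever a non-null common element exists.

```rust
/// Hypothetical null-unaware kernel: true iff some non-null element appears
/// in both arrays, false otherwise. Models the suspected behavior only.
fn overlap_ignoring_nulls(a: &[Option<i32>], b: &[Option<i32>]) -> bool {
    a.iter().flatten().any(|x| b.iter().flatten().any(|y| x == y))
}

/// Hypothetical post-processing step: map a two-valued result onto Spark's
/// three-valued result. `None` means SQL NULL.
fn to_spark_result(raw: bool, a: &[Option<i32>], b: &[Option<i32>]) -> Option<bool> {
    if raw {
        // A real match is definitive, regardless of any null elements.
        return Some(true);
    }
    // No match: Spark returns null only if both arrays are non-empty
    // and at least one of them contains a null element.
    let both_non_empty = !a.is_empty() && !b.is_empty();
    let has_null = a.iter().any(Option::is_none) || b.iter().any(Option::is_none);
    if both_non_empty && has_null { None } else { Some(false) }
}

fn main() {
    // arrays_overlap(array(1, null, 3), array(4, 5))
    let a: &[Option<i32>] = &[Some(1), None, Some(3)];
    let b: &[Option<i32>] = &[Some(4), Some(5)];
    let raw = overlap_ignoring_nulls(a, b);
    // prints "raw = false, spark = None"
    println!("raw = {}, spark = {:?}", raw, to_spark_result(raw, a, b));
}
```

The trade-off of this approach is an extra scan over both arrays for null elements whenever the raw result is false; a real implementation would apply the same rule over Arrow list arrays rather than slices.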

Note: This issue was generated with AI assistance.
