Open
Labels
array expressions, documentation, enhancement, good first issue
Description
Summary
arrays_overlap is marked as Incompatible in Comet, but the specific incompatibility is not documented. This issue tracks documenting and potentially fixing the behavior difference.
Spark Specification
Spark's arrays_overlap behaves as follows:
- Returns true if at least one non-null element exists in both arrays
- Returns false if no common elements are found AND no null elements exist
- Returns null if no common elements are found BUT both arrays are non-empty and null elements exist in either array (three-valued logic)
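The rules above can be sketched as a small reference function. This is only an illustration of the semantics, not Comet's or DataFusion's implementation: the name `arrays_overlap` and the use of `Option<i32>` slices are assumptions made for the example, where `None` stands for a SQL NULL element and a `None` return value stands for a NULL result. It also assumes the arrays themselves are non-null, and it follows Spark's documented rule that the null result requires both arrays to be non-empty.

```rust
/// Reference sketch of Spark's arrays_overlap semantics over nullable i32 elements.
/// `None` as an element models SQL NULL; a `None` return models a NULL result.
/// (Illustrative only; names and types are assumptions, not Comet code.)
fn arrays_overlap(a: &[Option<i32>], b: &[Option<i32>]) -> Option<bool> {
    // A shared non-null element decides the result immediately.
    let has_common = a
        .iter()
        .flatten()
        .any(|x| b.iter().flatten().any(|y| x == y));
    if has_common {
        return Some(true);
    }

    // No common element: the answer is unknown only if both arrays are
    // non-empty and at least one of them contains a null element.
    let any_null = a.iter().chain(b.iter()).any(|e| e.is_none());
    if !a.is_empty() && !b.is_empty() && any_null {
        None
    } else {
        Some(false)
    }
}
```

The quadratic scan keeps the sketch short; a real implementation would more likely hash one side.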
Examples:
SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
-- Spark returns: true
SELECT arrays_overlap(array(1, 2), array(3, 4));
-- Spark returns: false
SELECT arrays_overlap(array(1, null, 3), array(4, 5));
-- Spark returns: null (because null element exists, result is indeterminate)
SELECT arrays_overlap(array(1, null, 3), array(1, 4));
-- Spark returns: true (found common element 1)
Current Comet Behavior
Comet uses DataFusion's array_has_any function. The specific null handling behavior may differ:
- DataFusion may return false instead of null when no overlap is found but nulls exist
Current Tests
Looking at CometArrayExpressionSuite.scala:
checkSparkAnswerAndOperator(sql(
  "SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null"))
Tests exist but the expression is marked as Incompatible, requiring allow_incompatible=true to run.
Possible Solutions
- Verify actual behavior difference - run specific test cases comparing Spark vs Comet
- Custom Rust implementation if DataFusion doesn't match Spark's three-valued null logic
- Post-processing - wrap result to check for null elements and convert false to null
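As a rough illustration of the post-processing option, the sketch below takes a plain boolean overlap result and converts false to null when Spark would consider the answer indeterminate. The function name `apply_spark_null_semantics` and the `Option<i32>` element type are hypothetical, and the sketch assumes the underlying check reports true only for a shared non-null element; whether DataFusion's array_has_any behaves that way is exactly what the first option would need to verify.

```rust
/// Hypothetical post-processing step (not Comet code).
/// `raw` stands in for the result of an overlap check that ignores
/// three-valued logic; it is assumed to be true only when a non-null
/// element is shared. `None` as an element models SQL NULL.
fn apply_spark_null_semantics(
    raw: bool,
    a: &[Option<i32>],
    b: &[Option<i32>],
) -> Option<bool> {
    // A confirmed overlap stays true regardless of nulls.
    if raw {
        return Some(true);
    }
    // Convert false to NULL when both arrays are non-empty and
    // either one contains a null element.
    let any_null = a.iter().chain(b.iter()).any(Option::is_none);
    if !a.is_empty() && !b.is_empty() && any_null {
        None
    } else {
        Some(false)
    }
}
```

The extra null scan only runs when no overlap was found, so the common true case pays nothing for it.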
Note: This issue was generated with AI assistance.