[SPARK-50373] Prohibit Variant from set operations #48909
@gene-db @cloud-fan Can you please look at this? Thanks!
@harshmotw-db Thanks for catching the undefined behavior!
LGTM
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
LGTM, we should probably not allow these until we can think about it further. What about GROUP BY and SELECT DISTINCT, should we prohibit those too?
@dtenedor This PR prohibits |
…ariant_distinct_fix
…hub.com/harshmotw-db/spark into variant_distinct_fix
thanks, merging to master!
### What changes were proposed in this pull request?
Prior to this PR, repartition by Variant producing expressions wasn't blocked during analysis. It should be blocked because Variant equality is not defined. It is similar to [this PR](#48909) which blocked Variant from Set operations.

### Why are the changes needed?
Variant equality is not defined yet and therefore shouldn't be allowed in repartitioning.

### Does this PR introduce _any_ user-facing change?
Yes, prior to this PR, Variants repartition did not throw a well defined error.

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49080 from harshmotw-db/harsh-motwani_data/variant_repartition.

Authored-by: Harsh Motwani <harsh.motwani@databricks.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
What changes were proposed in this pull request?
Prior to this PR, Variant columns could be used with set operations like DISTINCT, INTERSECT and EXCEPT. This PR prohibits this behavior since Variant is not orderable.
Why are the changes needed?
Variant equality is not defined, and therefore, these operations are also undefined.
Does this PR introduce any user-facing change?
Yes, users will now no longer be able to perform set operations on variant columns.
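To illustrate the kind of check described here, the following is a minimal sketch in plain Scala, not the code from this PR. The `ColType` hierarchy, `containsVariant`, and `checkSetOperation` are hypothetical stand-ins invented for this example; they are not Spark's `DataType` classes or anything in `CheckAnalysis.scala`. Rejecting Variant nested inside arrays and structs is also an assumption of the sketch.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
object VariantSetOpCheck {
  sealed trait ColType
  case object IntCol extends ColType
  case object StringCol extends ColType
  case object VariantCol extends ColType
  final case class ArrayCol(element: ColType) extends ColType
  final case class StructCol(fields: List[(String, ColType)]) extends ColType

  // True if the type is Variant or nests a Variant anywhere inside it.
  // (Looking inside nested types is an assumption of this sketch.)
  def containsVariant(t: ColType): Boolean = t match {
    case VariantCol        => true
    case ArrayCol(element) => containsVariant(element)
    case StructCol(fields) => fields.exists { case (_, ft) => containsVariant(ft) }
    case _                 => false
  }

  // Returns Left(message) naming the first offending column,
  // or Right(()) when every column may take part in the set operation.
  def checkSetOperation(
      op: String,
      columns: List[(String, ColType)]): Either[String, Unit] =
    columns.find { case (_, t) => containsVariant(t) } match {
      case Some((name, _)) =>
        Left(s"Cannot use column '$name' in $op: Variant equality is not defined")
      case None =>
        Right(())
    }
}
```

In this model, `VariantSetOpCheck.checkSetOperation("INTERSECT", List("v" -> VariantSetOpCheck.VariantCol))` yields a `Left` with an error message, while a column list without any Variant yields `Right(())`. Failing at analysis time, before any rows are compared, is what replaces the previously undefined behavior with a well-defined error.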
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No