[SPARK-50245][SQL][TESTS] Extended CollationSuite and added tests where SortMergeJoin is forced #48774
@MaxGekk @stefankandic can you please take a look at this PR?
MaxGekk
left a comment
also cc @uros-db
  }

  // Disable broadcast join to force sort merge join.
  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
would it maybe be easier to run the tests with default and -1 values of the conf, and then just assert that different joins are used based on the conf's value?
feels like we would avoid a lot of code duplication with this approach
how about something like this
Seq("-1", "1").foreach(threshold =>
  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> threshold) {
    ...
i.e. iterating over the possible threshold values, to reduce duplication
can also conditionally collect join plan nodes, based on the threshold
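For readers outside the test suite, the idea can be sketched as a standalone program. This is only an illustration under assumptions, not the code that ended up in CollationSuite: the object name `JoinStrategyByThreshold`, the tiny input tables, and the choice of `10485760` (the default broadcast threshold) as the second value are all hypothetical, and adaptive query execution is turned off so that the physical plan can be inspected directly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, SortMergeJoinExec}

// Hypothetical helper: reports which join operator Spark plans for each
// value of spark.sql.autoBroadcastJoinThreshold.
object JoinStrategyByThreshold {
  def run(spark: SparkSession): Map[String, String] = {
    import spark.implicits._

    // Assumption: AQE is disabled so executedPlan is the plain physical plan
    // rather than an adaptive wrapper node.
    spark.conf.set("spark.sql.adaptive.enabled", "false")

    // Two tiny illustrative tables joined on column "x".
    val left = Seq("a", "b").toDF("x")
    val right = Seq("a", "c").toDF("x")

    // "-1" disables broadcast joins; "10485760" is the default threshold (10 MB).
    Seq("-1", "10485760").map { threshold =>
      spark.conf.set("spark.sql.autoBroadcastJoinThreshold", threshold)
      val plan = left.join(right, "x").queryExecution.executedPlan

      // Collect join nodes conditionally, as suggested in the review comment.
      val sortMerge = plan.collect { case j: SortMergeJoinExec => j }
      val broadcast = plan.collect { case j: BroadcastHashJoinExec => j }

      val label =
        if (sortMerge.nonEmpty) "SortMergeJoin"
        else if (broadcast.nonEmpty) "BroadcastHashJoin"
        else "other"
      threshold -> label
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-strategy-by-threshold")
      .getOrCreate()
    try run(spark).foreach { case (t, j) => println(s"threshold=$t -> $j") }
    finally spark.stop()
  }
}
```

Inside the suite itself, `withSQLConf` restores the previous conf value after each block, which is why it is preferable there to the plain `spark.conf.set` calls used in this sketch.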
Did exactly this. The asserts for SparkPlan nodes are refactored; however, the CollationKey check in the plan could not be refactored the same way, so some duplication remains in the code, though it is not significant.
2f43131 to
265efcd
MaxGekk
left a comment
@vladanvasi-db If your PR affects tests only, please add the [TESTS] tag to the PR's title.
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
+1, LGTM. Merging to master.
What changes were proposed in this pull request?
I propose extending the existing tests in CollationSuite and adding cases where SortMergeJoin is forced and tested for correctness and use of CollationKey.
Why are the changes needed?
These changes are needed to properly test behavior of join with collated data when different configs are enabled.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The change is a test itself.
Was this patch authored or co-authored using generative AI tooling?
No.