

@vladanvasi-db
Contributor

What changes were proposed in this pull request?

I propose extending the existing tests in CollationSuite and adding cases where SortMergeJoin is forced, checking both the correctness of the join results and the use of CollationKey in the plan.

Why are the changes needed?

These changes are needed to properly test the behavior of joins on collated data under different config settings.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The change is itself a test.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 6, 2024
@vladanvasi-db
Contributor Author

@MaxGekk @stefankandic can you please take a look at this PR?

Member

@MaxGekk MaxGekk left a comment


also cc @uros-db

}

// Disable broadcast join to force sort merge join.
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
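Setting the threshold to `-1` forces a sort merge join because, as we understand Spark's planner, a relation is eligible for broadcast only when its estimated size in bytes is non-negative and at most `spark.sql.autoBroadcastJoinThreshold`. No size estimate is negative, so `-1` makes every relation ineligible, and with `spark.sql.join.preferSortMergeJoin` at its default (`true`) an equi-join without hints falls back to a sort merge join. The sketch below is a simplified, dependency-free model of that size check only; `BroadcastModel`, `canBroadcastBySize`, and `expectedJoin` are hypothetical names, and join hints, join types, and adaptive execution are ignored.

```scala
// Simplified, hypothetical model of the size-based broadcast check described above.
// Not Spark's implementation: hints, join types, and AQE are deliberately ignored.
object BroadcastModel {
  // A relation may be broadcast only if its size estimate is in [0, threshold].
  def canBroadcastBySize(sizeInBytes: Long, threshold: Long): Boolean =
    sizeInBytes >= 0 && sizeInBytes <= threshold

  // Join strategy the model predicts for an inner equi-join of two relations.
  def expectedJoin(leftSize: Long, rightSize: Long, threshold: Long): String =
    if (canBroadcastBySize(leftSize, threshold) || canBroadcastBySize(rightSize, threshold)) {
      "BroadcastHashJoin"
    } else {
      "SortMergeJoin"
    }
}

// Two small tables; the sizes in bytes are made up for illustration.
val defaultThreshold = 10L * 1024 * 1024 // 10MB, Spark's default threshold
println(BroadcastModel.expectedJoin(800L, 1200L, defaultThreshold)) // BroadcastHashJoin
println(BroadcastModel.expectedJoin(800L, 1200L, -1L))              // SortMergeJoin
```

In this model, `-1` is the one value that rules out broadcasting regardless of table size, which is why it is a convenient switch for forcing the sort merge join path in tests.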
Contributor


would it be easier to run the tests with both the default and -1 values of the conf, and then just assert that different joins are used depending on the conf's value?

Contributor


feels like we would avoid a lot of code duplication with this approach

Contributor

@uros-db uros-db Nov 7, 2024


how about something like this:

// Run the same test body once per threshold value.
Seq("-1", "1").foreach { threshold =>
  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> threshold) {
    ...
  }
}

i.e. iterate over the possible values to reduce duplication;
join plan nodes can also be collected conditionally, based on the value
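The suggestion above amounts to a table-driven test: one shared body runs per config value, and the expected join node is derived from that value instead of duplicating the body. Below is a minimal, Spark-free sketch of that shape, assuming a toy plan tree: `Plan`, `Scan`, `Join`, `planJoin`, and `ThresholdSweep` are hypothetical stand-ins. In a real suite the loop body would wrap `withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> threshold)` and collect `SortMergeJoinExec` or `BroadcastHashJoinExec` nodes from the executed plan instead.

```scala
// Toy physical-plan tree; hypothetical stand-in for Spark's SparkPlan nodes.
sealed trait Plan { def children: Seq[Plan] }
final case class Scan(table: String) extends Plan { def children: Seq[Plan] = Nil }
final case class Join(kind: String, left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

object ThresholdSweep {
  // Hypothetical planner: broadcast only if the made-up relation size fits the threshold.
  def planJoin(threshold: Long, relationSize: Long = 100L): Plan = {
    val kind =
      if (threshold >= 0 && relationSize <= threshold) "BroadcastHashJoin" else "SortMergeJoin"
    Join(kind, Scan("t1"), Scan("t2"))
  }

  // Collect every Join node in the tree, depth-first.
  def collectJoins(plan: Plan): Seq[Join] = plan match {
    case j: Join => j +: j.children.flatMap(collectJoins)
    case other   => other.children.flatMap(collectJoins)
  }

  // One shared test body, run once per threshold; only the expected node varies.
  def run(): Seq[(Long, String)] =
    Seq(-1L, 10L * 1024 * 1024).map { threshold =>
      val expected = if (threshold < 0) "SortMergeJoin" else "BroadcastHashJoin"
      val joins = collectJoins(planJoin(threshold))
      assert(joins.size == 1 && joins.head.kind == expected, s"threshold=$threshold")
      threshold -> joins.head.kind
    }
}
```

The sketch sweeps `-1` and the 10MB default rather than `-1` and `1`: with a one-byte threshold almost any real table is too large to broadcast, so both iterations would most likely plan a sort merge join and the broadcast path would go untested.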

Contributor Author


Did exactly this. The asserts on SparkPlan nodes are now refactored; however, the collationKey check in the plan could not be refactored the same way, so some duplication remains in the code, though it is not significant.

@vladanvasi-db vladanvasi-db force-pushed the vladanvasi-db/collation-suite-test-extension branch from 2f43131 to 265efcd November 7, 2024 13:37
Member

@MaxGekk MaxGekk left a comment


@vladanvasi-db If your PR affects tests only, please add the [TESTS] tag to the PR's title.

@vladanvasi-db vladanvasi-db changed the title [SPARK-50245][SQL] Extended CollationSuite and added tests where SortMergeJoin is forced [SPARK-50245][SQL][TESTS] Extended CollationSuite and added tests where SortMergeJoin is forced Nov 8, 2024
@vladanvasi-db vladanvasi-db requested a review from MaxGekk November 8, 2024 08:18
@vladanvasi-db vladanvasi-db requested a review from MaxGekk November 8, 2024 10:26
@MaxGekk
Member

MaxGekk commented Nov 13, 2024

+1, LGTM. Merging to master.
Thank you, @vladanvasi-db, and @stefankandic and @uros-db for the review.

@MaxGekk MaxGekk closed this in 898bff2 Nov 13, 2024