Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Sep 15, 2019

What changes were proposed in this pull request?

This patch adds new UT to ensure existing query (before Spark 3.0.0) with checkpoint doesn't break with default configuration of "includeHeaders" being introduced via SPARK-23539.

This patch also modifies existing test which checks type of columns to also check headers column as well.

Why are the changes needed?

The patch adds missing tests which guarantees backward compatibility of the change of SPARK-23539.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT passed.

@HeartSaVioR
Copy link
Contributor Author

This patch is to address my own comment which PR was merged without addressing this.
#22282 (comment)

@HeartSaVioR HeartSaVioR changed the title [SPARK-23539][SS][FOLLOWUP] Add UT to ensure existing query doesn't break with default conf of includeHeaders [SPARK-23539][SS][FOLLOWUP][TESTS] Add UT to ensure existing query doesn't break with default conf of includeHeaders Sep 15, 2019
@SparkQA
Copy link

SparkQA commented Sep 15, 2019

Test build #110604 has finished for PR 25792 at commit a3b5824.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Sep 15, 2019

Actually the other intention was let test fail when "includeHeaders" is set to "true", as we suspected from #22282 (comment), but turns out it doesn't break even the option is set to true. It's because of the mechanism how dropDuplicate works - row is stored as key, and getting value via key doesn't leverage type while comparing as both hashCode and equals of row don't use type.

Luckily it doesn't let query crash, but it would be ideal to do type check and avoid the possibility of crashing in runtime - SPARK-27237 (#24173) has been proposed to address this.

@HeartSaVioR
Copy link
Contributor Author

cc. @zsxwing @srowen as reviewers with commit privileges of #24173 , and @dongjinleekr as author of #24173

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine to me.

intercept[IllegalArgumentException] { test(minPartitions = "-1", 1, true) }
}

test("default config of includeHeader doesn't break existing query from Spark 2.4") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this test. 😄

@SparkQA
Copy link

SparkQA commented Sep 16, 2019

Test build #4868 has finished for PR 25792 at commit a3b5824.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

FYI, CI build just got passed.

@srowen srowen closed this in 88c8d5e Sep 16, 2019
@srowen
Copy link
Member

srowen commented Sep 16, 2019

Merged to master

@HeartSaVioR
Copy link
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-23539-FOLLOWUP branch September 16, 2019 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants