[SPARK-22787] [TEST] [SQL] Add a TPC-H query suite #19982
private def readToUnsafeMem(
    conf: Broadcast[SerializableConfiguration],
    requiredSchema: StructType,
    wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow] = {
This is just a style fix for a PR I merged today.
Test build #84927 has finished for PR 19982 at commit
-- using default substitutions

select
l_returnflag,
Can we use spaces instead of tabs?
These are the generated SQL files. I plan to keep them unchanged.
Test build #84928 has finished for PR 19982 at commit
"q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",
"q12", "q13", "q14", "q15", "q16", "q17", "q18", "q19", "q20", "q21", "q22")

private def checkGeneratedCode(plan: SparkPlan): Unit = {
How about making a base trait to remove the duplicate code between this and TPCDSQuerySuite?
LGTM except for one minor comment.
yea.
LGTM
/**
 * Drop all the tables
 */
protected override def afterAll(): Unit = {
As in TPCDSQuerySuite, we can use BenchmarkQueryTest.afterAll.
yea, we don't need to override it here.
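The point of this thread, that a suite extending a shared base does not need its own afterAll, might be sketched as follows. This is a minimal plain-Scala illustration, not the code from the PR: FakeCatalog, BenchmarkQueryTestSketch and TpchSuiteSketch are hypothetical names, and the real suites rely on Spark's test utilities rather than this stand-in catalog.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table catalog; not Spark's API.
final class FakeCatalog {
  private val tables = mutable.LinkedHashSet.empty[String]
  def create(name: String): Unit = tables += name
  def dropAll(): Unit = tables.clear()
  def names: Seq[String] = tables.toSeq
}

// Shared lifecycle lives in the base trait, so each concrete suite only
// declares its tables and query names.
trait BenchmarkQueryTestSketch {
  protected val catalog = new FakeCatalog
  protected def tableNames: Seq[String]
  def queryNames: Seq[String]

  def beforeAll(): Unit = tableNames.foreach(catalog.create)

  // Drop all the tables. Subclasses inherit this and do not redefine it.
  def afterAll(): Unit = catalog.dropAll()

  def registeredTables: Seq[String] = catalog.names
}

// A TPC-H-like suite: no afterAll override is needed here.
object TpchSuiteSketch extends BenchmarkQueryTestSketch {
  protected val tableNames: Seq[String] = Seq("lineitem", "orders", "customer")
  val queryNames: Seq[String] = (1 to 22).map(i => s"q$i")
}
```

A second suite (for example a TPC-DS-like one) would extend the same trait with its own table and query lists, which is where the duplication goes away.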
LGTM with one minor comment.
protected override def afterAll(): Unit = {
  try {
    // For debugging dump some statistics about how much time was spent in various optimizer rules
    logWarning(RuleExecutor.dumpTimeSpent())
Why use a warning for debugging output?
Also, is it a bad idea to dump the time for each query?
If we do not use logWarning, the messages will not be shown in the test log.
This is just to give an overall picture of how long each rule takes.
I plan to submit another PR to track which rules take effect for a specific query and also record the time cost.
oh, ok
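To make the discussion above concrete, here is a rough sketch of accumulating elapsed time per rule name and dumping a summary once at the end. It is not how Spark's RuleExecutor is implemented; RuleTimingSketch, timed and dump are hypothetical names chosen for illustration, and the output format is an assumption.

```scala
import scala.collection.mutable

// Hypothetical sketch; Spark's RuleExecutor keeps its own bookkeeping.
object RuleTimingSketch {
  private val nanosByRule = mutable.Map.empty[String, Long]

  // Run `body` and charge its elapsed wall-clock time to `ruleName`.
  def timed[A](ruleName: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally {
      val elapsed = System.nanoTime() - start
      nanosByRule.update(ruleName, nanosByRule.getOrElse(ruleName, 0L) + elapsed)
    }
  }

  def recordedRules: Set[String] = nanosByRule.keySet.toSet

  // One line per rule, slowest first: "<rule> <nanoseconds>".
  def dump(): String =
    nanosByRule.toSeq
      .sortBy { case (_, nanos) => -nanos }
      .map { case (rule, nanos) => s"$rule $nanos" }
      .mkString("\n")

  def reset(): Unit = nanosByRule.clear()
}
```

Calling reset() before each query and dump() after it would give the per-query view asked about in the thread, while a single dump() in afterAll gives the overall picture.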
Test build #84936 has finished for PR 19982 at commit
cc @maropu Feel free to submit a PR for adding SSB
yea, sure. I have plenty of bandwidth now :)
Test build #84940 has finished for PR 19982 at commit
Thanks! Merged to master.
@gatorsmile Any progress on this? #19982 (comment)
@maropu Thanks for your contribution. It looks over-engineered. We do not need such a complicated solution for this simple use case. We just need to record them in the log. We are also proposing new APIs for our logs. @jiangxb1987 is working on the design.
ok, I'll look forward to the proposal.
What changes were proposed in this pull request?
Add a test suite to ensure all the TPC-H queries can be successfully analyzed, optimized and compiled without hitting the max iteration threshold.
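As a rough illustration of what "hitting the max iteration threshold" can mean, the sketch below applies a batch of rewrite rules repeatedly until the plan stops changing or an iteration cap is reached. It assumes the threshold is a cap on fixed-point iterations; FixedPointSketch, Outcome and run are hypothetical names, and this is not Spark's RuleExecutor.

```scala
// Hypothetical sketch of a fixed-point rule loop; not Spark's RuleExecutor.
object FixedPointSketch {
  final case class Outcome[P](plan: P, iterations: Int, converged: Boolean)

  // Apply every rule in order, repeating the whole batch until the plan no
  // longer changes or `maxIterations` batches have been run.
  def run[P](plan: P, rules: Seq[P => P], maxIterations: Int): Outcome[P] = {
    var current = plan
    var iteration = 0
    var converged = false
    while (!converged && iteration < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      iteration += 1
      converged = next == current
      current = next
    }
    Outcome(current, iteration, converged)
  }
}
```

A suite in this style would treat converged == false as a failure for a query, since it means the rules were still rewriting the plan when the cap was reached.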
How was this patch tested?
N/A