-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQuerySuite #20343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #86444 has finished for PR 20343 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just add the queries that are different from the v1.4 version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu . It's great to have v2.7.
Could you check the schema too?
For example, we had better update the following in the schema?
- |`web_country` STRING, `web_gmt_offset` STRING, `web_tax_percentage` DECIMAL(5,2))
+ |`web_country` STRING, `web_gmt_offset` DECIMAL(5,2), `web_tax_percentage` DECIMAL(5,2))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, good catch. Why we use a string for web_gmt_offset? It's just a bug? Anyway, I'll check all the schema again. Should we include this fix in this pr, or follow-up? @gatorsmile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. @maropu .
While reviewing this, I found that we missed that bug at the original PR of @gatorsmile .
If the fix is able to be included in Apache Spark 2.3, I think the followup PR also sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only related to test cases. Thus, it is fine Spark 2.3 release does not have it. You can do it in this PR.
Actually, this PR can be merged as long as we can fix all the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
Test build #86448 has finished for PR 20343 at commit
|
|
retest this please. |
|
Test build #86452 has finished for PR 20343 at commit
|
|
Test build #86457 has finished for PR 20343 at commit
|
|
Test build #86462 has finished for PR 20343 at commit
|
|
retest this please. |
|
Test build #86465 has finished for PR 20343 at commit
|
|
We need to update |
| } | ||
| } | ||
|
|
||
| val tpcdsQueriesV2_7_0 = Seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to explain what the list contains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| |`web_street_name` STRING, `web_street_type` STRING, `web_suite_number` STRING, | ||
| |`web_city` STRING, `web_county` STRING, `web_state` STRING, `web_zip` STRING, | ||
| |`web_country` STRING, `web_gmt_offset` STRING, `web_tax_percentage` DECIMAL(5,2)) | ||
| |`web_country` STRING, `web_gmt_offset` DECIMAL(5,2), `web_tax_percentage` DECIMAL(5,2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked the schema changes made in this PR is consistent with the TPC-DS doc http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you a lot, @maropu and @gatorsmile !
| t_s_secyear.customer_id | ||
| ,t_s_secyear.customer_first_name | ||
| ,t_s_secyear.customer_last_name | ||
| ,t_s_secyear.customer_email_address |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we highlight the changes we made in version 2.7 compared with the original version by adding the comments like -- ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, could you also use the original SQL query files with the same styles/indents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll do as much as possible. one question; I think the sql format is not consistent in all the files, e.g., SQLQueryTestSuite. Probably, I think uppercase letters for SQL reserved words and 2 indents seem to be de-facto, but we don't have any format rule for that anywhere, right? we'd better to write the rule somewhere? We don't need to re-format existing code though, IMHO we'd better to make them consistent in future prs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding a keywords capitalization rule, this is just for readability. We do not enforce it, but it is preferred.
|
Regarding the updates of |
|
Regarding the change you made in |
|
ok, I'll try and check. Just s sec. |
|
Test build #86501 has finished for PR 20343 at commit
|
|
I opened a new pr to support |
|
I checked all the queries again and I found that some queries (q6, q11, q20, q22, q24, q34, q35, q47, q49, q57, q64, q72, q74, q75, q78, q98) only have minor changes (See the comments to point out the changes). So, how about directly applying these changes in |
|
Test build #86800 has finished for PR 20343 at commit
|
|
retest this please. |
|
Test build #86812 has finished for PR 20343 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
|
@maropu Yeah. As long as the queries are different, we should keep both versions. This is to help the others understand we fully support TPC-DS queries without the changes. Thanks! |
|
Thanks for submitting the PR #20433. It sounds like there are still some test failure. Will review it after 2.3 release. Thanks! |
|
ok, I'll fix soon. Many Thanks! |
|
ping @gatorsmile |
|
We need to review #20433 first |
|
Test build #88321 has finished for PR 20343 at commit
|
|
A support of the optional intervals will be planed in 3.x (#20433), so is it okay to restart this again? @gatorsmile |
|
retest this please |
|
Test build #88569 has finished for PR 20343 at commit
|
|
LGTM Thanks! Merged to master |

What changes were proposed in this pull request?
This pr added TPCDS v2.7 (latest) queries in
TPCDSQuerySuitebecause the currentTPCDSQuerySuitetests older one (v1.4) and some queries are different from v1.4 and v2.7. Since the original v2.7 queries have the syntaxes that Spark cannot parse, I changed these queries in a following way:INTERVAL14 daysHow was this patch tested?
Added tests in
TPCDSQuerySuite.