@maropu
Member

maropu commented Jan 21, 2018

What changes were proposed in this pull request?

This PR adds the TPC-DS v2.7 (latest) queries to TPCDSQuerySuite because the suite currently tests only the older v1.4 queries, and some queries differ between v1.4 and v2.7. Since the original v2.7 queries use syntax that Spark cannot parse, I changed these queries in the following ways:

  • [date] + 14 days -> date + INTERVAL 14 days
  • [column name] as "30 days" -> [column name] as `30 days`
  • Fix some syntax errors, e.g., missing brackets
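
For illustration only, the first two rewrites in the list above can be sketched in plain Scala. The object name TpcdsSyntaxRewrite and both regular expressions are hypothetical and are not part of this PR; nothing in the description suggests the query files were changed with such a helper.

```scala
import scala.util.matching.Regex

// Hypothetical helper (an assumption, not code from this PR): rewrites two
// TPC-DS v2.7 syntax forms into the forms listed in the description above.
object TpcdsSyntaxRewrite {
  // Matches a bare day offset such as "+ 14 days" or "- 30 days".
  private val BareDayOffset: Regex = """([+-])\s*(\d+)\s+days\b""".r
  // Matches a double-quoted alias such as: as "30 days" (\x22 is the double quote).
  private val QuotedAlias: Regex = """(?i)\b(as)\s+\x22([^\x22]+)\x22""".r

  def rewrite(sql: String): String = {
    // [date] + 14 days  ->  [date] + INTERVAL 14 days
    val withInterval = BareDayOffset.replaceAllIn(
      sql, m => Regex.quoteReplacement(s"${m.group(1)} INTERVAL ${m.group(2)} days"))
    // [column name] as "30 days"  ->  [column name] as `30 days`
    QuotedAlias.replaceAllIn(
      withInterval, m => Regex.quoteReplacement(s"${m.group(1)} `${m.group(2)}`"))
  }
}
```

Text that already contains INTERVAL is left alone, because the first pattern requires the digits to follow the sign directly. The third bullet (missing brackets and similar syntax errors) is not mechanical and is not covered by this sketch.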

How was this patch tested?

Added tests in TPCDSQuerySuite.

@SparkQA

SparkQA commented Jan 21, 2018

Test build #86444 has finished for PR 20343 at commit 71e0e1a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • i_class in ('personal','portable','reference','self-help') and
  • i_class in ('accessories','classical','fragrances','pants') and
  • and i_class in ('personal','portable','reference','self-help')
  • and i_class in ('accessories','classical','fragrances','pants')
  • ( select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id
  • ( select sum(ws_net_paid) as total_sum, i_category, i_class, 0 as g_category, 0 as g_class
  • i_class in ('wallpaper','parenting','musical')
  • i_class in ('womens','birdal','pants')

Member

Could we just add the queries that are different from the v1.4 version?

Member Author

ok

Member

dongjoon-hyun commented Jan 21, 2018

@maropu, it's great to have v2.7.
Could you check the schema too?
For example, should we update the following in the schema?

-        |`web_country` STRING, `web_gmt_offset` STRING, `web_tax_percentage` DECIMAL(5,2))
+        |`web_country` STRING, `web_gmt_offset` DECIMAL(5,2), `web_tax_percentage` DECIMAL(5,2))
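
A schema check of the kind requested here could be sketched as below. SchemaTypeCheck and its methods are hypothetical names introduced for illustration, and the regular expression only handles simple types such as STRING or DECIMAL(5,2); the expected type for web_gmt_offset is taken from the suggestion above.

```scala
// Hypothetical sketch (an assumption, not code from this PR): compare the
// column types declared in a DDL fragment against an expected mapping.
object SchemaTypeCheck {
  // Matches pairs such as: `web_gmt_offset` DECIMAL(5,2)
  private val Column = """`(\w+)`\s+([A-Za-z]+(?:\(\d+(?:,\s*\d+)?\))?)""".r

  // Extracts column name -> declared type (upper-cased) from the fragment.
  def columnTypes(ddl: String): Map[String, String] =
    Column.findAllMatchIn(ddl).map(m => m.group(1) -> m.group(2).toUpperCase).toMap

  // Returns column name -> (declared type, expected type) for every listed
  // column whose declared type differs from the expected one.
  def mismatches(ddl: String, expected: Map[String, String]): Map[String, (String, String)] = {
    val actual = columnTypes(ddl)
    expected.collect {
      case (name, want) if actual.get(name).exists(_ != want) =>
        name -> (actual(name), want)
    }
  }
}
```

Applied to the removed line of the diff with an expected type of DECIMAL(5,2) for web_gmt_offset, mismatches reports that single column; applied to the added line, it reports nothing.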

Member Author

maropu commented Jan 22, 2018

Oh, good catch. Why do we use a string for web_gmt_offset? Is it just a bug? Anyway, I'll check the whole schema again. Should we include this fix in this PR, or in a follow-up? @gatorsmile

Member

Thanks, @maropu.
While reviewing this, I found that we missed that bug in the original PR by @gatorsmile.
If the fix can be included in Apache Spark 2.3, I think a follow-up PR also sounds good to me.

Member Author

ok, thanks.

Member

This is only related to test cases, so it is fine if the Spark 2.3 release does not include it. You can do it in this PR.

Actually, this PR can be merged as long as we can fix all the issues.

Member Author

ok

@SparkQA

SparkQA commented Jan 21, 2018

Test build #86448 has finished for PR 20343 at commit 9ac04ed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jan 22, 2018

retest this please.

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86452 has finished for PR 20343 at commit 9ac04ed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86457 has finished for PR 20343 at commit 12f687c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86462 has finished for PR 20343 at commit 5d6092c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jan 22, 2018

retest this please.

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86465 has finished for PR 20343 at commit 5d6092c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jan 22, 2018

Do we need to update TPCDSQueryBenchmark, too? I think we could replace the updated queries there.

}
}

val tpcdsQueriesV2_7_0 = Seq(
Member

Please add a comment to explain what the list contains.

Member Author

ok

 |`web_street_name` STRING, `web_street_type` STRING, `web_suite_number` STRING,
 |`web_city` STRING, `web_county` STRING, `web_state` STRING, `web_zip` STRING,
-|`web_country` STRING, `web_gmt_offset` STRING, `web_tax_percentage` DECIMAL(5,2))
+|`web_country` STRING, `web_gmt_offset` DECIMAL(5,2), `web_tax_percentage` DECIMAL(5,2))
Member

I double-checked that the schema changes made in this PR are consistent with the TPC-DS doc: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf

Member

Thank you a lot, @maropu and @gatorsmile !

t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
Member

Could we highlight the changes we made to the v2.7 queries, compared with the original version, by adding comments like -- ?

Member

If possible, could you also use the original SQL query files with the same styles/indents?

Member

[Screenshot attached: 2018-01-22, 10:12:44 AM]

Member Author

OK, I'll do as much as possible. One question: I think the SQL format is not consistent across all the files, e.g., in SQLQueryTestSuite. Uppercase letters for SQL reserved words and 2-space indents seem to be the de facto style, but we don't have a format rule for that anywhere, right? Should we write the rule down somewhere? We don't need to re-format existing code, but IMHO we should make the format consistent in future PRs.

Member

Regarding the keyword capitalization rule, this is just for readability. We do not enforce it, but it is preferred.

@gatorsmile
Member

Regarding the updates of TPCDSQueryBenchmark, we can do it in a separate PR.

@gatorsmile
Member

Regarding the change you made in [date] + 14 days -> date + INTERVAL 14 days, could we first support it before we merge this PR? It sounds like this is trivial to support, right?

@maropu
Member Author

maropu commented Jan 23, 2018

OK, I'll try and check. Just a sec.

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86501 has finished for PR 20343 at commit fff88d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jan 30, 2018

I opened a new pr to support [date] + 14 days: #20433

@maropu
Member Author

maropu commented Jan 30, 2018

I checked all the queries again and found that some queries (q6, q11, q20, q22, q24, q34, q35, q47, q49, q57, q64, q72, q74, q75, q78, q98) have only minor changes (see the comments that point out the changes). So, how about directly applying these changes in sql/core/src/test/resources/tpcds?

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86800 has finished for PR 20343 at commit d04b087.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jan 30, 2018

retest this please.

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86812 has finished for PR 20343 at commit d04b087.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

dongjoon-hyun left a comment

+1, LGTM.

@gatorsmile
Member

@maropu Yeah. As long as the queries are different, we should keep both versions. This helps others understand that we fully support the TPC-DS queries without changes. Thanks!

@gatorsmile
Member

Thanks for submitting PR #20433. It sounds like there are still some test failures. I will review it after the 2.3 release. Thanks!

@maropu
Member Author

maropu commented Jan 30, 2018

OK, I'll fix it soon. Many thanks!

@kiszk
Member

kiszk commented Feb 28, 2018

ping @gatorsmile

@maropu
Member Author

maropu commented Mar 2, 2018

We need to review #20433 first

@SparkQA

SparkQA commented Mar 16, 2018

Test build #88321 has finished for PR 20343 at commit d04b087.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Mar 22, 2018

Support for the optional intervals is planned for 3.x (#20433), so is it okay to restart this? @gatorsmile

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Mar 25, 2018

Test build #88569 has finished for PR 20343 at commit d04b087.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

gatorsmile commented Mar 25, 2018

LGTM

Thanks! Merged to master

asfgit closed this in 5f653d4 on Mar 25, 2018