[SPARK-23160][SQL][TEST] Port window.sql #24881
|
Question: should I check the |
|
ok to test |
Thank you for making a PR, @DylanGuedes.
- You don't need to avoid test coverage duplication. The purpose of this porting is to ensure Apache Spark's capability.
- For the errors and differing results, please file Apache Spark JIRA issues after checking for duplicates.
cc @gatorsmile
|
|
||
| SELECT depname, empno, salary, sum(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname); | ||
|
|
||
| -- I get an error when trying `order by rank() over w`, however it works for `order by r` if column rank is renamed to r |
Please keep the following original query as a comment. And file a JIRA.
SELECT depname, empno, salary, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary) ORDER BY rank() OVER w;
Done.
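For anyone comparing outputs by hand, here is a minimal Python sketch of what `rank() OVER (PARTITION BY depname ORDER BY salary)` followed by an outer `ORDER BY` on the rank is expected to compute. It is not Spark's or PostgreSQL's implementation; the sample rows and the names `SAMPLE_ROWS` and `rank_over` are hypothetical and only illustrate the semantics.

```python
from collections import defaultdict

# Hypothetical sample data; not the real empsalary table from the test file.
SAMPLE_ROWS = [
    {"depname": "sales", "empno": 1, "salary": 5000},
    {"depname": "sales", "empno": 3, "salary": 4800},
    {"depname": "sales", "empno": 4, "salary": 4800},
    {"depname": "develop", "empno": 7, "salary": 4200},
    {"depname": "develop", "empno": 8, "salary": 5200},
]


def rank_over(rows, partition_by, order_by):
    """Attach a SQL-style rank() to each row, then sort all rows by that rank."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    ranked = []
    for part in partitions.values():
        ordered = sorted(part, key=lambda r: r[order_by])
        rank = 0
        previous = object()  # sentinel that never equals a real value
        for position, row in enumerate(ordered, start=1):
            # Ties keep the same rank; the next distinct value jumps to its position.
            if row[order_by] != previous:
                rank = position
                previous = row[order_by]
            ranked.append({**row, "rank": rank})

    # Outer ORDER BY on the rank column (the part Spark rejected when written
    # as `ORDER BY rank() OVER w`).
    return sorted(ranked, key=lambda r: r["rank"])


if __name__ == "__main__":
    for row in rank_over(SAMPLE_ROWS, "depname", "salary"):
        print(row)
```

Rows that tie on `salary` share a rank, and the rank after a tie skips ahead, which is what distinguishes `rank()` from `dense_rank()`.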
|
|
||
| SELECT ntile(3) OVER (ORDER BY ten, four), ten, four FROM tenk1 WHERE unique2 < 10; | ||
|
|
||
| -- Spark does not accept null as input for `ntile` |
Please file an Apache Spark JIRA issue if it doesn't exist.
Done.
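As a reference point for the `ntile` queries discussed here, the following is a minimal Python sketch of the usual `ntile(n)` bucketing rule: rows are split into `n` buckets as evenly as possible, and the first buckets receive the extra rows. The function name `ntile_assignments` is hypothetical, and rejecting `None` is only a loose analogy for the "does not accept null" comment, not a description of how Spark reports the error.

```python
def ntile_assignments(num_rows, buckets):
    """Return the bucket number (1-based) for each row of an ordered partition."""
    # Assumption for illustration: treat a missing bucket count as an error.
    if buckets is None or buckets < 1:
        raise ValueError("ntile requires a positive bucket count")

    base, extra = divmod(num_rows, buckets)
    assignments = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets hold one more row than the rest.
        size = base + (1 if bucket <= extra else 0)
        assignments.extend([bucket] * size)
    return assignments


if __name__ == "__main__":
    # Ten ordered rows split into three buckets.
    print(ntile_assignments(10, 3))
```

With ten rows and three buckets, the first bucket gets four rows and the other two get three each.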
|
Test build #106545 has finished for PR 24881 at commit
|
| SELECT i,SUM(v) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) | ||
| FROM (VALUES(1,1),(2,2),(3,3),(4,4)) t(i,v); | ||
|
|
||
| -- bool_and? |
Add SPARK-27880 here.
|
Hey guys, I added new references to JIRAs. I removed the WIP tag so that the CI could run, because I wanted to be sure that I wasn't breaking other things, and I was, lmao: I forgot to drop the tables at the end of the file. However, I still need to pay attention to a few things to finish this. For instance, Postgres has tons of tests with UDF+Window, but I don't know if we should port them too, because I don't think that we can create UDFs in SQL. Btw, a question: |
|
Test build #106556 has finished for PR 24881 at commit
|
|
Test build #106561 has finished for PR 24881 at commit
|
|
Btw, another question: the CI isn't passing and I can't identify which query causes it: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106561/testReport/junit/org.apache.spark.sql/SQLQueryTestSuite/sql/ |
|
Test build #106563 has finished for PR 24881 at commit
|
|
@DylanGuedes Please re-generate the golden file by:
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z window.sql"
You can verify it by:
build/sbt "sql/test-only *SQLQueryTestSuite -- -z window.sql" |
|
Hmm, I was always generating them. I think that at the end I made a minor change and forgot to rerun them because I had only changed a comment. Whatever, thank you! |
|
I updated with a few changes. I just noticed (after @wangyum's help) that I was having problems in |
|
Test build #106596 has finished for PR 24881 at commit
|
|
Retest this please. |
Ur, please regenerate this test file.
This output is wrong.
|
The failure is due to the wrong generated output. Please configure your Python environment and regenerate this. |
|
Test build #106687 has finished for PR 24881 at commit
|
|
You are right, I had an environment variable that was setting ipython instead. Whatever, I regenerated the golden files, looks fine now. |
|
Test build #106722 has finished for PR 24881 at commit
|
|
By the way, the CI looks fine now. |
|
Hi, @DylanGuedes. Could you compare the result with PostgreSQL? Is the output correct? |
|
@dongjoon-hyun You are absolutely right - for some reason, results from two queries were not being truncated. I commented them out for now and will probably create a JIRA for them. |
|
Test build #107284 has finished for PR 24881 at commit
|
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
de9361c to 826708d
|
Test build #109122 has finished for PR 24881 at commit
|
|
Yea, I'll check now. |
maropu left a comment
I personally think that this PR is too big for a single PR. So, how about splitting it into smaller PRs, along with the aggregate tests? WDYT? @dongjoon-hyun @wangyum
| -- !query 145 schema | ||
| struct<> | ||
| -- !query 145 output | ||
| org.apache.spark.sql.AnalysisException |
Because Spark does not handle 'NaN' for inline tables.
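To make the expected frame behaviour easier to check by hand, here is a minimal Python sketch of `SUM(...) OVER (ORDER BY ... ROWS BETWEEN p PRECEDING AND f FOLLOWING)`. It is an illustration of the frame semantics only, not Spark's or PostgreSQL's code; the name `rows_frame_sum` is hypothetical, Python's `float("nan")` stands in for the SQL `'NaN'` literal, and `None` stands in for `NULL`.

```python
import math


def rows_frame_sum(values, preceding, following):
    """Sum each row's ROWS frame over an already ordered list of values."""
    sums = []
    for index in range(len(values)):
        start = max(0, index - preceding)
        stop = min(len(values), index + following + 1)
        # SUM skips NULLs and yields NULL when the frame holds no non-NULL value.
        frame = [v for v in values[start:stop] if v is not None]
        sums.append(sum(frame) if frame else None)
    return sums


if __name__ == "__main__":
    # ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING over 1, 2, 3, 4
    print(rows_frame_sum([1, 2, 3, 4], preceding=1, following=1))

    # ROWS BETWEEN 1 PRECEDING AND CURRENT ROW with a NaN in the middle:
    # every frame that contains the NaN sums to NaN.
    with_nan = rows_frame_sum([1.0, 2.0, math.nan, 3.0, 4.0], preceding=1, following=0)
    print(with_nan)
```

The frame is recomputed from scratch for every row here. An engine that instead keeps a running total and subtracts values as they leave the frame would need extra care once a NaN has entered it, because NaN cannot be subtracted back out.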
| SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) | ||
| FROM (VALUES(1,1),(2,2),(3,'NaN'),(4,3),(5,4)) t(a,b); | ||
|
|
||
| select f_float4, sum(f_float4) over (order by f_float8 rows between 1 |
Where does this test come from? https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql#L1249
Lol, I don't know what happened there. Whatever, it is fixed now.
| FROM (VALUES(1,1.5),(2,2.5),(3,NULL),(4,NULL)) t(i,v); | ||
|
|
||
| -- [SPARK-28602] Spark does not recognize 'interval' type as 'numeric' | ||
| -- SELECT i,AVG(cast(v as interval)) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) |
plz keep the original query: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql#L1133
| -- WINDOW wnd AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) | ||
| -- ORDER BY i; | ||
|
|
||
| -- SELECT |
Can you describe the reason for commenting out each query, where possible?
Done
| struct<> | ||
| -- !query 104 output | ||
| org.apache.spark.sql.AnalysisException | ||
| Undefined function: 'range'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 |
This error is not the expected one: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out#L2982
Although the message is misleading, I think that they mean the same thing: there is no existing range aggregate function. Should I file a JIRA for that?
|
@DylanGuedes any update? |
|
@gatorsmile @maropu Sorry guys, although I noticed some of your comments, I didn't get any notification about the new suggestions on the PR, so I thought that you were still discussing whether to split it or not. I'll try to finish all the suggestions by next week. Also, if you prefer a split PR, I can work on it, for sure. |
|
ping @dongjoon-hyun @wangyum |
| -- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group | ||
| -- | ||
| -- Window Functions Testing | ||
| -- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql |
REL_12_BETA2 -> REL_12_BETA3. Let's use REL_12_BETA3.
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
|
Test build #110280 has finished for PR 24881 at commit
|
|
ok to test |
|
This seems to be the last ticket in the PostgreSQL umbrella. Let's get this done. |
|
+1 for the split.... |
|
@DylanGuedes Are you still here? Can you split this pr (~1400 lines) into 4 parts (each file has 300-400 lines) by referring |
|
@maropu Yes, I'm following everything. Yes, I can split the PR. |
|
Thanks! |
|
Test build #110683 has finished for PR 24881 at commit
|
|
Closing this PR because I saw your split PRs.
What changes were proposed in this pull request?
This PR ports window.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql
The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out
How was this patch tested?
Pass the Jenkins.