-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 2 of 2) - scalar subquery in predicate context #16798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…rrect results ## What changes were proposed in this pull request? This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase. ## How was this patch tested? ./dev/run-tests a new unit test on the problematic pattern.
…rrect results ## What changes were proposed in this pull request? This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase. ## How was this patch tested? ./dev/run-tests a new unit test on the problematic pattern.
|
Below are a modified version of the test cases to run on DB2 and the result from DB2, as a second source to compare to the result from Spark. |
| FROM (SELECT c1.cv, avg(c1.cv) avg | ||
| FROM c c1 | ||
| WHERE c1.ck = p.pk | ||
| GROUP BY c1.cv)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged the test cases with the new test cases and placed under the directory "scalar-subquery".
|
@dilipbiswal Could you please cross-check the results from both sources? |
| struct<t1a:string> | ||
| -- !query 24 output | ||
| val1b | ||
| val1c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have compared the result set matched with the result from DB2.
|
Test build #72337 has finished for PR 16798 at commit
|
|
retest this please |
|
Test build #72356 has finished for PR 16798 at commit
|
| FROM t2 | ||
| WHERE t2c = t1c | ||
| GROUP BY t2c) | ||
| AND t1b >= (SELECT min(t2b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. I like this indentation.
For the other examples, AND seems to be aligned with t1b at line 190.
| FROM t2 | ||
| WHERE t2c = t1c | ||
| GROUP BY t2c) | ||
| UNION ALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have another test case for UNION here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun Thank you for your comment. I have added another test case using UNION DISTINCT.
I would appreciate if you could share your insight on what you think the UNION test case will process differently from the UNION ALL test case with respect to the testing of scalar subquery.
Note that the correctness of the result of UNION DISTINCT can be inferred from applying the "uniqueness" operator on the result of the existing UNION test case.
|
Test build #72416 has finished for PR 16798 at commit
|
|
LGTM - merging to master |
…f 2) - scalar subquery in predicate context ## What changes were proposed in this pull request? This PR adds new test cases for scalar subquery in predicate context ## How was this patch tested? The test result is compared with the result run from another SQL engine (in this case is IBM DB2). If the result are equivalent, we assume the result is correct. Author: Nattavut Sutyanyong <nsy.can@gmail.com> Closes apache#16798 from nsyca/18873-2.
…ll up to Optimizer phase ## What changes were proposed in this pull request? Currently Analyzer as part of ResolveSubquery, pulls up the correlated predicates to its originating SubqueryExpression. The subquery plan is then transformed to remove the correlated predicates after they are moved up to the outer plan. In this PR, the task of pulling up correlated predicates is deferred to Optimizer. This is the initial work that will allow us to support the form of correlated subqueries that we don't support today. The design document from nsyca can be found in the following link : [DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#) The brief description of code changes (hopefully to aid with code review) can be be found in the following link: [CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#) ## How was this patch tested? The test case PRs were submitted earlier using. [16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802) Author: Dilip Biswal <dbiswal@us.ibm.com> Closes #16954 from dilipbiswal/SPARK-18874.
What changes were proposed in this pull request?
This PR adds new test cases for scalar subquery in predicate context
How was this patch tested?
The test result is compared with the result run from another SQL engine (in this case is IBM DB2). If the result are equivalent, we assume the result is correct.