@nsyca nsyca commented Feb 4, 2017

What changes were proposed in this pull request?

This PR adds new test cases for scalar subqueries in predicate context.

How was this patch tested?

The test results are compared with the results of running the same queries on another SQL engine (in this case, IBM DB2). If the results are equivalent, we assume they are correct.
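A minimal sketch of the comparison described above, assuming each engine's output has already been loaded as a list of row tuples. The helper name `results_equivalent` and the sample rows are hypothetical and only illustrate the idea; this is not the harness used for this PR.

```python
from collections import Counter


def results_equivalent(rows_a, rows_b):
    """Return True when both result sets hold the same rows with the same multiplicity.

    Row order is ignored, because SQL guarantees no order without ORDER BY.
    """
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))


# Hypothetical outputs of the same query on two engines (illustrative values only).
spark_rows = [("val1b",), ("val1c",)]
db2_rows = [("val1c",), ("val1b",)]

print(results_equivalent(spark_rows, db2_rows))  # True
```

Comparing multisets rather than sets keeps duplicate rows significant, which matters for queries such as UNION ALL.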

nsyca commented Feb 4, 2017

Below are a modified version of the test cases, adapted to run on DB2, and the results from DB2, which serve as a second source to compare against the results from Spark.
Modified test file to run on DB2
Result from DB2

FROM (SELECT c1.cv, avg(c1.cv) avg
FROM c c1
WHERE c1.ck = p.pk
GROUP BY c1.cv));
Contributor Author

Merged the test cases with the new test cases and placed under the directory "scalar-subquery".

nsyca commented Feb 4, 2017

@dilipbiswal Could you please cross-check the results from both sources?
@gatorsmile, @hvanhovell Could you please review?

struct<t1a:string>
-- !query 24 output
val1b
val1c
Contributor

I have compared the result set, and it matches the result from DB2.

SparkQA commented Feb 4, 2017

Test build #72337 has finished for PR 16798 at commit 092f2a5.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

SparkQA commented Feb 4, 2017

Test build #72356 has finished for PR 16798 at commit 092f2a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

FROM t2
WHERE t2c = t1c
GROUP BY t2c)
AND t1b >= (SELECT min(t2b)
Member

Nit. I like this indentation.
For the other examples, AND seems to be aligned with t1b at line 190.

FROM t2
WHERE t2c = t1c
GROUP BY t2c)
UNION ALL
Member

@dongjoon-hyun dongjoon-hyun Feb 5, 2017

Can we have another test case for UNION here too?

Contributor Author

@dongjoon-hyun Thank you for your comment. I have added another test case using UNION DISTINCT.

I would appreciate it if you could share your insight on how you expect the UNION test case to be processed differently from the UNION ALL test case with respect to testing scalar subqueries.

Note that the correctness of the UNION DISTINCT result can be inferred by applying a "uniqueness" operator to the result of the existing UNION ALL test case.
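A small sketch of that inference, run on SQLite rather than Spark or DB2. The table contents are made up for illustration, and the query is a simplified stand-in for the test case, not a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (t1a TEXT, t1b INTEGER, t1c INTEGER);
    CREATE TABLE t2 (t2b INTEGER, t2c INTEGER);
    INSERT INTO t1 VALUES ('val1a', 6, 8), ('val1b', 8, 16), ('val1c', 8, 16);
    INSERT INTO t2 VALUES (6, 8), (8, 16), (10, 16);
""")

# One branch: rows of t1 whose t1b equals a correlated scalar subquery.
branch = """
    SELECT t1a FROM t1
    WHERE t1b = (SELECT min(t2b) FROM t2 WHERE t2c = t1c)
"""

union_all = conn.execute(f"{branch} UNION ALL {branch}").fetchall()
union_distinct = conn.execute(f"{branch} UNION {branch}").fetchall()

# Deduplicating the UNION ALL rows reproduces the UNION (DISTINCT) rows.
assert set(union_all) == set(union_distinct)
assert len(union_distinct) == len(set(union_distinct))
```

With this data each branch returns three rows, so UNION ALL yields six rows and UNION yields three.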

SparkQA commented Feb 5, 2017

Test build #72416 has finished for PR 16798 at commit 044d6a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

LGTM - merging to master

@asfgit asfgit closed this in 5ad10c5 Feb 15, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 16, 2017
…f 2) - scalar subquery in predicate context

## What changes were proposed in this pull request?
This PR adds new test cases for scalar subqueries in predicate context.

## How was this patch tested?
The test results are compared with the results of running the same queries on another SQL engine (in this case, IBM DB2). If the results are equivalent, we assume they are correct.

Author: Nattavut Sutyanyong <nsy.can@gmail.com>

Closes apache#16798 from nsyca/18873-2.
asfgit pushed a commit that referenced this pull request Mar 14, 2017
…ll up to Optimizer phase

## What changes were proposed in this pull request?
Currently, the Analyzer, as part of ResolveSubquery, pulls up the correlated predicates to their
originating SubqueryExpression. The subquery plan is then transformed to remove the correlated
predicates after they are moved up to the outer plan. In this PR, the task of pulling up
correlated predicates is deferred to the Optimizer. This is the initial work that will allow us to
support the forms of correlated subqueries that we don't support today. The design document
from nsyca can be found at the following link:
[DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#)

A brief description of the code changes (hopefully to aid with code review) can be found at the
following link:
[CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#)
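
To picture what "pulling up" a correlated predicate means at the SQL level, here is a hedged sketch on SQLite. It is not Spark's rewrite rule or plan representation. The tables `p` and `c`, the column `pv`, and the data are assumptions made for illustration. The two queries are hand-written equivalents: one keeps the correlated predicate inside the scalar subquery, the other moves it to the outer query as a join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p (pk INTEGER, pv INTEGER);
    CREATE TABLE c (ck INTEGER, cv INTEGER);
    INSERT INTO p VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO c VALUES (1, 10), (1, 30), (2, 5);
""")

# Correlated form: the predicate c.ck = p.pk sits inside the subquery.
correlated = conn.execute("""
    SELECT pk FROM p
    WHERE pv < (SELECT avg(c.cv) FROM c WHERE c.ck = p.pk)
    ORDER BY pk
""").fetchall()

# Pulled-up form: the subquery is grouped on the correlation column, and the
# correlated predicate becomes a join condition in the outer query.
pulled_up = conn.execute("""
    SELECT p.pk
    FROM p LEFT OUTER JOIN (SELECT ck, avg(cv) AS avg_cv FROM c GROUP BY ck) agg
      ON agg.ck = p.pk
    WHERE p.pv < agg.avg_cv
    ORDER BY p.pk
""").fetchall()

assert correlated == pulled_up
```

The equivalence holds here because `avg` returns NULL for an empty group, matching the NULL produced by the outer join. Aggregates such as `count`, which return a non-null value on empty input, would need extra handling.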

## How was this patch tested?
The test cases were submitted earlier in the following PRs:
[16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802)

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes #16954 from dilipbiswal/SPARK-18874.
@nsyca nsyca deleted the 18873-2 branch March 14, 2017 21:08