Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-770051 Fix a wrong result issue for crossing schema join/union #504

Merged
merged 4 commits into from
Apr 21, 2023

Conversation

sfc-gh-mrui
Copy link
Contributor

@sfc-gh-mrui sfc-gh-mrui commented Apr 20, 2023

SNOW-770051 Fix a wrong result issue for crossing schema join/union

The design doc is:
https://docs.google.com/document/d/1Xl8wB5YUMmldftpWIMy9WUm4AfeQoS28bP8ZlPU95lw/edit#heading=h.wmwy5v8eykqd

@codecov
Copy link

codecov bot commented Apr 20, 2023

Codecov Report

Merging #504 (d6217c8) into master (6fa8e48) will increase coverage by 0.00%.
The diff coverage is 91.66%.

@@           Coverage Diff           @@
##           master     #504   +/-   ##
=======================================
  Coverage   89.58%   89.59%           
=======================================
  Files          52       52           
  Lines        4475     4487   +12     
  Branches      676      744   +68     
=======================================
+ Hits         4009     4020   +11     
- Misses        466      467    +1     

@@ -163,6 +164,18 @@ private[querygeneration] class QueryBuilder(plan: LogicalPlan) {
expression.children.foreach(processExpression(_, statisticSet))
}

private def canUseSameConnection(snowflakeQueries: Seq[SnowflakeQuery]): Boolean = {
val sourceParameters: Seq[ConnectionCacheKey] = snowflakeQueries.flatMap(_.getSourceQueries)
.map(x => new ConnectionCacheKey(x.relation.params))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse the Connection sharable check criteria for Can-Join-or-Union.

@@ -89,6 +92,18 @@ private[querygeneration] abstract sealed class SnowflakeQuery {
}
}

private[querygeneration]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce UnarySnowflakeQuery and BinarySnowflakeQuery for code reuse.
Logically, we may introduce LeafSnowflakeQuery and MultipleChildrenSnowflakeQuery.
But it sounds not necessary because SourceQuery and Union are the only classes belong to them.
So just enhance SourceQuery and Union to implement it.

@@ -362,6 +379,9 @@ case class UnionQuery(children: Seq[LogicalPlan],
new QueryBuilder(child).treeRoot
}

override def getSourceQueries: Seq[SourceQuery] =
queries.flatMap(_.getSourceQueries)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: Spark Union can have multiple children. FYI.

@sfc-gh-mrui sfc-gh-mrui changed the title SNOW-770051 Fix a potential wrong result issue for crossing schema join/union SNOW-770051 Fix a wrong result issue for crossing schema join/union Apr 20, 2023
@sfc-gh-mrui sfc-gh-mrui marked this pull request as ready for review April 20, 2023 19:04
private def canUseSameConnection(snowflakeQueries: Seq[SnowflakeQuery]): Boolean = {
val sourceParameters: Seq[ConnectionCacheKey] = snowflakeQueries.flatMap(_.getSourceQueries)
.map(x => new ConnectionCacheKey(x.relation.params))
sourceParameters.forall(sourceParameters.head.equals(_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toSet.size == 1

Copy link
Contributor

@sfc-gh-bli sfc-gh-bli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sfc-gh-mrui sfc-gh-mrui merged commit 78294d0 into master Apr 21, 2023
arthurli1126 pushed a commit to arthurli1126/spark-snowflake that referenced this pull request Jul 23, 2023
…nowflakedb#504)

* SNOW-770051 Fix a potential wrong result issue for crossing schema join/union

* Revise error message

* Simplify canUseSameConnection()

* Use toSet.size == 1 for duplication check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants