-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown #39660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown #39660
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -307,11 +307,12 @@ private[jdbc] class JDBCRDD( | |
| "" | ||
| } | ||
|
|
||
| val myTopExpression: String = dialect.getTopExpression(limit) // SQL Server Limit alternative | ||
| val myLimitClause: String = dialect.getLimitClause(limit) | ||
| val myOffsetClause: String = dialect.getOffsetClause(offset) | ||
|
|
||
| val sqlText = options.prepareQuery + | ||
| s"SELECT $columnList FROM ${options.tableOrQuery} $myTableSampleClause" + | ||
| s"SELECT $myTopExpression $columnList FROM ${options.tableOrQuery} $myTableSampleClause" + | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's hard to define a SQL template that works for all dialects. The approach we take is to try to make it a superset of all dialects. A better design is to let the dialect build the SQL. For example, we can add a new API in Then Spark gets the builder and do SQLServerDialect can put the limit before the column list.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will do the work. |
||
| s" $myWhereClause $getGroupByClause $getOrderByClause $myLimitClause $myOffsetClause" | ||
| stmt = conn.prepareStatement(sqlText, | ||
| ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -544,6 +544,14 @@ abstract class JdbcDialect extends Serializable with Logging { | |
| if (limit > 0 ) s"LIMIT $limit" else "" | ||
| } | ||
|
|
||
| /** | ||
| * MS SQL Server version of `getLimitClause`. | ||
| * This is only supported by SQL Server as it uses TOP (N) instead. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is TOP (N) is just the SQLServer syntax of LIMIT. It's really weird to have a new dialect API for it, as it's different from table sample which is a new feature. |
||
| */ | ||
| def getTopExpression(limit: Integer): String = { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This API looks a little hack. #39667 refactor the API and this PR could override getSQLText.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What exactly is hacky in this API? As far as I can see, it just follows the existing JDBC code. If you are referring to an alternative implementation, then I am happy to consider it, otherwise please elaborate.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also don't see how the PR you linked solves this problem. Yes, there is no top reference in the general JdbcDialects but now MSSQL dialect needs to reimplement
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should define the different syntax by dialect themself. I exact the API
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, it would be good to have something more flexible than the current API; however, I also don't think that moving I think it would be good to split the query into fragments and have methods for each. However, there has already been a precedent with
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can work on a refined API for JdbcDialects as a follow-up for this work, would that work? I may need to prepare a small SPIP/design doc on it. @beliefer @srowen @dongjoon-hyun What do you think? Or should I just work on a new API right away without this change? I would appreciate your suggestions.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@sadikovi You said is right. I simplified the #39667. |
||
| "" | ||
| } | ||
|
|
||
| /** | ||
| * returns the OFFSET clause for the SELECT statement | ||
| */ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -167,10 +167,15 @@ private object MsSqlServerDialect extends JdbcDialect { | |
| throw QueryExecutionErrors.commentOnTableUnsupportedError() | ||
| } | ||
|
|
||
| // SQL Server does not support, it uses `getTopExpression` instead. | ||
| override def getLimitClause(limit: Integer): String = { | ||
| "" | ||
| } | ||
|
|
||
| override def getTopExpression(limit: Integer): String = { | ||
| if (limit > 0) s"TOP ($limit)" else "" | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just curious, MsSQL syntax doesn't allow
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TOP 0 is supported but 0 is used as a sentinel value to disable the pushdown, it is similar to how LIMIT works currently. I tested |
||
| } | ||
|
|
||
| override def classifyException(message: String, e: Throwable): AnalysisException = { | ||
| e match { | ||
| case sqlException: SQLException => | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, feels like this belongs all in the MySQL dialect, one way or the other. Is it not just an alternative LIMIT clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in the PR description, SQL Server does not support LIMIT clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, what i mean is, is "TOP n" just the 'limit' clause you need to generate for MySQL? then it doesn't need different handling, just needs to generate a different clause. But looks like it goes into the SELECT part, not at the end, OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MySQL supports LIMIT clause, I assume you meant MSSQL/SQL Server.
Yes, you are right. Although TOP N has similar semantics as LIMIT, it is more of an expression (?) rather than a clause, sort of like DISTINCT. We need to insert TOP N before the list of columns in select, unfortunately, this is the syntax for it.
I came up with this approach of introducing another
getTopExpressionbecause I thought it would be the least intrusive one but I am open to alternative approaches, please let me know if you have any suggestions as to how I can refactor the code to support this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops yes I'm talking about what you're talking about, typo - MS SQL Server
I'm OK with it; the alternative is to somehow edit the SQL query down in the MS SQL dialect maybe, I haven't thought about it. It'd be nicer but not sure it's cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is already a precedent for adding something like this:
getTableSamplemethod.