[SPARK-54634][SQL] Add clear error message for empty IN predicate #53390
Conversation
allisonwang-db left a comment:
Thanks for making the error message better!
  exception = parseException(sql2),
  condition = "PARSE_SYNTAX_ERROR",
- parameters = Map("error" -> "'IN'", "hint" -> ""))
+ parameters = Map("error" -> "'INTO'", "hint" -> ""))
What's the error message before and after this change for this test case?
Hey Allison,
Here are the error messages before and after this change for this test case:
Before:
scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'. SQLSTATE: 42601 (line 1, pos 25)
== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
-------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:285)
at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:97)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(AbstractSqlParser.scala:93)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$5(SparkSession.scala:492)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:491)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:490)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:504)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:513)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:91)
... 42 elided
After:
scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'INTO'. SQLSTATE: 42601 (line 1, pos 36)
== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:267)
at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:78)
at org.apache.spark.sql.execution.SparkSqlParser.super$parse(SparkSqlParser.scala:163)
at org.apache.spark.sql.execution.SparkSqlParser.$anonfun$parseInternal$1(SparkSqlParser.scala:163)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
at org.apache.spark.sql.execution.SparkSqlParser.parseInternal(SparkSqlParser.scala:163)
at org.apache.spark.sql.execution.SparkSqlParser.parseWithParameters(SparkSqlParser.scala:70)
at org.apache.spark.sql.execution.SparkSqlParser.parsePlanWithParameters(SparkSqlParser.scala:84)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$6(SparkSession.scala:573)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:572)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:563)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:591)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:682)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:92)
... 42 elided
What changes were proposed in this pull request?
This PR addresses SPARK-54634 by adding a user-friendly error message when users write SQL queries with an empty IN clause, such as: SELECT * FROM table WHERE col IN ()
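At a high level, the change amounts to an explicit validation that turns a confusing parser failure into an actionable message. A minimal, self-contained Scala sketch of that general pattern follows; the names and message text are hypothetical and this is not Spark's actual parser code:

// Hypothetical sketch of the validation pattern, not the real Spark change:
// reject an empty IN value list with a clear, actionable message instead of
// letting a generic syntax error surface.
def validateInList(values: Seq[String]): Unit = {
  if (values.isEmpty) {
    throw new IllegalArgumentException(
      "IN predicate requires at least one value. Use a non-empty value list, " +
        "a subquery, or an explicit FALSE condition instead.")
  }
}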
Why are the changes needed?
When users write SQL with an empty IN clause, Spark currently produces a generic [PARSE_SYNTAX_ERROR], which leads users to believe their syntax is malformed in some subtle way, when the actual issue is simply the absence of values in the IN list. The current message therefore does not communicate the real problem. This change provides a clear, actionable error message that explains the actual problem and suggests alternatives.
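For illustration, these are the kinds of alternatives a user might reach for instead of an empty IN list (a sketch assuming a Spark session named spark; the exact suggestions in the new message may differ):

// Alternatives to an empty IN list:
spark.sql("SELECT * FROM range(10) WHERE false").show()           // explicit "match nothing"
spark.sql("SELECT * FROM range(10) WHERE id IN (1, 2, 3)").show() // non-empty value list
spark.sql("SELECT * FROM range(10) WHERE id IN (SELECT id FROM range(3))").show() // subquery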
Does this PR introduce any user-facing change?
Yes, users will now see a better error message.
Code executed:
spark.sql("SELECT * FROM range(10) WHERE id IN ()").show()
Before output:
(screenshot omitted; the generic [PARSE_SYNTAX_ERROR] described above)
After output:
(screenshot omitted; the new, clearer error message for the empty IN list)
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic) - used for code assistance, test generation, and documentation.