
Conversation

@sahilkumarsingh (Author) opened this pull request:

What changes were proposed in this pull request?

This PR addresses SPARK-54634.

It adds a user-friendly error message when users write SQL queries with an empty IN clause, for example: SELECT * FROM table WHERE col IN ()

Why are the changes needed?

When users write SQL with an empty IN clause, Spark currently produces a generic [PARSE_SYNTAX_ERROR], which leads users to believe their syntax is malformed, when the actual issue is the absence of values in the IN list. The current error message therefore does not point users at the real problem.

This change provides a clear, actionable error message that explains the actual problem and suggests alternatives.

Example - Before:

org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'. SQLSTATE: 42601 (line 1, pos 33)

Example - After:

org.apache.spark.sql.catalyst.parser.ParseException:
[INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE] Invalid SQL syntax: IN predicate requires at least one value. Empty IN clauses like 'IN ()' are not allowed. Consider using 'WHERE FALSE' if you need an always-false condition, or provide at least one value in the IN list. SQLSTATE: 42000
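
As a quick illustration of the alternative the new message suggests (a spark-shell sketch, not part of the PR itself), an always-false condition can replace the empty IN list:

scala> // Instead of: spark.sql("SELECT * FROM range(10) WHERE id IN ()")
scala> spark.sql("SELECT * FROM range(10) WHERE false").show()  // always-false predicate, returns no rows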

Does this PR introduce any user-facing change?

Yes, users will now see a better error message.

Code executed: spark.sql("SELECT * FROM range(10) WHERE id IN ()").show()

Before output: (screenshot of the [PARSE_SYNTAX_ERROR] message, as in the Before example above)

After output: (screenshot of the [INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE] message, as in the After example above)

How was this patch tested?

  • Added unit tests in QueryParsingErrorsSuite.scala and SQL golden tests in predicate-functions.sql (a sketch of the test shape follows this list)
  • Also verified the change locally by running the query in spark-shell
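
The unit test follows the checkError / parseException pattern visible in the review snippet below; this sketch only illustrates that shape, with the condition name taken from the After example above (the empty parameters map is an assumption, not a copy of the merged test):

test("SPARK-54634: empty IN predicate raises EMPTY_IN_PREDICATE") {
  // Sketch only: the parameters map is a placeholder, not copied from the actual test.
  checkError(
    exception = parseException("SELECT * FROM range(10) WHERE id IN ()"),
    condition = "INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE",
    parameters = Map.empty[String, String])
}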

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic) - used for code assistance, test generation, and documentation.

github-actions bot added the SQL label on Dec 8, 2025
@allisonwang-db (Contributor) left a comment:

Thanks for making the error message better!

  exception = parseException(sql2),
  condition = "PARSE_SYNTAX_ERROR",
- parameters = Map("error" -> "'IN'", "hint" -> ""))
+ parameters = Map("error" -> "'INTO'", "hint" -> ""))
@allisonwang-db (Contributor) commented on this diff:

What's the error message before and after this change for this test case?

@sahilkumarsingh (Author) replied:

Hey Allison,

Here are the before and after outputs for this test case:

Before:

scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'. SQLSTATE: 42601 (line 1, pos 25)

== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
-------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:285)
  at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:97)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(AbstractSqlParser.scala:93)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$5(SparkSession.scala:492)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:491)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:490)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:504)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:513)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:91)
  ... 42 elided

After:

scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'INTO'. SQLSTATE: 42601 (line 1, pos 36)

== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
------------------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:267)
  at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:78)
  at org.apache.spark.sql.execution.SparkSqlParser.super$parse(SparkSqlParser.scala:163)
  at org.apache.spark.sql.execution.SparkSqlParser.$anonfun$parseInternal$1(SparkSqlParser.scala:163)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
  at org.apache.spark.sql.execution.SparkSqlParser.parseInternal(SparkSqlParser.scala:163)
  at org.apache.spark.sql.execution.SparkSqlParser.parseWithParameters(SparkSqlParser.scala:70)
  at org.apache.spark.sql.execution.SparkSqlParser.parsePlanWithParameters(SparkSqlParser.scala:84)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$6(SparkSession.scala:573)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:572)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:563)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:591)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:682)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:92)
  ... 42 elided
