
[SPARK-51728][SQL] Add SELECT EXCEPT Support #50536

Open
wants to merge 2 commits into
base: master

Conversation

Gschiavon
Contributor

@Gschiavon Gschiavon commented Apr 8, 2025

What changes were proposed in this pull request?

Jira - https://issues.apache.org/jira/browse/SPARK-51728

This change introduces support for the SELECT * EXCEPT(col1, col2, ...) syntax, which allows users to project all columns from a dataset except for a specified subset. This pattern is especially useful in wide tables where explicitly listing all desired columns would be verbose and error-prone.

The syntax is already widely adopted in other SQL engines such as BigQuery, and adding support in Spark SQL improves compatibility, reduces boilerplate, and improves developer ergonomics when working with large schemas.
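The intended projection semantics can be sketched in plain Python (an illustrative model only, not the actual parser/Catalyst implementation; the function name and the error behavior for unknown columns are assumptions, modeled on BigQuery's behavior):

```python
def select_except(columns, excluded):
    """Model of SELECT * EXCEPT(col1, col2, ...): keep every column
    except the listed ones, preserving the original column order."""
    unknown = [c for c in excluded if c not in columns]
    if unknown:
        # Assumption: referencing a column that does not exist in the
        # schema is an error, as in BigQuery.
        raise ValueError(f"EXCEPT refers to unknown columns: {unknown}")
    drop = set(excluded)
    return [c for c in columns if c not in drop]

# SELECT * EXCEPT(b, d) over a table with columns a..e
print(select_except(["a", "b", "c", "d", "e"], ["b", "d"]))  # ['a', 'c', 'e']
```

This is why the feature pays off on wide tables: excluding 20 of 200 columns is a 20-item list, rather than a 180-item projection list.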

Why are the changes needed?

SELECT EXCEPT is useful when working with large schemas, and other DBMSes (e.g. BigQuery) already support it, so it would be good to add support in Spark as well.

Does this PR introduce any user-facing change?

It doesn't change any existing behavior, as it's a new feature.

How was this patch tested?

It was tested in SQLQuerySuite by adding a dedicated test for SELECT EXCEPT.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 8, 2025
@Gschiavon Gschiavon changed the title Feature/spark 51728 add except support [SPARK-51728][SQL] add except support Apr 8, 2025
@Gschiavon Gschiavon changed the title [SPARK-51728][SQL] add except support [SPARK-51728][SQL] Add SELECT EXCEPT Support Apr 8, 2025
@ik8

ik8 commented Apr 11, 2025

This is already implemented here:

When spark.sql.parser.quotedRegexColumnNames is true, 
quoted identifiers (using backticks) in SELECT statement are interpreted as 
regular expressions and SELECT statement can take regex-based column specification.
For example, below SQL will only take column c:
SELECT `(a|b)?+.+` FROM (SELECT 1 as a, 2 as b, 3 as c)

@Gschiavon
Contributor Author

@ik8 I don't think this is the same. Imagine you have a dataset with 200 columns and want to select all but 20 of them; expressing that as a regex would get really complex.

I think SELECT EXCEPT is different from regex-based column selection.

@ik8

ik8 commented Apr 12, 2025

@Gschiavon either way you have to specify the columns you don't want to select.
I think the EXCEPT syntax in your PR is a good option to have; I just wanted to let you know that similar functionality is already supported.
