
[SPARK-51728][SQL] Add SELECT EXCEPT Support #50536

Open
wants to merge 2 commits into
base: master

Conversation

Gschiavon
Contributor

@Gschiavon Gschiavon commented Apr 8, 2025

What changes were proposed in this pull request?

Jira - https://issues.apache.org/jira/browse/SPARK-51728

This change introduces support for the SELECT * EXCEPT(col1, col2, ...) syntax, which allows users to project all columns from a dataset except for a specified subset. This pattern is especially useful in wide tables where explicitly listing all desired columns would be verbose and error-prone.

The syntax is already widely adopted in other SQL engines such as BigQuery, and adding support in Spark SQL improves compatibility, reduces boilerplate, and improves developer ergonomics when working with large schemas.
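The intended projection semantics can be sketched in plain Python (an illustrative model only, not the actual parser/Catalyst implementation; the function name and the error behavior for unknown columns are assumptions, modeled on BigQuery's behavior):

```python
def select_except(columns, excluded):
    """Model of SELECT * EXCEPT(col1, col2, ...): keep every column
    except the listed ones, preserving the original column order."""
    unknown = [c for c in excluded if c not in columns]
    if unknown:
        # Assumption: referencing a column that does not exist in the
        # schema is an error, as in BigQuery.
        raise ValueError(f"EXCEPT refers to unknown columns: {unknown}")
    drop = set(excluded)
    return [c for c in columns if c not in drop]

# SELECT * EXCEPT(b, d) over a table with columns a..e
print(select_except(["a", "b", "c", "d", "e"], ["b", "d"]))  # ['a', 'c', 'e']
```

This is why the feature pays off on wide tables: excluding 20 of 200 columns is a 20-item list, rather than a 180-item projection list.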

Why are the changes needed?

SELECT EXCEPT is useful when working with large schemas, and other DBMSes (e.g. BigQuery) already support it, so it would be good to add support in Spark as well.

Does this PR introduce any user-facing change?

It doesn't change any existing behavior, as it's a new feature.

How was this patch tested?

It was tested in SQLQuerySuite by adding a dedicated test for SELECT EXCEPT.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 8, 2025
@Gschiavon Gschiavon changed the title Feature/spark 51728 add except support [SPARK-51728][SQL] add except support Apr 8, 2025
@Gschiavon Gschiavon changed the title [SPARK-51728][SQL] add except support [SPARK-51728][SQL] Add SELECT EXCEPT Support Apr 8, 2025
@ik8

ik8 commented Apr 11, 2025

This is already implemented here:

When spark.sql.parser.quotedRegexColumnNames is true, 
quoted identifiers (using backticks) in SELECT statement are interpreted as 
regular expressions and SELECT statement can take regex-based column specification.
For example, below SQL will only take column c:
SELECT `(a|b)?+.+` FROM (SELECT 1 as a, 2 as b, 3 as c)

@Gschiavon
Contributor Author

@ik8 I don't think this is the same. Imagine you have a dataset with 200 columns and want to select all but 20 of them; expressing that as a regex would get really complex.

I think SELECT EXCEPT is different from regex-based column selection.

@ik8

ik8 commented Apr 12, 2025

@Gschiavon either way you have to specify the columns you don't want to select.
I think the EXCEPT syntax in your PR is a good option to have; I just wanted to let you know that similar functionality is already supported.
