
@milastdbx
Contributor

@milastdbx milastdbx commented Nov 16, 2023

What changes were proposed in this pull request?

This PR changes the parser to support a new syntax for excluding columns when using * to fetch columns from a source.
The visit method now produces a new expression, UnresolvedStarExcept, whenever the new syntax is parsed.
Expanding this expression is the core logic of the feature.
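
To illustrate the intended semantics of the expansion, here is a minimal sketch in plain Python. It is not the Spark implementation: the helper name `expand_star_except` is hypothetical, and raising an error for an excluded column that does not exist in the source is an assumption of this sketch.

```python
from typing import List, Sequence


def expand_star_except(columns: Sequence[str], excluded: Sequence[str]) -> List[str]:
    """Return `columns` in their original order, minus the names in `excluded`.

    Hypothetical helper for illustration only; not the Spark implementation.
    """
    available = set(columns)
    unknown = [name for name in excluded if name not in available]
    if unknown:
        # Assumption of this sketch: excluding a missing column is an error.
        raise ValueError(f"cannot exclude unknown column(s): {unknown}")
    to_drop = set(excluded)
    # Keep the source order so the result matches what * would have produced.
    return [name for name in columns if name not in to_drop]


# Models SELECT * EXCEPT (col1, col2) over a source with four columns.
print(expand_star_except(["id", "col1", "col2", "name"], ["col1", "col2"]))
# prints ['id', 'name']
```

The sketch matches names exactly; how the real expansion treats case sensitivity and nested fields is not covered here.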

Why are the changes needed?

To introduce the new SELECT * EXCEPT (col1, col2) syntax.

Does this PR introduce any user-facing change?

Yes, this PR introduces new SQL syntax, which is used to explicitly exclude columns from star projection.

How was this patch tested?

Unit tests.
Generated new golden files.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 16, 2023
@milastdbx milastdbx force-pushed the feature/selectStarExcept branch from 2ea05b3 to 18dd5b2 Compare November 16, 2023 16:25
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making a PR, @milastdbx. Could you provide some references for this syntax in other SQL environments?

@milastdbx
Contributor Author

> Thank you for making a PR, @milastdbx. Could you provide some references for this syntax in other SQL environments?

When you say other SQL environments, what exactly are you referring to?

@dongjoon-hyun
Member

Popular ones like Apache Hive, Apache Flink, Presto, MySQL, PostgreSQL, Oracle, Teradata?

@milastdbx
Contributor Author

> Popular ones like Apache Hive, Apache Flink, Presto, MySQL, PostgreSQL, Oracle, Teradata?

I don't think any other platform supports it. I may have been too hasty in saying it's ANSI standard. I'll update the PR description.

Member

@dongjoon-hyun dongjoon-hyun left a comment

The Apache Spark community wants to be as compatible as possible with other SQL environments in order to avoid potential lock-in effects. Given that context, this new syntax looks too esoteric to be accepted from my perspective.

Some other Apache Spark committers may have different opinions.

@HyukjinKwon
Member

Yeah, I wouldn't add this as a dialect; better to stick to other DBMSes or the ANSI standard.

@cloud-fan
Contributor

I think this is a useful feature, see https://stackoverflow.com/questions/29095281/select-all-the-columns-of-a-table-except-one-column and https://dba.stackexchange.com/questions/1957/sql-select-all-columns-except-some

In fact, Spark already has a SELECT regex feature: #18023. We can't ignore the need to flexibly select some but not all columns.

If we have to follow an example, Databricks SQL supports this syntax: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select.html

@HyukjinKwon
Member

I'm okay if we already have the variant.

@surjikal
Contributor

surjikal commented Nov 21, 2023

BigQuery supports SELECT * EXCEPT:

On the other hand, Snowflake and DuckDB use SELECT * EXCLUDE:


For what it's worth, I very often do things like df.drop('foo')['*'] to achieve something similar in DataFrame land.

My app's DB (not Spark) doesn't support the EXCEPT feature and I wish it did; there's a fair amount of extra SQL generated as a workaround. Definitely a good feature from my perspective.
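
As a rough illustration of that kind of workaround, here is a minimal Python sketch that generates the explicit column list when the database lacks EXCEPT. The function name `select_all_except_sql` is hypothetical, and the sketch assumes the application already knows the table's columns and that the identifiers need no quoting or escaping.

```python
from typing import Sequence


def select_all_except_sql(table: str, columns: Sequence[str], excluded: Sequence[str]) -> str:
    """Build a SELECT that lists every column except the excluded ones.

    Hypothetical workaround for databases without SELECT * EXCEPT.
    Assumes identifiers are already safe to embed in SQL.
    """
    to_drop = set(excluded)
    kept = [name for name in columns if name not in to_drop]
    if not kept:
        # A SELECT with an empty column list is not valid SQL.
        raise ValueError("every column was excluded")
    return f"SELECT {', '.join(kept)} FROM {table}"


print(select_all_except_sql("users", ["id", "name", "password_hash"], ["password_hash"]))
# prints SELECT id, name FROM users
```

With native support, the same query would simply be SELECT * EXCEPT (password_hash) FROM users.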

@cloud-fan
Contributor

There is a test failure: SparkThrowableSuite.Error classes are correctly formatted

@github-actions github-actions bot added the DOCS label Nov 23, 2023
@cloud-fan
Contributor

The error in docker-integration-tests is unrelated. Thanks, merging to master!
