Skip to content

Conversation

@Peng-Lei
Copy link
Contributor

@Peng-Lei Peng-Lei commented Jan 17, 2022

What changes were proposed in this pull request?

Quote the column name if needed instead of always.

Why are the changes needed?

#comments

Does this PR introduce any user-facing change?

Yes,It will change the result that users get the schema
eg:

"STRUCT<`_c0`: STRING, `_c1`: INT>"  => "STRUCT<_c0: STRING, _c1: INT>" 

At now. for end-user. I learn about the 3 way to get schema directly

  1. the function: eg
schema_of_json
schema_of_csv
  1. table schema
    df.schema or show create table
  2. call toDDL for StructType or StructField.

How was this patch tested?

existed testcase.

@github-actions github-actions bot added the SQL label Jan 17, 2022
@Peng-Lei Peng-Lei force-pushed the Quote-Column branch 2 times, most recently from 4ad25a8 to ec172e4 Compare January 18, 2022 03:09
@Peng-Lei
Copy link
Contributor Author

@cloud-fan Could you take a look ? Thank you very much.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jan 18, 2022

Yes, when end-user get the table metadata, if the columns does not need quote. it will be a normal string.

Let's be specific here. This only affects schema DDL string AFAIK, while "get the table metadata" is low-level operation and it's scarying to change it.

@cloud-fan
Copy link
Contributor

looks fine and I can't think of any breaking cases. cc @HyukjinKwon @viirya as well

@Peng-Lei
Copy link
Contributor Author

Let's be specific here. This only affects schema DDL string AFAIK, while "get the table metadata" is low-level operation and it's scarying to change it.

I found that the toDDL first introduce by #21803. and also for show create table at the beginning.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 15464e3 Jan 20, 2022
@HyukjinKwon
Copy link
Member

Yeah, I think this is fine.

@cloud-fan cloud-fan changed the title [SPARK-37931][SQL] Quote the column name if neededQuote the column name if needed [SPARK-37931][SQL] Quote the column name if needed Jan 20, 2022
`c` CHAR(5),
`v` VARCHAR(6))
c CHAR(5),
v VARCHAR(6))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the view test in this file still has quotes. ShowCreateTableCommand has a branch for view, and we need to update the quoting logic in that branch as well. @Peng-Lei

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I left this out. I will do it by a follow up.

cloud-fan pushed a commit that referenced this pull request Jan 28, 2022
### What changes were proposed in this pull request?
Quote the column name of view when `SHOW CREATE TABLE` for view.

### Why are the changes needed?
follow up the [#PR](#35227). Keep the consistent between table and view when `SHOW CREATE TABLE`.

### Does this PR introduce _any_ user-facing change?
Yes,It will change the result of `SHOW CREATE TABLE` for view.
eg:
```
"STRUCT<`_c0`, `_c1`>"  => "STRUCT<_c0, _c1>"
```

### How was this patch tested?
existed testcase.

Closes #35351 from Peng-Lei/view-quote-columns.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
senthh pushed a commit to senthh/spark-1 that referenced this pull request Feb 3, 2022
### What changes were proposed in this pull request?
Quote the column name of view when `SHOW CREATE TABLE` for view.

### Why are the changes needed?
follow up the [#PR](apache#35227). Keep the consistent between table and view when `SHOW CREATE TABLE`.

### Does this PR introduce _any_ user-facing change?
Yes,It will change the result of `SHOW CREATE TABLE` for view.
eg:
```
"STRUCT<`_c0`, `_c1`>"  => "STRUCT<_c0, _c1>"
```

### How was this patch tested?
existed testcase.

Closes apache#35351 from Peng-Lei/view-quote-columns.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@gatorsmile
Copy link
Member

This is a user-facing breaking change. The third party libraries (esp. SQL clients) might rely on the current format. We should avoid making such a change.

@HyukjinKwon
Copy link
Member

Hm, I think it's less likely that this breaks something because it removes the backquotes when they are unnecessary, meaning that it's still able to read and parse it back with lower versions of Spark.

@cloud-fan
Copy link
Contributor

I think we should clearly document the display-like commands. Their output is not structured and we only expect humans to read it, not programs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants