Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jul 18, 2018

What changes were proposed in this pull request?

In the PR, I propose to extend the StructType/StructField classes by new method toDDL which converts a value of the StructType/StructField type to a string formatted in DDL style. The resulted string can be used in a table creation.

The toDDL method of StructField is reused in SHOW CREATE TABLE. In this way the PR fixes the bug of unquoted names of nested fields.

How was this patch tested?

I add a test for checking the new method and 2 round trip tests: fromDDL -> toDDL and toDDL -> fromDDL

@maropu
Copy link
Member

maropu commented Jul 18, 2018

How about the case where a column name has special characters that should be backquoted, e.g., 'aaa:bbb'?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 18, 2018

@maropu I added quoting of column names

@SparkQA
Copy link

SparkQA commented Jul 18, 2018

Test build #93235 has finished for PR 21803 at commit 34511db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2018

Test build #93242 has finished for PR 21803 at commit f302777.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jul 19, 2018

(As I described in the jira) What's this func is used for? Is this related to the other work?

* `StructType(Seq(StructField("a", IntegerType)))` should be converted to `a int`
*/
def toDDL(struct: StructType): String = {
struct.map(field => s"${quoteIdentifier(field.name)} ${field.dataType.sql}").mkString(",")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also handle the special character ('\n', '\t', '', ...) that needs an escape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use similar code in SHOW CREATE TABLE:

s"${quoteIdentifier(column.name)} ${column.dataType.catalogString}${comment.getOrElse("")}"

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 19, 2018

@hvanhovell Could you look at the PR please.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 19, 2018

(As I described in the jira) What's this func is used for?

@maropu I answered in JIRA, please, look at it.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk, is the purpose of this API is to have a int instead of struct<a: int>?

You can use struct.map(field => s"${quoteIdentifier(field.name)} ${field.dataType.sql}").mkString(",") in the application code or you could simply:

scala> DataType.fromDDL(StructType.fromDDL("a int").catalogString)
res4: org.apache.spark.sql.types.DataType = StructType(StructField(a,IntegerType,true))

@HyukjinKwon
Copy link
Member

We should reduce APIs exposed .. there are already a hell of a lot..

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 19, 2018

is the purpose of this API is to have a int instead of struct<a: int>

Basically, yes. All those methods simpleString(), catalogString(), sql() return struct< ... : ...> which is impossible to uses in a table creation because of : and struct

We should reduce APIs exposed .. there are already a hell of a lot..

I believe we should reduce user's/customer's pain first of all. I think the function which is opposite to fromDLL looks pretty natural here.

@HyukjinKwon
Copy link
Member

Does the alternative code I posted above work in that case? If so, sounds we are adding an API just for consistency

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 19, 2018

Does the alternative code I posted above work in that case?
You can use struct.map(field => s"${quoteIdentifier(field.name)} ${field.dataType.sql}").mkString(",") in the application code

It is actually the same as in the PR. For sure, it works ( I added tests for that ;-) )

or you could simply:
scala> DataType.fromDDL(StructType.fromDDL("a int").catalogString)

Sorry, I didn't catch the example. Just in case, I need to read one avro file from a folder, take a schema from it in DDL format and create table by using the string. This is the use case.

If so, sounds we are adding an API just for consistency ...

Not only, we are adding the method to cover the use case which I described above. And for consistency too. Having useful API, isn't it beautiful?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 19, 2018

Ah, I misunderstood then. Thing is, fromDDL was added because it's needed other APIs rather then it's own purpose. I wonder how commonly it will be used to be honest. I haven't seen such requests before frankly.

In my case, I usually leave mailing list or stackoverflow questions as references usually though. It's one liner code and I think we haven't added APIs when the workarounds are easy usually so far. That doesn't quite match to what I am used to personally at least.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 19, 2018

Thing is, fromDDL was added because it's needed other APIs rather then it's own purpose.

The toDDL function can be reused from SHOW CREATE TABLE too:

s"${quoteIdentifier(column.name)} ${column.dataType.catalogString}${comment.getOrElse("")}"

I wonder how commonly it will be used to be honest. I haven't seen such requests before frankly.

Frankly speaking I cannot say there is high demand for the method from our users.

It's one liner code and I think we haven't added APIs when the workarounds are easy usually so far.

Do you think I should close the PR?

@HyukjinKwon
Copy link
Member

To me yup, but if you are in doubt, I am perfectly okay with waiting some more days and see if other opinions arrive.

@rxin
Copy link
Contributor

rxin commented Jul 20, 2018

Should we do schema.toDDL, or StructType.toDDL(schema)?

@gatorsmile
Copy link
Member

schema.toDDL is more friendly.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 20, 2018

Should we do schema.toDDL, or StructType.toDDL(schema)?
schema.toDDL is more friendly.

For sure, schema.toDDL looks more natural. I put toDDL to the StructType object only to have it closer to fromDDL and to do not pollute the namespace of the StructType class which already has inherited from DataType similar methods simpleString(), catalogString() and sql()

@SparkQA
Copy link

SparkQA commented Jul 21, 2018

Test build #93387 has finished for PR 21803 at commit d542bf3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 24, 2018

@gatorsmile @rxin I moved toDDL to the StructType class and reused it from SHOW CREATE TABLE (fixed the bug of unquoted field names). Please, have take a look at the PR.

@gatorsmile
Copy link
Member

gatorsmile commented Jul 24, 2018

@MaxGekk Thanks for fixing the issue in SHOW CREATE TABLE! Could you open a JIRA about SHOW CREATE TABLE and add the JIRA number to this PR. Also, could you add the failed case in ShowCreateTableSuite?

@MaxGekk MaxGekk changed the title [SPARK-24849][SQL] Converting a value of StructType to a DDL string [SPARK-24849][SPARK-24911][SQL] Converting a value of StructType to a DDL string Jul 24, 2018
@SparkQA
Copy link

SparkQA commented Jul 25, 2018

Test build #93522 has finished for PR 21803 at commit 738e97c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 25, 2018

I am okay now just for clarification if you guys feel strong on this ~

@gatorsmile
Copy link
Member

It is nice to have. Actually, I believe we need to fix the bug in SHOW CREATE TABLE, which is widely used.

@HyukjinKwon
Copy link
Member

yup the current change sounds okay.

@gatorsmile
Copy link
Member

@MaxGekk Please include the test case for SHOW CREATE TABLE. Thanks!

@SparkQA
Copy link

SparkQA commented Jul 25, 2018

Test build #93540 has finished for PR 21803 at commit 60f663d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master

@asfgit asfgit closed this in 2f77616 Jul 25, 2018
@MaxGekk MaxGekk deleted the to-ddl branch August 17, 2019 13:35
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
… DDL string

In the PR, I propose to extend the `StructType`/`StructField` classes by new method `toDDL` which converts a value of the `StructType`/`StructField` type to a string formatted in DDL style. The resulted string can be used in a table creation.

The `toDDL` method of `StructField` is reused in `SHOW CREATE TABLE`. In this way the PR fixes the bug of unquoted names of nested fields.

I add a test for checking the new method and 2 round trip tests: `fromDDL` -> `toDDL` and `toDDL` -> `fromDDL`

Author: Maxim Gekk <maxim.gekk@databricks.com>

Closes apache#21803 from MaxGekk/to-ddl.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants