[SPARK-24849][SPARK-24911][SQL] Converting a value of StructType to a DDL string #21803
|
How about the case where a column name has special characters that should be backquoted, e.g., 'aaa:bbb'? |
|
@maropu I added quoting of column names |
|
Test build #93235 has finished for PR 21803 at commit
|
|
Test build #93242 has finished for PR 21803 at commit
|
|
(As I described in the JIRA) What is this func used for? Is it related to the other work? |
 * `StructType(Seq(StructField("a", IntegerType)))` should be converted to `a int`
 */
def toDDL(struct: StructType): String = {
  struct.map(field => s"${quoteIdentifier(field.name)} ${field.dataType.sql}").mkString(",")
Can this also handle special characters ('\n', '\t', '', ...) that need escaping?
We use similar code in SHOW CREATE TABLE:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
Line 999 in bac50aa
s"${quoteIdentifier(column.name)} ${column.dataType.catalogString}${comment.getOrElse("")}"
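The quoting discussed in this thread can be illustrated without Spark. The following is a minimal sketch, assuming the backtick convention (wrap the name in backticks and double any embedded backtick). `Field`, `DdlSketch`, and `quote` are hypothetical names used only for illustration; they are not Spark's `StructField` or `quoteIdentifier`.

```scala
// Hypothetical, simplified stand-in for a schema field: a name plus a SQL type name.
final case class Field(name: String, sqlType: String)

object DdlSketch {
  // Assumed convention: wrap the name in backticks and double any embedded backtick.
  def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  // Render each field as "<quoted name> <type>" and join the fields with commas.
  def toDDL(fields: Seq[Field]): String =
    fields.map(f => s"${quote(f.name)} ${f.sqlType}").mkString(",")
}
```

With these definitions, a name such as `aaa:bbb` is emitted wrapped in backticks, so the colon cannot be mistaken for DDL syntax. Characters such as '\n' or '\t' pass through unchanged inside the backticks in this sketch; whether a real parser accepts them is a separate question.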
|
@hvanhovell Could you look at the PR please. |
@maropu I answered in JIRA, please, look at it. |
HyukjinKwon
left a comment
@MaxGekk, is the purpose of this API to produce `a int` instead of `struct<a: int>`?
You can use struct.map(field => s"${quoteIdentifier(field.name)} ${field.dataType.sql}").mkString(",") in the application code or you could simply:
scala> DataType.fromDDL(StructType.fromDDL("a int").catalogString)
res4: org.apache.spark.sql.types.DataType = StructType(StructField(a,IntegerType,true))|
We should reduce APIs exposed .. there are already a hell of a lot.. |
Basically, yes. All those methods
I believe we should reduce user's/customer's pain first of all. I think the function which is opposite to |
|
Does the alternative code I posted above work in that case? If so, it sounds like we are adding an API just for consistency. |
It is actually the same as in the PR. For sure, it works ( I added tests for that ;-) )
Sorry, I didn't catch the example. Just in case: I need to read one Avro file from a folder, take its schema in DDL format, and create a table using that string. This is the use case.
Not only that: we are adding the method to cover the use case I described above, and for consistency too. Having a useful API, isn't it beautiful? |
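The last step of the use case above (turning a DDL column list into a table definition) might look roughly like this. It is only a sketch: `CreateTableSketch` and `createTableSql` are hypothetical names, and the table name and the `avro` format in the usage note are assumptions. In Spark the column list would come from the schema that was read from the file; that part is not shown here.

```scala
object CreateTableSketch {
  // Hypothetical helper: wraps an already-formatted DDL column list
  // into a CREATE TABLE statement for the given table name and format.
  def createTableSql(table: String, columnsDdl: String, format: String): String =
    s"CREATE TABLE $table ($columnsDdl) USING $format"
}
```

For example, `CreateTableSketch.createTableSql("events", "`id` INT,`name` STRING", "avro")` builds a single statement string that could then be submitted as SQL.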
|
Ah, I misunderstood then. The thing is, in such cases I usually point to mailing list or Stack Overflow questions as references. It's a one-liner, and I think so far we usually haven't added APIs when the workaround is easy. That doesn't quite match what I am personally used to, at least. |
The spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala Line 999 in bac50aa
Frankly speaking I cannot say there is high demand for the method from our users.
Do you think I should close the PR? |
|
To me yup, but if you are in doubt, I am perfectly okay with waiting some more days and see if other opinions arrive. |
|
Should we do schema.toDDL, or StructType.toDDL(schema)? |
|
schema.toDDL is more friendly. |
For sure, |
|
Test build #93387 has finished for PR 21803 at commit
|
|
@gatorsmile @rxin I moved |
|
@MaxGekk Thanks for fixing the issue in |
|
Test build #93522 has finished for PR 21803 at commit
|
|
Just to clarify, I am okay with this now if you guys feel strongly about it ~ |
|
It is nice to have. Actually, I believe we need to fix the bug in |
|
yup the current change sounds okay. |
|
@MaxGekk Please include the test case for SHOW CREATE TABLE. Thanks! |
|
Test build #93540 has finished for PR 21803 at commit
|
|
LGTM Thanks! Merged to master |
… DDL string In the PR, I propose to extend the `StructType`/`StructField` classes by new method `toDDL` which converts a value of the `StructType`/`StructField` type to a string formatted in DDL style. The resulted string can be used in a table creation. The `toDDL` method of `StructField` is reused in `SHOW CREATE TABLE`. In this way the PR fixes the bug of unquoted names of nested fields. I add a test for checking the new method and 2 round trip tests: `fromDDL` -> `toDDL` and `toDDL` -> `fromDDL` Author: Maxim Gekk <maxim.gekk@databricks.com> Closes apache#21803 from MaxGekk/to-ddl.
What changes were proposed in this pull request?
In the PR, I propose to extend the `StructType`/`StructField` classes with a new method `toDDL`, which converts a value of the `StructType`/`StructField` type to a string formatted in DDL style. The resulting string can be used in table creation.
The `toDDL` method of `StructField` is reused in `SHOW CREATE TABLE`. In this way the PR fixes the bug of unquoted names of nested fields.
How was this patch tested?
I added a test for the new method and 2 round-trip tests: `fromDDL` -> `toDDL` and `toDDL` -> `fromDDL`.
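The round-trip idea can be sketched on a toy model. This is not the PR's test code and does not use Spark's parser: `Column`, `RoundTripSketch`, and the regex-based `fromDDL` are hypothetical and handle only backquoted names followed by a single-word type.

```scala
import scala.util.matching.Regex

// Hypothetical, simplified column: a name plus a single-word SQL type name.
final case class Column(name: String, sqlType: String)

object RoundTripSketch {
  // Matches a backquoted name (with doubled backticks inside) followed by a type word.
  private val ColumnPattern: Regex = "`((?:[^`]|``)+)`\\s+(\\w+)".r

  def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  def toDDL(columns: Seq[Column]): String =
    columns.map(c => s"${quote(c.name)} ${c.sqlType}").mkString(",")

  // Toy parser: extracts every "`name` TYPE" pair and un-doubles embedded backticks.
  def fromDDL(ddl: String): Seq[Column] =
    ColumnPattern.findAllMatchIn(ddl)
      .map(m => Column(m.group(1).replace("``", "`"), m.group(2)))
      .toList
}
```

A round trip in either direction should then return the starting value: `fromDDL(toDDL(columns)) == columns` for a list of columns, and `toDDL(fromDDL(ddl)) == ddl` for a DDL string already in the form this sketch emits.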