[SPARK-14839][SQL] Support for other types for tableProperty rule in SQL syntax
#13517
|
Please let me cc @davies. Thanks! |
|
Test build #60004 has finished for PR 13517 at commit
|
python/pyspark/sql/readwriter.py
| :param path: string represents path to the JSON dataset, | ||
| or RDD of Strings storing JSON objects. | ||
| :param schema: an optional :class:`StructType` for the input schema. | ||
| :param samplingRatio: sets the ratio for sampling and reading the input data to infer |
It was actually intentional that samplingRatio was undocumented: regardless of the value, Spark still needs to read all the data, so this might as well be 1 all the time.
Ah, I see. It does not affect the actual I/O; it just drops some records and then tries to infer the schema.
I will remove the change.
BTW, I have found another one: the mergeSchema option in the Parquet data source, which I guess should be located in ParquetOptions (it is undocumented as well). Could this be done here together, maybe?
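The samplingRatio point in this thread can be illustrated with a small sketch. It is plain Python, not Spark's implementation; `infer_schema`, its parameters, and the sample rows are hypothetical. It only shows the behavior described above: every record is still read, and the ratio merely decides which records contribute to the inferred schema.

```python
import random


def infer_schema(records, sampling_ratio=1.0, seed=0):
    """Hypothetical sketch: infer {field: type name} from a sample of records.

    Every record is read; sampling only decides which ones are inspected.
    """
    rng = random.Random(seed)
    records_read = 0
    schema = {}
    for record in records:
        records_read += 1  # the read happens regardless of the ratio
        if rng.random() >= sampling_ratio:
            continue  # record is dropped after it has been read
        for field, value in record.items():
            # Keep the first type seen for each field.
            schema.setdefault(field, type(value).__name__)
    return schema, records_read


rows = [{"id": 1}, {"id": 2, "name": "a"}, {"id": 3, "score": 0.5}]
full_schema, full_reads = infer_schema(rows, sampling_ratio=1.0)
_, sampled_reads = infer_schema(rows, sampling_ratio=0.1)
```

In this sketch `full_reads` and `sampled_reads` are equal, which is the reason given for leaving the option undocumented.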
|
Test build #60032 has finished for PR 13517 at commit
|
| val props = properties.toMap | ||
| val badKeys = props.filter { case (_, v) => v == null }.keys | ||
| if (badKeys.nonEmpty) { | ||
| throw operationNotAllowed( |
This error message does not match the behavior.
Thanks! I just fixed it.
|
Test build #60090 has finished for PR 13517 at commit
|
|
retest this please |
|
Test build #60092 has finished for PR 13517 at commit
|
|
cc @hvanhovell for this one |
| (PARTITIONED BY partitionColumnNames=identifierList)? | ||
| bucketSpec? #createTableUsing | ||
| | createTableHeader tableProvider | ||
| (OPTIONS tablePropertyList)? |
Why not generalize the tableProperty rule and use optionValue (rename it to something more consistent) as its value rule? Seems easier.
I hope I understood this correctly. I guess you meant the tableProperty rule, for example, as below:
tableProperty
    : key=tablePropertyKey (EQ? value=optionValue)?
    ;
If so, I am worried that this would affect other rules such as DBPROPERTIES and TBLPROPERTIES (allowing other types as values). I made this separate because allowing other types in the OPTIONS clause seems to comply with standard SQL.
If not, could you give me advice in a bit more detail, please?
Ah, do you mean it is okay to support other types for other rules using tableProperty as well?
(Sorry for pinging @hvanhovell)
@HyukjinKwon I am on holiday, so I am a bit slow with my responses.
You have understood me correctly. What I am suggesting will affect DBPROPERTIES and TBLPROPERTIES; it will also allow for boolean and numeric options. I don't think this is a bad thing: it is better to have a lenient parser and to constrain behavior in the AstBuilder (this allows us to throw much better error messages).
@hvanhovell I see. Thanks, I didn't expect you to be on holiday.
I will push some commits and wait. Please feel free to review when you have some time!
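The "lenient parser, strict builder" idea discussed in this thread can be sketched as follows. This is a minimal illustration in plain Python, not the ANTLR grammar or the AstBuilder code from the PR; `parse_property`, `build_properties`, the regular expression, and the error wording are all hypothetical. The parsing step accepts a key with an optional `=` and an optional string, numeric, or boolean value; the building step then rejects keys that have no value and reports them by name.

```python
import re

# Lenient "grammar": a key, an optional '=', and an optional value that may be
# a single-quoted string, a number, or a boolean literal.
_PROPERTY = re.compile(
    r"\s*(?P<key>[A-Za-z_][\w.]*)"
    r"(?:\s*=?\s*(?P<value>'[^']*'|[-+]?\d+(?:\.\d+)?|true|false))?\s*$",
    re.IGNORECASE,
)


def parse_property(text):
    """Parse one property into (key, typed value or None)."""
    match = _PROPERTY.match(text)
    if match is None:
        raise ValueError(f"cannot parse property: {text!r}")
    key, raw = match.group("key"), match.group("value")
    if raw is None:
        return key, None  # accepted here; the builder decides if it is valid
    if raw.startswith("'"):
        return key, raw[1:-1]
    if raw.lower() in ("true", "false"):
        return key, raw.lower() == "true"
    return key, float(raw) if "." in raw else int(raw)


def build_properties(items):
    """Strict step: every key must have a value, otherwise name the bad keys."""
    parsed = [parse_property(item) for item in items]
    missing = [key for key, value in parsed if value is None]
    if missing:
        raise ValueError("values must be specified for key(s): " + ", ".join(missing))
    return dict(parsed)


props = build_properties(["path '/tmp/x'", "header true", "ratio = 0.5", "n 3"])
```

Because the missing-value check lives in the building step, the error can list the offending keys instead of surfacing as a generic syntax error.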
|
Test build #61429 has finished for PR 13517 at commit
|
|
Test build #61672 has finished for PR 13517 at commit
|
| } | ||
| } | ||
|
|
||
| test("SPARK-14839: Support for other types as option in OPTIONS clause") { |
Could you move this test to DDLCommandSuite? Could you also add a test for TBLPROPERTIES/DBPROPERTIES?
Sure!
|
Looks pretty good. Left one test related comment. |
|
Test build #61698 has finished for PR 13517 at commit
|
|
Test build #61699 has finished for PR 13517 at commit
|
|
(@hvanhovell I just addressed your comments!) |
|
@HyukjinKwon still on holiday... LGTM - merging to master. Thanks! |
|
Oh....... sorry... and thanks.. |
|
NP :) |
What changes were proposed in this pull request?
Currently, the Scala API supports options of the types String, Long, Double and Boolean, and the Python API also supports other types.
This PR corrects the tableProperty rule to support other types (string, boolean, double and integer) so that options for data sources are supported in a consistent way. This will affect other rules such as DBPROPERTIES and TBLPROPERTIES (allowing other types as values).
Also, the comment "TODO add bucketing and partitioning." was removed because it was resolved in 24bea00.
How was this patch tested?
Unit test in MetastoreDataSourcesSuite.scala.
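The consistency goal in the description can be sketched like this: whichever entry point is used, a string, boolean, double, or integer option value ends up in one string form. This is a plain Python illustration under that assumption, not the Spark or PySpark code; `to_option_string`, `normalize_options`, and the option keys are hypothetical.

```python
def to_option_string(value):
    """Hypothetical helper: render an option value in a single string form."""
    if isinstance(value, bool):
        # Check bool before int, since bool is a subclass of int in Python.
        return str(value).lower()
    if isinstance(value, (int, float, str)):
        return str(value)
    raise TypeError(f"unsupported option type: {type(value).__name__}")


def normalize_options(**options):
    """Convert every option value to its string form."""
    return {key: to_option_string(value) for key, value in options.items()}


opts = normalize_options(header=True, ratio=0.5, limit=10, path="/tmp/x")
```

Here `opts` maps every key to a string, so a boolean written as `true` in SQL and `True` in Python would reach a data source in the same form.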