Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 5, 2016

What changes were proposed in this pull request?

Currently, Scala API supports to take options with the types, String, Long, Double and Boolean and Python API also supports other types.

This PR corrects tableProperty rule to support other types (string, boolean, double and integer) so that support the options for data sources in a consistent way. This will affect other rules such as DBPROPERTIES and TBLPROPERTIES (allowing other types as values).

Also, TODO add bucketing and partitioning. was removed because it was resolved in 24bea00

How was this patch tested?

Unit test in MetastoreDataSourcesSuite.scala.

@HyukjinKwon
Copy link
Member Author

Please let me cc @davies. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 5, 2016

Test build #60004 has finished for PR 13517 at commit 905bdc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

:param path: string represents path to the JSON dataset,
or RDD of Strings storing JSON objects.
:param schema: an optional :class:`StructType` for the input schema.
:param samplingRatio: sets the ratio for sampling and reading the input data to infer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was actually intentional that samplingRatio was undocumented, because regardless the value, Spark still needs to read all the data so this might as well be 1 all the time.

Copy link
Member Author

@HyukjinKwon HyukjinKwon Jun 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. It does not affect the actual I/O but just drops some and then try to infer the schema.
I will remove the change.

BTW, actually, I have found another one mergeSchema option in Parquet data source, which I guess should be located in ParquetOptions (and this is undocumented as well). Can this be done here together maybe..?

@SparkQA
Copy link

SparkQA commented Jun 6, 2016

Test build #60032 has finished for PR 13517 at commit 0c692e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val props = properties.toMap
val badKeys = props.filter { case (_, v) => v == null }.keys
if (badKeys.nonEmpty) {
throw operationNotAllowed(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message does not match with the behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I just fixed.

@SparkQA
Copy link

SparkQA commented Jun 7, 2016

Test build #60090 has finished for PR 13517 at commit eb0031a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 7, 2016

Test build #60092 has finished for PR 13517 at commit eb0031a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 25, 2016

@davies and @rxin I addressed the comments. Thanks!

@rxin
Copy link
Contributor

rxin commented Jun 28, 2016

cc @hvanhovell for this one

(PARTITIONED BY partitionColumnNames=identifierList)?
bucketSpec? #createTableUsing
| createTableHeader tableProvider
(OPTIONS tablePropertyList)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not generalize the tableProperty rule and use optionValue (rename it to something more consistent) as its value rule? Seems easier.

Copy link
Member Author

@HyukjinKwon HyukjinKwon Jun 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I understood this correctly. I guess you meant the tableProperty rule, for example, as below:

tableProperty
    : key=tablePropertyKey (EQ? value=optionValue)?
    ;

If so, I am worried if this affects other rules such as DBPROPERTIES and TBLPROPERTIES (allowing other types as values). I made this separate because it seems allowing other types in OPTIONS clause complies standard SQL.

If not, could you give me an advice in a bit more detail please?

Copy link
Member Author

@HyukjinKwon HyukjinKwon Jun 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, do you mean it is okay to support other types for other rules using tableProperty as well?

Copy link
Member Author

@HyukjinKwon HyukjinKwon Jul 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry for pinging @hvanhovell)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon I am on holiday - so I am bit slow with my responses.

Yo have understood me correctly. What I am suggesting will affect the DBPROPERTIES and TBLPROPERTIES; it will also allow for boolean and numeric options. I don't think this is a bad thing, it is better to have a lenient parser and to constrain behavior in the AstBuilder (this allows us to throw much better error messages).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell I see. Thanks, I didn't expect you are on holidays..
I will push some commits and wait. Please feel free to review when you have some time!

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61429 has finished for PR 13517 at commit ea2af9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-14839][SQL] Support for other types as option in OPTIONS clause [SPARK-14839][SQL] Support for other types for tableProperty rule in SQL syntax Jul 3, 2016
@SparkQA
Copy link

SparkQA commented Jul 3, 2016

Test build #61672 has finished for PR 13517 at commit 4b67bab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("SPARK-14839: Support for other types as option in OPTIONS clause") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this test to DDLCommandSuite? Could you also add a test for TBLPROPERTIES/DBPROPERTIES?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@hvanhovell
Copy link
Contributor

Looks pretty good. Left one test related comment.

@SparkQA
Copy link

SparkQA commented Jul 4, 2016

Test build #61698 has finished for PR 13517 at commit 30dfea0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 4, 2016

Test build #61699 has finished for PR 13517 at commit 1307f8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

(@hvanhovell I just addressed your comments!)

@hvanhovell
Copy link
Contributor

@HyukjinKwon still on holiday...

LGTM - merging to master. Thanks!

@HyukjinKwon
Copy link
Member Author

Oh....... sorry... and thanks..

@hvanhovell
Copy link
Contributor

NP :)

@asfgit asfgit closed this in 34283de Jul 7, 2016
@HyukjinKwon HyukjinKwon deleted the SPARK-14839 branch January 2, 2018 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants