@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR adds support for specifying a custom date format for DateType and TimestampType.

For TimestampType, this uses the given format to infer the schema and also to convert the values.
For DateType, this uses the given format to convert the values.
If dateFormat is not given, it falls back to DateTimeUtils.stringToTime() for backward compatibility.
When it is given, it uses SimpleDateFormat to parse the data.
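
The fallback described above could be sketched roughly as follows. This is only an illustration, not the code in this patch: `DateFormatSketch`, `parseMillis` and `legacyParse` are hypothetical names, and `legacyParse` merely stands in for DateTimeUtils.stringToTime(), which accepts more input shapes than `java.sql.Timestamp.valueOf` does.

```scala
import java.text.SimpleDateFormat

object DateFormatSketch {
  // Hypothetical stand-in for DateTimeUtils.stringToTime(); the real method
  // lives inside Spark and is more permissive than Timestamp.valueOf.
  private def legacyParse(s: String): Long =
    java.sql.Timestamp.valueOf(s).getTime

  // Returns epoch milliseconds for the given string.
  // dateFormat = None models "the option was not given", so the legacy path runs.
  def parseMillis(s: String, dateFormat: Option[String]): Long =
    dateFormat match {
      case Some(pattern) => new SimpleDateFormat(pattern).parse(s).getTime
      case None          => legacyParse(s)
    }
}
```

Both paths use the JVM default time zone here, so `parseMillis("26/08/2015 18:00", Some("dd/MM/yyyy HH:mm"))` and `parseMillis("2015-08-26 18:00:00", None)` should give the same instant.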

In addition, IntegerType, DoubleType and LongType have a higher priority than TimestampType in type inference. This means that even if the given format is yyyy or yyyy.MM, the column will be inferred as IntegerType or DoubleType. Since this is type inference, I think it is okay to give such precedence.
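
To illustrate the precedence, here is a minimal sketch of a single-value inference step. It is an assumption-laden toy, not the actual CSV inference code: the type objects are local stand-ins for Spark's types, `inferField` is a hypothetical name, and the relative order of the numeric checks (Integer, then Long, then Double) is assumed for the example.

```scala
import java.text.SimpleDateFormat
import scala.util.Try

object InferencePrioritySketch {
  // Local stand-ins for the Spark SQL types; not the real classes.
  sealed trait FieldType
  case object IntegerType extends FieldType
  case object LongType extends FieldType
  case object DoubleType extends FieldType
  case object TimestampType extends FieldType
  case object StringType extends FieldType

  // Numeric types are tried first, so the date format is only consulted
  // when the value does not parse as a number.
  def inferField(value: String, dateFormat: String): FieldType =
    if (Try(value.toInt).isSuccess) IntegerType
    else if (Try(value.toLong).isSuccess) LongType
    else if (Try(value.toDouble).isSuccess) DoubleType
    else if (Try(new SimpleDateFormat(dateFormat).parse(value)).isSuccess) TimestampType
    else StringType
}
```

With this ordering, `inferField("2015", "yyyy")` yields IntegerType and `inferField("2015.08", "yyyy.MM")` yields DoubleType, while a value such as "2015-08" with the format yyyy-MM is not numeric and therefore reaches the timestamp check.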

In addition, I renamed csv.CSVInferSchema to csv.InferSchema, as the JSON data source has json.InferSchema. Although they have the same name, I did this because I thought the parent package name can still differentiate them. Accordingly, the suite was also renamed from CSVInferSchemaSuite to InferSchemaSuite.

How was this patch tested?

Tested with unit tests, and with ./dev/run_tests for coding style checks.

@HyukjinKwon
Member Author

@rxin This will likely conflict with #11315, which I think is supposed to be merged (judging from your comment).

I will resolve the conflict as soon as either this one or that one is merged.

@HyukjinKwon
Member Author

@falaki Would you mind reviewing this, please?

@SparkQA

SparkQA commented Mar 7, 2016

Test build #52535 has finished for PR 11550 at commit 5c990cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

@falaki Just to let you know, I changed the name CSVInferSchema to InferSchema mainly so that the CSV and JSON data sources use consistent names, but perhaps they should instead be CSVInferSchema and JSONInferSchema.

@SparkQA

SparkQA commented Mar 8, 2016

Test build #52621 has finished for PR 11550 at commit c51d4ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 8, 2016

Test build #52619 has finished for PR 11550 at commit db27259.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DefaultSource extends FileFormat with DataSourceRegister
    • class InMemoryCatalog extends ExternalCatalog
    • abstract class ExternalCatalog
    • case class CatalogTablePartition(
    • case class WriteRelation(
    • class DefaultSource extends FileFormat with DataSourceRegister
    • class DefaultSource extends FileFormat with DataSourceRegister
    • case class FileTypes(
    • class DefaultSource extends FileFormat with DataSourceRegister
    • case class HadoopFsRelation(
    • trait FileFormat
    • trait FileCatalog
    • class HDFSFileCatalog(
    • class HiveFileCatalog(
    • .doc("A comma-separated list of class names of services to add to the scheduler.")

@falaki
Contributor

falaki commented Mar 8, 2016

@HyukjinKwon the changes look good to me.

@HyukjinKwon
Member Author

cc @rxin

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53049 has finished for PR 11550 at commit e0612f1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53054 has finished for PR 11550 at commit 9d79996.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53055 has finished for PR 11550 at commit 49a8210.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@HyukjinKwon
Member Author

cc @rxin

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53065 has finished for PR 11550 at commit 7f701a9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53068 has finished for PR 11550 at commit fd03024.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53063 has finished for PR 11550 at commit 49a8210.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 16, 2016

Test build #53298 has finished for PR 11550 at commit 6bc0ffb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Mar 16, 2016

Test build #53301 has finished for PR 11550 at commit 6bc0ffb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 16, 2016

Test build #53296 has finished for PR 11550 at commit be4996b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55174 has finished for PR 11550 at commit 3f3ef6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2016

Test build #55496 has finished for PR 11550 at commit c197d52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 30, 2016

@HyukjinKwon I want to get this in for 2.0. Can you not do the renaming here? It's always super confusing when classes have the same name, even when they are in different packages. We are trying to move away from that. It will also make the patch much easier to review when the patch is more "precise".

@HyukjinKwon
Member Author

@rxin Thank you. I will change the name back to the original.

  nullable: Boolean = true,
- nullValue: String = ""): Any = {
+ nullValue: String = "",
+ dateFormat: SimpleDateFormat = null): Any = {
Member Author

This will conflict with #11947 (that PR turns the last argument into a single options object instead of multiple individual parameters, like I did for infer() above).

@rxin
Contributor

rxin commented Apr 30, 2016

Left some minor comments. LGTM otherwise.

@SparkQA

SparkQA commented Apr 30, 2016

Test build #57397 has finished for PR 11550 at commit c599d92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 30, 2016

Test build #57398 has finished for PR 11550 at commit 58159f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 30, 2016

Test build #57401 has finished for PR 11550 at commit 49732b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 30, 2016

Test build #57402 has finished for PR 11550 at commit 3572199.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 30, 2016

Test build #57404 has finished for PR 11550 at commit c1710d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 30, 2016

Thanks - merging in master!
