Conversation

@charliechen211

Jira: https://issues.apache.org/jira/browse/SPARK-22814

Currently, partitionColumn must be a numeric column of the table. However, many tables have no primary key but do have date/timestamp indexes.

This patch solves that problem.
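
To make the goal concrete, the intended usage after this patch would look roughly like the sketch below (the connection URL, table, and column names are placeholders):

```scala
// Minimal sketch: partition a JDBC read on a timestamp column instead of a
// numeric one. All identifiers (URL, table, column, bounds) are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:postgres")
  .option("dbtable", "events")
  .option("partitionColumn", "created_at")      // timestamp column
  .option("lowerBound", "2017-01-01 00:00:00")
  .option("upperBound", "2017-12-31 23:59:59")
  .option("numPartitions", 4)
  .load()
```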

@gatorsmile
Member

ok to test

@gatorsmile
Member

Please update the PR title

@gatorsmile
Member

Could you write a test case?

@gatorsmile
Member

The support is interesting, but the current implementation is not clean. cc @dongjoon-hyun Could you help review this PR?

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84998 has finished for PR 19999 at commit d1d310c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@charliechen211
Author

@gatorsmile We fixed this in Spark 1.6.2 and have been using it in production for two months. For that reason, I opened this PR against the master branch. I will test it next week.

@maropu
Member

maropu commented Dec 16, 2017

We need to update the doc in DataFrameReader:

* @param columnName the name of a column of integral type that will be used for partitioning.

IMO we might need to add a new jdbc API in DataFrameReader for timestamp/date partitioning.
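
To make the suggestion concrete, such an API could mirror the existing numeric `jdbc(...)` overload. This is a hypothetical helper only; the overload was never added, and the merged change instead parses the existing string-typed lowerBound/upperBound options:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, DataFrameReader}

// Hypothetical helper mirroring DataFrameReader.jdbc(url, table, columnName,
// lowerBound: Long, upperBound: Long, numPartitions, props), but taking
// timestamps. Illustrative only; not part of any Spark release.
def jdbcByTimestamp(
    reader: DataFrameReader,
    url: String,
    table: String,
    columnName: String,
    lowerBound: Timestamp,
    upperBound: Timestamp,
    numPartitions: Int): DataFrame = {
  reader
    .format("jdbc")
    .option("url", url)
    .option("dbtable", table)
    .option("partitionColumn", columnName)
    .option("lowerBound", lowerBound.toString)  // e.g. 2017-01-01 00:00:00.0
    .option("upperBound", upperBound.toString)
    .option("numPartitions", numPartitions)
    .load()
}
```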

```
JDBCRelation(parts, jdbcOptions)(sqlContext.sparkSession)
}

def resolvePartitionColumnType(parameters: Map[String, String]): Int = {
```

If you want a column type, how about using JDBCRDD.resolveTable?
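
A rough sketch of that suggestion (JDBCOptions and JDBCRDD are Spark-internal classes whose signatures vary across versions, so treat this as illustrative):

```scala
import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JDBCRDD}
import org.apache.spark.sql.types.DataType

// Illustrative only: resolve the table schema once via JDBCRDD.resolveTable
// and look the partition column up in it, rather than re-querying column
// metadata in a separate resolvePartitionColumnType helper.
def partitionColumnType(parameters: Map[String, String]): DataType = {
  val options = new JDBCOptions(parameters)
  val schema = JDBCRDD.resolveTable(options)
  schema(parameters(JDBCOptions.JDBC_PARTITION_COLUMN)).dataType
}
```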

```
ans.toArray
}

def getCurrentValue(columnType: Int, value: Long): String = {
```

Probably, you can use DateTimeUtils to convert currentValue to timestamp/date.
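
A self-contained sketch of that idea, using java.sql types in place of Spark's internal DateTimeUtils; treating `value` as epoch milliseconds is an assumption:

```scala
import java.sql.{Date, Timestamp, Types}

// Sketch: render the current stride value as a quoted date/timestamp literal
// when the partition column is temporal, and as a plain number otherwise.
// Assumes `value` is milliseconds since the epoch.
def getCurrentValue(columnType: Int, value: Long): String = columnType match {
  case Types.DATE      => s"'${new Date(value)}'"       // yyyy-MM-dd
  case Types.TIMESTAMP => s"'${new Timestamp(value)}'"  // yyyy-MM-dd HH:mm:ss.f
  case _               => value.toString
}
```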

@maropu
Member

maropu commented Dec 16, 2017

I noticed that, in the current master, Spark throws a runtime exception if the given partition column type does not match the actual column type;

```
scala> jdbcTable.show
17/12/16 17:47:59 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.postgresql.util.PSQLException: ERROR: operator does not exist: text < integer
  Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
  Position: 83
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:616)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:466)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:351)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
```

IMHO we'd better check this type mismatch before execution, as early as possible.
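
A rough sketch of such an upfront check (illustrative; per the commit message below, the merged change implements this in JDBCRelation.verifyAndGetNormalizedPartitionColumn and raises AnalysisException, while IllegalArgumentException here just keeps the sketch self-contained):

```scala
import org.apache.spark.sql.types._

// Illustrative fail-fast check: reject the partition column before any tasks
// run if its resolved type is not numeric, date, or timestamp.
def verifyPartitionColumnType(schema: StructType, column: String): DataType = {
  val field = schema.fields
    .find(_.name.equalsIgnoreCase(column))
    .getOrElse(throw new IllegalArgumentException(
      s"Partition column $column not found in the table schema"))
  field.dataType match {
    case _: NumericType           => field.dataType
    case DateType | TimestampType => field.dataType
    case other => throw new IllegalArgumentException(
      s"Partition column type should be numeric, date, or timestamp, " +
        s"but ${other.catalogString} found.")
  }
}
```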

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93050 has finished for PR 19999 at commit d1d310c.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

@maropu Can you take this over?

@maropu
Member

maropu commented Jul 21, 2018

ok, I will.

asfgit closed this in 47d84e4 on Jul 30, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request on Jul 31, 2018

## What changes were proposed in this pull request?
This PR adds support for Date/Timestamp in a JDBC partition column (in the current master, only numeric columns are supported). It also modifies the code to verify the partition column type:
```
val jdbcTable = spark.read
 .option("partitionColumn", "text")
 .option("lowerBound", "aaa")
 .option("upperBound", "zzz")
 .option("numPartitions", 2)
 .jdbc("jdbc:postgresql:postgres", "t", options)

// with this pr
org.apache.spark.sql.AnalysisException: Partition column type should be numeric, date, or timestamp, but string found.;
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.verifyAndGetNormalizedPartitionColumn(JDBCRelation.scala:165)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:85)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:317)

// without this pr
java.lang.NumberFormatException: For input string: "aaa"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Long.parseLong(Long.java:589)
  at java.lang.Long.parseLong(Long.java:631)
  at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:277)
```

Closes apache#19999

## How was this patch tested?
Added tests in `JDBCSuite`.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#21834 from maropu/SPARK-22814.
@shatestest

When I chose INSERTION_DATE as my partitionColumn with the dates below:
.option("lowerBound", "2002-03-31");
.option("upperBound", "2019-05-01");
.option("dateFormat", "yyyy-mm-dd"); // also tried with "yyyy-MM-dd"

I get the error ORA-01861: literal does not match format string.
How should the dates for lowerBound/upperBound be passed?
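
For reference, with this patch the bounds for a date partition column are passed as plain yyyy-MM-dd strings; a rough, untested sketch with placeholder Oracle connection details (note the JDBC source has no dateFormat option):

```scala
// Untested sketch; URL and table name are placeholders. The bounds are plain
// date strings, and there is no dateFormat option in the JDBC data source.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//host:1521/service")
  .option("dbtable", "MY_TABLE")
  .option("partitionColumn", "INSERTION_DATE")
  .option("lowerBound", "2002-03-31")
  .option("upperBound", "2019-05-01")
  .option("numPartitions", 4)
  .load()
```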

@HyukjinKwon
Member

Please ask on the mailing list.
