Conversation

@charliechen211

Jira: https://issues.apache.org/jira/browse/SPARK-22814

Currently, partitionColumn must be a numeric column of the table. However, many tables have no primary key but do have date/timestamp indexes.

This patch solves that problem.
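
To make the goal concrete, the intended usage after this patch would look roughly like the sketch below (the connection URL, table, and column names are placeholders):

```scala
// Minimal sketch: partition a JDBC read on a timestamp column instead of a
// numeric one. All identifiers (URL, table, column, bounds) are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:postgres")
  .option("dbtable", "events")
  .option("partitionColumn", "created_at")      // timestamp column
  .option("lowerBound", "2017-01-01 00:00:00")
  .option("upperBound", "2017-12-31 23:59:59")
  .option("numPartitions", 4)
  .load()
```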

@gatorsmile
Member

ok to test

@gatorsmile
Member

Please update the PR title

@gatorsmile
Member

Could you write a test case?

@gatorsmile
Member

The support is interesting, but the current implementation is not clean. cc @dongjoon-hyun Could you help review this PR?

@SparkQA

SparkQA commented Dec 16, 2017

Test build #84998 has finished for PR 19999 at commit d1d310c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@charliechen211
Author

@gatorsmile We fixed this in Spark 1.6.2 and have been using it in production for two months. For that reason, I opened this PR against the master branch. I will test it next week.

@maropu
Member

maropu commented Dec 16, 2017

We need to update the doc in DataFrameReader:

* @param columnName the name of a column of integral type that will be used for partitioning.

IMO we might need to add a new jdbc API in DataFrameReader for timestamp/date partitioning.
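
To make the suggestion concrete, such an API could mirror the existing numeric `jdbc(...)` overload. This is a hypothetical helper only; the overload was never added, and the merged change instead parses the existing string-typed lowerBound/upperBound options:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, DataFrameReader}

// Hypothetical helper mirroring DataFrameReader.jdbc(url, table, columnName,
// lowerBound: Long, upperBound: Long, numPartitions, props), but taking
// timestamps. Illustrative only; not part of any Spark release.
def jdbcByTimestamp(
    reader: DataFrameReader,
    url: String,
    table: String,
    columnName: String,
    lowerBound: Timestamp,
    upperBound: Timestamp,
    numPartitions: Int): DataFrame = {
  reader
    .format("jdbc")
    .option("url", url)
    .option("dbtable", table)
    .option("partitionColumn", columnName)
    .option("lowerBound", lowerBound.toString)  // e.g. 2017-01-01 00:00:00.0
    .option("upperBound", upperBound.toString)
    .option("numPartitions", numPartitions)
    .load()
}
```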

```
JDBCRelation(parts, jdbcOptions)(sqlContext.sparkSession)
}

def resolvePartitionColumnType(parameters: Map[String, String]): Int = {
```

If you want a column type, how about using JDBCRDD.resolveTable?
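
A rough sketch of that suggestion (JDBCOptions and JDBCRDD are Spark-internal classes whose signatures vary across versions, so treat this as illustrative):

```scala
import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JDBCRDD}
import org.apache.spark.sql.types.DataType

// Illustrative only: resolve the table schema once via JDBCRDD.resolveTable
// and look the partition column up in it, rather than re-querying column
// metadata in a separate resolvePartitionColumnType helper.
def partitionColumnType(parameters: Map[String, String]): DataType = {
  val options = new JDBCOptions(parameters)
  val schema = JDBCRDD.resolveTable(options)
  schema(parameters(JDBCOptions.JDBC_PARTITION_COLUMN)).dataType
}
```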

```
ans.toArray
}

def getCurrentValue(columnType: Int, value: Long): String = {
```

Probably, you can use DateTimeUtils to convert currentValue to timestamp/date.
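
A self-contained sketch of that idea, using java.sql types in place of Spark's internal DateTimeUtils; treating `value` as epoch milliseconds is an assumption:

```scala
import java.sql.{Date, Timestamp, Types}

// Sketch: render the current stride value as a quoted date/timestamp literal
// when the partition column is temporal, and as a plain number otherwise.
// Assumes `value` is milliseconds since the epoch.
def getCurrentValue(columnType: Int, value: Long): String = columnType match {
  case Types.DATE      => s"'${new Date(value)}'"       // yyyy-MM-dd
  case Types.TIMESTAMP => s"'${new Timestamp(value)}'"  // yyyy-MM-dd HH:mm:ss.f
  case _               => value.toString
}
```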

@maropu
Member

maropu commented Dec 16, 2017

I noticed that, in the current master, Spark throws a runtime exception if the given partition column type does not match the actual column type;

```
scala> jdbcTable.show
17/12/16 17:47:59 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.postgresql.util.PSQLException: ERROR: operator does not exist: text < integer
  Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
  Position: 83
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:616)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:466)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:351)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
```

IMHO we'd better check this type mismatch before execution, as early as possible.
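
A rough sketch of such an upfront check (illustrative; per the commit message below, the merged change implements this in JDBCRelation.verifyAndGetNormalizedPartitionColumn and raises AnalysisException, while IllegalArgumentException here just keeps the sketch self-contained):

```scala
import org.apache.spark.sql.types._

// Illustrative fail-fast check: reject the partition column before any tasks
// run if its resolved type is not numeric, date, or timestamp.
def verifyPartitionColumnType(schema: StructType, column: String): DataType = {
  val field = schema.fields
    .find(_.name.equalsIgnoreCase(column))
    .getOrElse(throw new IllegalArgumentException(
      s"Partition column $column not found in the table schema"))
  field.dataType match {
    case _: NumericType           => field.dataType
    case DateType | TimestampType => field.dataType
    case other => throw new IllegalArgumentException(
      s"Partition column type should be numeric, date, or timestamp, " +
        s"but ${other.catalogString} found.")
  }
}
```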

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93050 has finished for PR 19999 at commit d1d310c.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

@maropu Can you take this over?

@maropu
Member

maropu commented Jul 21, 2018

ok, I will.

asfgit closed this in 47d84e4 on Jul 30, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request on Jul 31, 2018

## What changes were proposed in this pull request?
This PR adds support for Date/Timestamp in a JDBC partition column (in the current master, only numeric columns are supported). It also modifies the code to verify the partition column type:
```
val jdbcTable = spark.read
 .option("partitionColumn", "text")
 .option("lowerBound", "aaa")
 .option("upperBound", "zzz")
 .option("numPartitions", 2)
 .jdbc("jdbc:postgresql:postgres", "t", options)

// with this pr
org.apache.spark.sql.AnalysisException: Partition column type should be numeric, date, or timestamp, but string found.;
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.verifyAndGetNormalizedPartitionColumn(JDBCRelation.scala:165)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:85)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:317)

// without this pr
java.lang.NumberFormatException: For input string: "aaa"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Long.parseLong(Long.java:589)
  at java.lang.Long.parseLong(Long.java:631)
  at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:277)
```

Closes apache#19999

## How was this patch tested?
Added tests in `JDBCSuite`.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#21834 from maropu/SPARK-22814.
@shatestest

When I chose INSERTION_DATE as my partitionColumn with the dates below:
.option("lowerBound", "2002-03-31");
.option("upperBound", "2019-05-01");
.option("dateFormat", "yyyy-mm-dd"); // also tried with "yyyy-MM-dd"

I get the error ORA-01861: literal does not match format string.
How should the dates for lowerBound/upperBound be passed?
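
For reference, with this patch the bounds for a date partition column are passed as plain yyyy-MM-dd strings; a rough, untested sketch with placeholder Oracle connection details (note the JDBC source has no dateFormat option):

```scala
// Untested sketch; URL and table name are placeholders. The bounds are plain
// date strings, and there is no dateFormat option in the JDBC data source.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//host:1521/service")
  .option("dbtable", "MY_TABLE")
  .option("partitionColumn", "INSERTION_DATE")
  .option("lowerBound", "2002-03-31")
  .option("upperBound", "2019-05-01")
  .option("numPartitions", 4)
  .load()
```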

@HyukjinKwon
Member

Please ask on the mailing list.
