JDBC support date/timestamp type as partitionColumn #19999
Conversation
|
ok to test |
|
Please update the PR title |
|
Could you write a test case? |
|
The support is interesting, but the current impl is not clean. cc @dongjoon-hyun Could you help review this PR? |
|
Test build #84998 has finished for PR 19999 at commit
|
|
@gatorsmile we fixed this in Spark 1.6.2 and have been using it in production for two months. For that reason, I opened this PR against the master branch. I will test it next week. |
|
We need to update the doc in
IMO we might need to add a new jdbc API in |
JDBCRelation(parts, jdbcOptions)(sqlContext.sparkSession)
}

def resolvePartitionColumnType(parameters: Map[String, String]): Int = {
If you want a column type, how about using JDBCRDD.resolveTable?
ans.toArray
}

def getCurrentValue(columnType: Int, value: Long): String = {
Probably, you can use DateTimeUtils to convert currentValue to timestamp/date.
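The conversion suggested here can be sketched outside Spark as follows. This is an illustrative Python sketch, not Spark's actual `DateTimeUtils`; it assumes Spark's internal representation of DATE as days since the epoch and TIMESTAMP as microseconds since the epoch (UTC):

```python
from datetime import date, datetime, timedelta, timezone

# java.sql.Types constants for DATE and TIMESTAMP
DATE_TYPE, TIMESTAMP_TYPE = 91, 93

def get_current_value(column_type, value):
    """Convert an internal Long boundary value into a SQL literal string.

    Illustrative only: assumes DATE is stored as days since 1970-01-01 and
    TIMESTAMP as microseconds since the epoch (UTC), mirroring Spark's
    internal format.
    """
    if column_type == DATE_TYPE:
        return (date(1970, 1, 1) + timedelta(days=value)).isoformat()
    if column_type == TIMESTAMP_TYPE:
        ts = datetime.fromtimestamp(value / 1_000_000, tz=timezone.utc)
        return ts.strftime("%Y-%m-%d %H:%M:%S")
    return str(value)  # numeric column: use the Long as-is
```

For example, `get_current_value(DATE_TYPE, 0)` yields `1970-01-01`, which could then be embedded in a partition's WHERE clause.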
|
I noticed that, in the current master, Spark throws an exception at runtime if the type of a given partition column does not match the actual column type; IMHO we'd better check for this type mismatch as early as possible, before execution? |
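The early check proposed here can be sketched like this. It is an illustrative Python sketch, not Spark code; the type names and the `check_bound` helper are hypothetical:

```python
from datetime import date, datetime

# Hypothetical early validation: fail fast if a user-supplied bound cannot
# be parsed as the resolved partition column type, instead of hitting a
# NumberFormatException (or a database error) at execution time.
PARSERS = {
    "integer": int,
    "date": date.fromisoformat,
    "timestamp": datetime.fromisoformat,
}

def check_bound(column_type, bound):
    try:
        return PARSERS[column_type](bound)
    except KeyError:
        raise ValueError(f"Unsupported partition column type: {column_type}")
    except ValueError:
        raise ValueError(f"Bound '{bound}' is not a valid {column_type} literal")
```

With this shape, `check_bound("integer", "aaa")` raises immediately at analysis time rather than deep inside query execution.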
|
ok to test |
|
Test build #93050 has finished for PR 19999 at commit
|
|
@maropu Can you take this over? |
|
ok, I will. |
## What changes were proposed in this pull request?
This PR adds support for Date/Timestamp types in a JDBC partition column (only numeric columns are supported in the current master). This PR also modifies the code to verify the partition column type;
```
val jdbcTable = spark.read
.option("partitionColumn", "text")
.option("lowerBound", "aaa")
.option("upperBound", "zzz")
.option("numPartitions", 2)
.jdbc("jdbc:postgresql:postgres", "t", options)
// with this pr
org.apache.spark.sql.AnalysisException: Partition column type should be numeric, date, or timestamp, but string found.;
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.verifyAndGetNormalizedPartitionColumn(JDBCRelation.scala:165)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:85)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:317)
// without this pr
java.lang.NumberFormatException: For input string: "aaa"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:277)
```
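For intuition, splitting a date range into per-partition predicates (the job of `JDBCRelation.columnPartition`) can be sketched as follows. This Python sketch is illustrative only and is not the actual Spark implementation, which works on an internal Long representation and handles strides and edge cases differently:

```python
from datetime import date

def date_partition_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper) into num_partitions WHERE-clause predicates.

    Illustrative sketch: evenly strides a date range and emits one
    predicate string per partition.
    """
    stride = (upper - lower) // num_partitions
    bounds = [lower + stride * i for i in range(1, num_partitions)]
    predicates, prev = [], None
    for b in bounds + [None]:
        clauses = []
        if prev is not None:
            clauses.append(f"{column} >= '{prev}'")
        if b is not None:
            clauses.append(f"{column} < '{b}'")
        predicates.append(" AND ".join(clauses) if clauses else "1=1")
        prev = b
    return predicates
```

For example, splitting `[2018-01-01, 2018-01-05)` into two partitions yields `d < '2018-01-03'` and `d >= '2018-01-03'`, each of which would back one JDBC query.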
Closes apache#19999
## How was this patch tested?
Added tests in `JDBCSuite`.
Author: Takeshi Yamamuro <yamamuro@apache.org>
Closes apache#21834 from maropu/SPARK-22814.
|
When I chose INSERTION_DATE as my partitionColumn with the dates below, I get the error: ORA-01861: literal does not match format string |
|
Please ask this on the mailing list. |
Jira: https://issues.apache.org/jira/browse/SPARK-22814
Currently, partitionColumn must be a numeric column in the table.
However, many tables have no primary key but do have date/timestamp indexes.
This patch solves that problem.