-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13667][SQL] Support for specifying custom date format for date and timestamp types at CSV datasource. #11550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@falaki Would you maybe review this please..? |
|
Test build #52535 has finished for PR 11550 at commit
|
|
@falaki Just let to know, I changed the name |
|
Test build #52621 has finished for PR 11550 at commit
|
|
Test build #52619 has finished for PR 11550 at commit
|
|
@HyukjinKwon the changes look good to me. |
|
cc @rxin |
|
Test build #53049 has finished for PR 11550 at commit
|
|
Test build #53054 has finished for PR 11550 at commit
|
|
Test build #53055 has finished for PR 11550 at commit
|
|
retest this please |
|
cc @rxin |
|
Test build #53065 has finished for PR 11550 at commit
|
|
Test build #53068 has finished for PR 11550 at commit
|
|
Test build #53063 has finished for PR 11550 at commit
|
|
Test build #53298 has finished for PR 11550 at commit
|
|
retest this please |
|
Test build #53301 has finished for PR 11550 at commit
|
|
Test build #53296 has finished for PR 11550 at commit
|
|
Test build #55174 has finished for PR 11550 at commit
|
|
Test build #55496 has finished for PR 11550 at commit
|
|
@HyukjinKwon I want to get this in for 2.0. Can you not do the renaming here? It's always super confusing when classes have the same name, even when they are in different packages. We are trying to move away from that. It will also make the patch much easier to review when the patch is more "precise". |
|
@rxin Thank you. I will change the name to the original. |
| nullable: Boolean = true, | ||
| nullValue: String = ""): Any = { | ||
| nullValue: String = "", | ||
| dateFormat: SimpleDateFormat = null): Any = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will conflict with #11947 (this is making the last argument as a single option not for the multiple individual parameters like I did for infer() above).
|
Left some minor comments. LGTM otherwise. |
|
Test build #57397 has finished for PR 11550 at commit
|
|
Test build #57398 has finished for PR 11550 at commit
|
|
Test build #57401 has finished for PR 11550 at commit
|
|
Test build #57402 has finished for PR 11550 at commit
|
|
Test build #57404 has finished for PR 11550 at commit
|
|
Thanks - merging in master! |
What changes were proposed in this pull request?
This PR adds the support to specify custom date format for
DateTypeandTimestampType.For
TimestampType, this uses the given format to infer schema and also to convert the valuesFor
DateType, this uses the given format to convert the values.If the
dateFormatis not given, then it works withDateTimeUtils.stringToTime()for backwords compatibility.When it's given, then it uses
SimpleDateFormatfor parsing data.In addition,
IntegerType,DoubleTypeandLongTypehave a higher priority thanTimestampTypein type inference. This means even if the given format isyyyyoryyyy.MM, it will be inferred asIntegerTypeorDoubleType. Since it is type inference, I think it is okay to give such precedences.In addition, I renamed
csv.CSVInferSchematocsv.InferSchemaas JSON datasource hasjson.InferSchema. Although they have the same names, I did this because I thought the parent package name can still differentiate each. Accordingly, the suite name was also changed fromCSVInferSchemaSuitetoInferSchemaSuite.How was this patch tested?
unit tests are used and
./dev/run_testsfor coding style tests.