[SPARK-31005][SQL] Support time zone ids in casting strings to timestamps #27753
Test build #119143 has finished for PR 27753 at commit
}
} else if (i == 5 || i == 6) {
if (b == 'Z') {
if (b == '-' || b == '+') {
getZoneId() is able to handle zone offsets w/ prefix - and + but it doesn't support the format 7:3 like in
spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
Line 164 in ffc0935
checkCastStringToTimestamp("2015-03-18T12:03:17+7:3", new Timestamp(c.getTimeInMillis))
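The limitation mentioned in this comment can be reproduced with plain `java.time`: `ZoneId.of` rejects an offset such as `+7:3` whose hour and minute fields are not zero-padded. The sketch below is only an illustration, not the code from this PR; `OffsetPaddingSketch` and `padOffset` are hypothetical names, and padding the fields before the lookup is just one possible way to accept the legacy format.

```scala
import java.time.ZoneId
import scala.util.Try

object OffsetPaddingSketch {
  // Hypothetical pattern (not Spark code): sign, 1-2 digit hour, 1-2 digit minute.
  private val LooseOffset = """([+-])(\d{1,2}):(\d{1,2})""".r

  // Pads both fields to two digits, e.g. "+7:3" becomes "+07:03".
  // Any string that does not match the pattern is returned unchanged.
  def padOffset(s: String): String = s match {
    case LooseOffset(sign, h, m) => f"$sign${h.toInt}%02d:${m.toInt}%02d"
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // The unpadded form is rejected by java.time.
    println(Try(ZoneId.of("+7:3")).isFailure)
    // The padded form is a valid zone offset.
    println(ZoneId.of(padOffset("+7:3")))
  }
}
```

Whether Spark should normalize the string or parse the fields by hand is a design choice for the PR; the sketch only shows why the `+7:3` test case needs special handling.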
@cloud-fan @HyukjinKwon Please, review the PR. I haven't updated comments for
Test build #119228 has finished for PR 27753 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
checkStringToTimestamp("2015-03-18T12:03:17.123121+7:30", expected)

zoneId = getZoneId("GMT+07:30")
expected = Option(date(2015, 3, 18, 12, 3, 17, 123120, zid = zoneId))
why drop this test? 123120 is different from 123121
reverted it back, and added more tests
i += 1
tz = Some(43)
} else if (b == '-' || b == '+') {
tz = Some(new String(bytes, j, 1))
why not just b.toChar.toString?
Just for consistency with another change
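For a single ASCII sign byte, the two spellings discussed in this thread should yield the same one-character string. A minimal sketch, assuming UTF-8 input; `SignByteSketch` and `both` are hypothetical names, and the charset is passed explicitly here, whereas the snippet under review uses the three-argument `String` constructor.

```scala
import java.nio.charset.StandardCharsets

object SignByteSketch {
  // Builds the one-character string for the byte at index j in both ways.
  def both(bytes: Array[Byte], j: Int): (String, String) = {
    // Decode a one-byte slice of the array (explicit charset is an assumption).
    val viaNewString = new String(bytes, j, 1, StandardCharsets.UTF_8)
    // Widen the byte to a Char and convert it to a String.
    val viaToChar = bytes(j).toChar.toString
    (viaNewString, viaToChar)
  }

  def main(args: Array[String]): Unit = {
    val bytes = "12:03:17+07:30".getBytes(StandardCharsets.UTF_8)
    val j = bytes.indexOf('+'.toByte)
    val (a, b) = both(bytes, j)
    println(a == b)
  }
}
```

The equivalence holds only for single-byte (ASCII) characters such as `+` and `-`; a byte that is part of a multi-byte UTF-8 sequence would decode differently in the two forms.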
Test build #119309 has finished for PR 27753 at commit
thanks, merging to master/3.0!
…amps

### What changes were proposed in this pull request?
In the PR, I propose to change `DateTimeUtils.stringToTimestamp` to support any valid time zone id at the end of input string. After the changes, the function accepts zone ids in the formats:
- no zone id. In that case, the function uses the local session time zone from the SQL config `spark.sql.session.timeZone`
- -[h]h:[m]m
- +[h]h:[m]m
- Z
- Short zone id, see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
- Zone ID starts with 'UTC+', 'UTC-', 'GMT+', 'GMT-', 'UT+' or 'UT-'. The ID is split in two, with a two or three letter prefix and a suffix starting with the sign. The suffix must be in the formats:
  - +|-h[h]
  - +|-hh[:]mm
  - +|-hh:mm:ss
  - +|-hhmmss
- Region-based zone IDs in the form `{area}/{city}`, such as `Europe/Paris` or `America/New_York`. The default set of region ids is supplied by the IANA Time Zone Database (TZDB).

### Why are the changes needed?
- To use `stringToTimestamp` as a substitution of removed `stringToTime`, see #27710 (comment)
- Improve UX of Spark SQL by allowing flexible formats of zone ids. Currently, Spark accepts only `Z` and zone offsets that can be inconvenient when a time zone offset is shifted due to daylight saving rules. For instance:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
NULL
```

### Does this PR introduce any user-facing change?
Yes. After the changes, casting strings to timestamps allows time zone id at the end of the strings:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
2015-03-18 12:03:17.123456
```

### How was this patch tested?
- Added new test cases to the `string to timestamp` test in `DateTimeUtilsSuite`.
- Run `CastSuite` and `AnsiCastSuite`.

Closes #27753 from MaxGekk/stringToTimestamp-uni-zoneId.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1fd9a91)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
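The zone id formats listed in the description map closely onto what `java.time.ZoneId.of` accepts when it is given the `ZoneId.SHORT_IDS` alias map. The sketch below probes a few of them and also shows why a region id can be more convenient than a fixed offset when daylight saving time applies. It is an illustration under assumptions, not Spark's implementation: `ZoneIdFormatsSketch`, `resolve` and `offsetAt` are hypothetical names, and the sample ids were chosen for this example.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import scala.util.Try

object ZoneIdFormatsSketch {
  // Resolves an id with plain java.time plus the SHORT_IDS aliases.
  // Returns None when java.time rejects the id.
  def resolve(id: String): Option[ZoneId] =
    Try(ZoneId.of(id, ZoneId.SHORT_IDS)).toOption

  // UTC offset in effect at noon local time on the given date in the given zone.
  def offsetAt(zone: String, year: Int, month: Int, day: Int): ZoneOffset =
    LocalDateTime.of(year, month, day, 12, 0).atZone(ZoneId.of(zone)).getOffset

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "Z", "+07:30", "-01:00",               // zone offsets
      "PST",                                 // short id from SHORT_IDS
      "UTC+7", "GMT-0130", "UT+01:02:03",    // prefixed offsets
      "Europe/Paris", "America/New_York",    // region-based ids
      "+7:3"                                 // unpadded form, rejected by java.time
    )
    samples.foreach(id => println(s"$id -> ${resolve(id)}"))

    // A region id follows daylight saving rules; a fixed offset cannot.
    println(offsetAt("Europe/Paris", 2015, 1, 18))
    println(offsetAt("Europe/Paris", 2015, 7, 18))
  }
}
```

The unpadded `-[h]h:[m]m` and `+[h]h:[m]m` forms from the list are not valid `java.time` offsets on their own, so a parser that wants to keep accepting them has to handle them separately.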