Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 1, 2020

What changes were proposed in this pull request?

In the PR, I propose to change DateTimeUtils.stringToTimestamp to support any valid time zone id at the end of input string. After the changes, the function accepts zone ids in the formats:

  • no zone id. In that case, the function uses the local session time zone from the SQL config spark.sql.session.timeZone
  • -[h]h:[m]m
  • +[h]h:[m]m
  • Z
  • Short zone id, see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
  • Zone ID starts with 'UTC+', 'UTC-', 'GMT+', 'GMT-', 'UT+' or 'UT-'. The ID is split in two, with a two or three letter prefix and a suffix starting with the sign. The suffix must be in the formats:
    • +|-h[h]
    • +|-hh[:]mm
    • +|-hh:mm:ss
    • +|-hhmmss
  • Region-based zone IDs in the form {area}/{city}, such as Europe/Paris or America/New_York. The default set of region ids is supplied by the IANA Time Zone Database (TZDB).

Why are the changes needed?

spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
NULL

Does this PR introduce any user-facing change?

Yes. After the changes, casting strings to timestamps allows time zone id at the end of the strings:

spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
2015-03-18 12:03:17.123456

How was this patch tested?

  • Added new test cases to the string to timestamp test in DateTimeUtilsSuite.
  • Run CastSuite and AnsiCastSuite.

@SparkQA
Copy link

SparkQA commented Mar 1, 2020

Test build #119143 has finished for PR 27753 at commit f107331.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [WIP][SQL] Support time zone ids in casting strings to timestamps [SPARK-31005][SQL] Support time zone ids in casting strings to timestamps Mar 2, 2020
}
} else if (i == 5 || i == 6) {
if (b == 'Z') {
if (b == '-' || b == '+') {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getZoneId() is able to handle zone offsets w/ prefix - and + but it doesn't support the format 7:3 like in

checkCastStringToTimestamp("2015-03-18T12:03:17+7:3", new Timestamp(c.getTimeInMillis))
. So, I have to keep the code for backward compatibility.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 2, 2020

@cloud-fan @HyukjinKwon Please, review the PR. I haven't updated comments for stringToTimestamp yet. I will do that when I will address your comment later.

@SparkQA
Copy link

SparkQA commented Mar 3, 2020

Test build #119228 has finished for PR 27753 at commit bdfe5f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

checkStringToTimestamp("2015-03-18T12:03:17.123121+7:30", expected)

zoneId = getZoneId("GMT+07:30")
expected = Option(date(2015, 3, 18, 12, 3, 17, 123120, zid = zoneId))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why drop this test? 123120 is different from 123121

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted it back, and added more tests

i += 1
tz = Some(43)
} else if (b == '-' || b == '+') {
tz = Some(new String(bytes, j, 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just b.toChar.toString?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for consistency with another change

@SparkQA
Copy link

SparkQA commented Mar 4, 2020

Test build #119309 has finished for PR 27753 at commit 00a07ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 1fd9a91 Mar 5, 2020
cloud-fan pushed a commit that referenced this pull request Mar 5, 2020
…amps

### What changes were proposed in this pull request?
In the PR, I propose to change `DateTimeUtils.stringToTimestamp` to support any valid time zone id at the end of input string. After the changes, the function accepts zone ids in the formats:
- no zone id. In that case, the function uses the local session time zone from the SQL config `spark.sql.session.timeZone`
- -[h]h:[m]m
- +[h]h:[m]m
- Z
- Short zone id, see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
- Zone ID starts with 'UTC+', 'UTC-', 'GMT+', 'GMT-', 'UT+' or 'UT-'. The ID is split in two, with a two or three letter prefix and a suffix starting with the sign. The suffix must be in the formats:
  - +|-h[h]
  - +|-hh[:]mm
  - +|-hh:mm:ss
  - +|-hhmmss
- Region-based zone IDs in the form `{area}/{city}`, such as `Europe/Paris` or `America/New_York`. The default set of region ids is supplied by the IANA Time Zone Database (TZDB).

### Why are the changes needed?
- To use `stringToTimestamp` as a substitution of removed `stringToTime`, see #27710 (comment)
- Improve UX of Spark SQL by allowing flexible formats of zone ids. Currently, Spark accepts only `Z` and zone offsets that can be inconvenient when a time zone offset is shifted due to daylight saving rules. For instance:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
NULL
```

### Does this PR introduce any user-facing change?
Yes. After the changes, casting strings to timestamps allows time zone id at the end of the strings:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
2015-03-18 12:03:17.123456
```

### How was this patch tested?
- Added new test cases to the `string to timestamp` test in `DateTimeUtilsSuite`.
- Run `CastSuite` and `AnsiCastSuite`.

Closes #27753 from MaxGekk/stringToTimestamp-uni-zoneId.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1fd9a91)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…amps

### What changes were proposed in this pull request?
In the PR, I propose to change `DateTimeUtils.stringToTimestamp` to support any valid time zone id at the end of input string. After the changes, the function accepts zone ids in the formats:
- no zone id. In that case, the function uses the local session time zone from the SQL config `spark.sql.session.timeZone`
- -[h]h:[m]m
- +[h]h:[m]m
- Z
- Short zone id, see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
- Zone ID starts with 'UTC+', 'UTC-', 'GMT+', 'GMT-', 'UT+' or 'UT-'. The ID is split in two, with a two or three letter prefix and a suffix starting with the sign. The suffix must be in the formats:
  - +|-h[h]
  - +|-hh[:]mm
  - +|-hh:mm:ss
  - +|-hhmmss
- Region-based zone IDs in the form `{area}/{city}`, such as `Europe/Paris` or `America/New_York`. The default set of region ids is supplied by the IANA Time Zone Database (TZDB).

### Why are the changes needed?
- To use `stringToTimestamp` as a substitution of removed `stringToTime`, see apache#27710 (comment)
- Improve UX of Spark SQL by allowing flexible formats of zone ids. Currently, Spark accepts only `Z` and zone offsets that can be inconvenient when a time zone offset is shifted due to daylight saving rules. For instance:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
NULL
```

### Does this PR introduce any user-facing change?
Yes. After the changes, casting strings to timestamps allows time zone id at the end of the strings:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
2015-03-18 12:03:17.123456
```

### How was this patch tested?
- Added new test cases to the `string to timestamp` test in `DateTimeUtilsSuite`.
- Run `CastSuite` and `AnsiCastSuite`.

Closes apache#27753 from MaxGekk/stringToTimestamp-uni-zoneId.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the stringToTimestamp-uni-zoneId branch June 5, 2020 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants