
bug: CAST timestamp to string ignores timezone prior to Spark 3.4 #468

Closed
andygrove opened this issue May 24, 2024 · 2 comments · Fixed by #923
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

@andygrove
Member

Describe the bug

In CometExpressionSuite we have two tests that are ignored for Spark 3.2 and 3.3.

  test("cast timestamp and timestamp_ntz to string") {
    // TODO: make the test pass for Spark 3.2 & 3.3
    assume(isSpark34Plus)
  test("cast timestamp and timestamp_ntz to long, date") {
    // TODO: make the test pass for Spark 3.2 & 3.3
    assume(isSpark34Plus)

Enabling these tests for Spark 3.2 shows incorrect output:

== Results ==
  !== Correct Answer - 2001 ==                                                                         == Spark Answer - 2001 ==
   struct<tz_millis:string,ntz_millis:string,tz_micros:string,ntz_micros:string>                       struct<tz_millis:string,ntz_millis:string,tz_micros:string,ntz_micros:string>
  ![1970-01-01 05:29:59.991,1970-01-01 05:29:59.991,1970-01-01 05:29:59.991,1970-01-01 05:29:59.991]   [1970-01-01 05:29:59.991,1969-12-31 23:59:59.991,1970-01-01 05:29:59.991,1969-12-31 23:59:59.991]
  == Results ==
  !== Correct Answer - 10000 ==                                                                                                              == Spark Answer - 10000 ==
   struct<tz_millis:bigint,tz_micros:bigint,tz_millis_to_date:date,ntz_millis_to_date:date,tz_micros_to_date:date,ntz_micros_to_date:date>   struct<tz_millis:bigint,tz_micros:bigint,tz_millis_to_date:date,ntz_millis_to_date:date,tz_micros_to_date:date,ntz_micros_to_date:date>
  ![-1,-1,1970-01-01,1970-01-01,1970-01-01,1970-01-01]                                                                                       [-1,-1,1970-01-01,1969-12-31,1970-01-01,1969-12-31]

We should fall back to Spark rather than produce the wrong results.
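The mismatch above can be reproduced outside Spark. A sketch in Python, assuming the session timezone in the test run was UTC+05:30 (inferred from the `05:29:59` offset in the expected answer) and an epoch value of -9 milliseconds (inferred from the `.991` fraction): a `TIMESTAMP` is rendered in the session timezone, while a `TIMESTAMP_NTZ` renders its raw value unshifted, which is what the Spark 3.2 answer shows for the `ntz` columns.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reconstruction of the failing row: epoch value of -9 ms.
epoch_ms = -9
ts_utc = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=epoch_ms)

# TIMESTAMP (with timezone): rendered in the session timezone,
# assumed here to be UTC+05:30 based on the expected output above.
session_tz = timezone(timedelta(hours=5, minutes=30))
print(ts_utc.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])
# 1970-01-01 05:29:59.991

# TIMESTAMP_NTZ: the stored value is rendered as-is, with no timezone shift.
print(ts_utc.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])
# 1969-12-31 23:59:59.991
```

Under that reading, the disagreement is only about whether the `ntz` columns should also be shifted into the session timezone, which is the semantic question discussed in the comments below.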

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

@andygrove andygrove added bug Something isn't working good first issue Good for newcomers labels May 24, 2024
@parthchandra
Contributor

IIRC there were differences in output between Spark 3.2 and Spark 3.4 for the timestamp_ntz type.
Taking a closer look, the definition of timestamp_ntz (in Spark) essentially means that the value should be left untouched.
So a value of 0 means 1970-01-01 00:00:00 in the session timezone. In the example above, the value is -1, so the correct output for timestamp_ntz (millis) should be 1969-12-31 23:59:59 (ignoring the millis). Spark 3.2's answer of 1970-01-01 05:29:59 seems incorrect to me.

@andygrove andygrove added the help wanted Extra attention is needed label Jun 6, 2024
@suibianwanwank

suibianwanwank commented Jun 27, 2024

I've recently been learning about this project. Could this issue be assigned to me if it hasn't already been resolved? Thanks.
