Skip to content

Conversation

@shardulm94
Copy link
Contributor

@shardulm94 shardulm94 commented Nov 20, 2020

Spark only supports reading timestamp with time zone. However we have a lot of Hive tables which store timestamp without time zone.

In this PR, we modify the Spark code to allow reading timestamp without time zone as timestamp with time zone. Generally, this is not safe as timestamp without time zone is supposed to represent wall clock time semantics, i.e. no matter the reader/writer timezone 3PM should always be read as 3PM, but timestamp with time zone represents instant semantics, i.e the timestamp is adjusted so that the corresponding time in the reader timezone is displayed. However, at LinkedIn, all readers and writers are in the UTC timezone as our production machines are set to UTC. So, timestamp with/without time zone is the same.

We put this feature behind a flag to not do this conversion by default and we will enable this flag at LinkedIn

cc: @HotSushi @wmoustafa

@github-actions github-actions bot added the SPARK label Nov 20, 2020
@wmoustafa
Copy link

I think what we should do is:
1- Check the list of functions in Hive that work with timestamps (e.g., second, hour, datediff, unix_timestamp), and verify if they take "Timestamp" or "String" as input. Based on the Hive manual, they take strings. This exercise should consider Hive 1.1 preferably.
2- Check the same list of functions in Spark, and see if Spark would deviate out of the box (due to Spark in some cases dealing with "Timestamp with Timezone). If Hive and Spark both expect Strings, is the string format a factor (if it contains timezone or if it does not)? Does it mean the query is not portable between Hive and Spark? (all this is research for Hive/Spark compatibility).
3- See if Dali introduces UTC, any of the Hive behavior in (1) or Spark behavior in (2) would break.

Copy link

@wmoustafa wmoustafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants