Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parquet: Support non-UTC timestamps #976

Closed
rcaudy opened this issue Aug 5, 2021 · 4 comments
Closed

Parquet: Support non-UTC timestamps #976

rcaudy opened this issue Aug 5, 2021 · 4 comments
Assignees
Labels
feature request New feature or request parquet Related to the Parquet integration
Milestone

Comments

@rcaudy
Copy link
Member

rcaudy commented Aug 5, 2021

Running (in groovy):

t = io.deephaven.db.tables.utils.ParquetTools.readTable("/data/parquetFiles/eth_v2_p2_cBROTLI.parquet")

Would result in:

io.deephaven.UncheckedDeephavenException: Unable to read column [tpep_pickup_datetime]: TimestampLogicalType, isAdjustedToUTC=false, unit=MICROS not supported

If we didn't have a fortunate schema conversion error that effectively treats the timestamps as UTC. We would like to reverse the error and properly support zoned timestamps. Presumably we could specify a timezone in read instructions.

We do have limited support in: io.deephaven.db.v2.locations.parquet.topage.ToDBDateTimePageFromInt96
Likely the reference time zone used there should be moved out to ParquetInstructions. Then we just need to integrate those instructions into the Int96 support and into the more standard datetime fields.

@rcaudy rcaudy added feature request New feature or request triage labels Aug 5, 2021
@rcaudy rcaudy self-assigned this Aug 5, 2021
@rcaudy rcaudy added parquet Related to the Parquet integration and removed triage labels Aug 5, 2021
@rcaudy rcaudy added this to the Backlog milestone Aug 5, 2021
@rcaudy
Copy link
Member Author

rcaudy commented Aug 9, 2021

Elaborating on the description, the code in io.deephaven.parquet.ParquetFileReader#buildChildren has the fortunate bug, vs. code in io.deephaven.parquet.tempfix.ParquetMetadataConverter#buildChildren which handles converted types correctly as far as I can tell.

@malhotrashivam
Copy link
Contributor

malhotrashivam commented Sep 19, 2023

Parquet handling of isAdjustedToUTC:

TIMESTAMP type has two extra fields: isAdjustedToUTC (= true/false), unit (= MILLIS/MICROS/NANOS).

  • TIMESTAMP with isAdjustedToUTC=true is defined as the time elapsed since the Unix epoch. So we can translate such a instant to UTC time and then adjust it to local timezone. For example, TIMESTAMP(isAdjustedToUTC=true, unit=MILLIS) value of 172800000 corresponds to 1970-01-03 00:00:00 UTC and the corresponding local time will change according to timezone.
  • TIMESTAMP with isAdjustedToUTC=false represents time fields in a local timezone, regardless of what local timezone is. For example, TIMESTAMP(isAdjustedToUTC=false, unit=MILLIS) value of 172800000 corresponds to 1970-01-03 00:00:00 in any timezone.

More details in official docs here.

@devinrsmith
Copy link
Member

I think on a column-by-column basis we should be able to add in an assumed timezone; for example, the NYC taxi dataset e, we can assume that the timezone is NYC timezone. As it stands today, we read this in as a LocalDateTime.

@malhotrashivam
Copy link
Contributor

Original issue completed as part of #4801.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request parquet Related to the Parquet integration
Projects
None yet
Development

No branches or pull requests

3 participants