Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datetime and decimal should be logical types #19817

Open
damccorm opened this issue Jun 4, 2022 · 5 comments
Open

datetime and decimal should be logical types #19817

damccorm opened this issue Jun 4, 2022 · 5 comments

Comments

@damccorm
Copy link
Contributor

damccorm commented Jun 4, 2022

Currently datetime and decimal are implemented as primitive types in the Java SDK. They should be logical types as documented in https://s.apache.org/beam-schemas

Imported from Jira BEAM-7554. Original Jira may contain additional context.
Reported by: bhulette.

@Abacn
Copy link
Contributor

Abacn commented Jul 26, 2022

.take-issue

@Abacn
Copy link
Contributor

Abacn commented Jul 27, 2022

This issue was raised again in May 22, per this comment in Jira

ilango-mediaagility:
Due to this issue, cross language transforms are affected. For example, when using ReadFromJdbc transform in python, we get the decimal and datetime as logical types, but they are encoded differently than the underlying representation types, resulting in incorrect values.

This is blocked by #21433

@Abacn
Copy link
Contributor

Abacn commented Aug 3, 2022

There is a problem of replace all DATETIME fieldtype to logical type, per @TheNeuralBit:

One thing that I realized as I wrote up my PR: If we make DATETIME a logical type backed by INT64, then it will have to use VarLongCoder, rather than InstantCoder as it currently does.

It seems like that would be a problem since InstantCoder is designed to make lexicographic order correspond to chronological order.
Note that InstantCoder is not only used in processing schemas but also in coding windows:

Instant timestamp = InstantCoder.of().decode(inStream);

Tried to replace the InstantCoder implementation using either a VarInt or a ByteArray it breaks window decoding here. Seems like there is some predefined stream not using InstantCoder.encode to pack a timestamp.

Thus I decided not to change InstantCoder at this moment, instead extend the support MicrosInstant logical type in Java sdk and DateTime logical type in python sdk, seems to be the simplest way to make xlang read/write records containing timestamp type work.

@TheNeuralBit
Copy link
Member

Yeah I agree that it's better to just prefer logical types for multi-language transforms, rather than trying to shunt the DATETIME primitive type this way. We may want to add types for additional precisions as needed (MillInstant, NanosInstant, etc)

@Abacn
Copy link
Contributor

Abacn commented Feb 23, 2023

update: Python support is complete. Leave this open for Go support

@Abacn Abacn removed their assignment Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants