Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature][core] Implement more type conversions for IcebergConversions #3788

Open
1 of 2 tasks
tsreaper opened this issue Jul 20, 2024 · 14 comments · May be fixed by #3866
Open
1 of 2 tasks

[Feature][core] Implement more type conversions for IcebergConversions #3788

tsreaper opened this issue Jul 20, 2024 · 14 comments · May be fixed by #3866
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@tsreaper
Copy link
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

In #3731 we've introduced IcebergCommitCallback to create Iceberg metadata after commit. We convert Paimon objects to Iceberg byte buffers in IcebergConversions, however it currently only supports a few types and does not support types like timestamp and time.

Solution

Support timestamp and time conversions in IcebergConversions.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@tsreaper tsreaper added enhancement New feature or request good first issue Good for newcomers labels Jul 20, 2024
@dbac
Copy link

dbac commented Jul 28, 2024

I'm willing to submit a PR!

@dbac
Copy link

dbac commented Jul 28, 2024

Do this according to the following specifications?
time | Stores microseconds from midnight in an 8-byte little-endian long
timestamp | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long

then,like this:

case TIME:
TIMESTAMP:
return ByteBuffer.allocate(8)
.order(ByteOrder.LITTLE_ENDIAN)
.putLong(0, (long) value);

@tsreaper
Copy link
Contributor Author

I'm willing to submit a PR!

Thanks for your interest. Please note that there are two types of timestamp in Paimon: TimestampType and LocalZonedTimestampType. TimestampType is timestamptz in Iceberg, while LocalZonedTimestampType is timestamp in Iceberg. You also need to be careful with the precesion of timestamp, because Iceberg has timestamp and timestamp_ns.

All in all you need to write tests carefully, especially for timestamp.

@dbac
Copy link

dbac commented Jul 29, 2024

I want to be simple
Whether there are similar code references?
@tsreaper

@dbac
Copy link

dbac commented Jul 29, 2024

image
I don't know that Object value is string ? or Timestamp ?
mabye Timestamp ? then change to buffer?

@tsreaper
Copy link
Contributor Author

The Paimon object for TimestampType and LocalZonedTimestampType is org.apache.paimon.data.Timestamp. It stores millisecond and nanoOfMillisecond in the object.

For TimestampType, this millisecond is the time starting from 1970-01-01 00:00:00 UTC, so it is timestamptz in Iceberg. For LocalZonedTimestampType, this millisecond is the time starting from 1970-01-01 00:00:00 local timezone, so it is timestamp in Iceberg.

Also you need to check if precision <= 6, if yes it is timestamp or timestamptz; Otherwise if precision > 6 its timestamp_ns or timestamptz_ns.

Don't forget to add complete tests for each timestamp type and different precisions.

@dbac
Copy link

dbac commented Jul 30, 2024

image
It seems a bit complicated
icerberg does not have local-timestamp-micros ,but paimon has local-timestamp-micros,
Maybe I need to learn more about paimon
image

@dbac
Copy link

dbac commented Jul 30, 2024

Just give me a little time. I have to fix it

@tsreaper
Copy link
Contributor Author

"local-timestamp-micros" is not a data type in Paimon. It is a data type in avro. Iceberg's avro reader has to deal with it. However for this issue, you only need to convert between Paimon types and Iceberg types, so you don't need to consider about avro types.

@dbac
Copy link

dbac commented Jul 31, 2024

image
I have completed it, but nanoseconds cannot be tested, because the avro format in the test case cannot support precisions greater than 6
I don't know if my code idea is correct. Please review the code

@dbac
Copy link

dbac commented Aug 1, 2024

@tsreaper Can I submit PR?

@tsreaper
Copy link
Contributor Author

tsreaper commented Aug 1, 2024

@tsreaper Can I submit PR?

Sure, feel free to submit.

@dbac dbac linked a pull request Aug 1, 2024 that will close this issue
@dbac
Copy link

dbac commented Aug 2, 2024

@tsreaper Please help to review my pr, thank you very much

@dbac
Copy link

dbac commented Aug 5, 2024

@tsreaper
next step?
With that done, I'm ready to continue contributing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants