[Bug report] trino couldn't read Iceberg table with timestamp column created by spark #4743

Closed
FANNG1 opened this issue Aug 28, 2024 · 7 comments · Fixed by #4893
Labels
bug Something isn't working

Comments

@FANNG1
Contributor

FANNG1 commented Aug 28, 2024

Version

main branch

Describe what's wrong

Trino can't read an Iceberg partitioned table created by Spark.

Error message and/or stacktrace

Query 20240828_134434_01832_9eicb failed: Could not serialize column 'hire_date' of type 'timestamp(3)' at position 1:4

How to reproduce

Spark SQL:

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

CREATE TABLE IF NOT EXISTS employee (
  id bigint,
  name string,
  department string,
  hire_date timestamp
) USING iceberg
PARTITIONED BY (days(hire_date));
DESC TABLE EXTENDED employee;

INSERT INTO employee
VALUES
(1, 'Alice', 'Engineering', TIMESTAMP '2021-01-01 09:00:00'),
(2, 'Bob', 'Marketing', TIMESTAMP '2021-02-01 10:30:00'),
(3, 'Charlie', 'Sales', TIMESTAMP '2021-03-01 08:45:00');


Trino:

select * from iceberg_hive.gt_db1.employee;

Additional context

No response

@FANNG1 FANNG1 added the bug Something isn't working label Aug 28, 2024
@jerryshao
Contributor

Shall we fix this in 0.6.0?

@FANNG1 FANNG1 changed the title [Bug report] trino couldn't read Iceberg partition table created by spark [Bug report] trino couldn't read Iceberg table with timestamp column created by spark Aug 29, 2024
@FANNG1
Contributor Author

FANNG1 commented Aug 29, 2024

The problem still occurs when using the original Spark Iceberg connector. cc @jerryshao @diqiu50

@jerryshao
Contributor

I see. We can defer this issue to the next release.

@diqiu50
Copy link
Contributor

diqiu50 commented Sep 2, 2024

Trino's default timestamp precision is milliseconds. The timestamp type in Graviton does not carry a precision, so when a column uses it, Trino cannot tell the column's actual precision and falls back to its default, which may cause read errors like the one above.

@mchades The timestamp and time types in Graviton need to support precision.
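To illustrate the failure mode described above, here is a minimal Java sketch, assuming a connector that maps a catalog timestamp type to a Trino type name. `TimestampType`, `withoutPrecision`, `withPrecision` and `toTrinoTypeName` are hypothetical names, not Gravitino's or Trino's actual API. When no precision is carried, the mapping falls back to Trino's millisecond default, which matches the `timestamp(3)` in the error message.

```java
import java.util.OptionalInt;

// Hypothetical sketch: a timestamp type whose precision may be absent, and a
// connector-side mapping that falls back to Trino's millisecond default.
public class TimestampPrecisionSketch {

    // Trino's default timestamp precision: 3 fractional digits (milliseconds).
    static final int TRINO_DEFAULT_PRECISION = 3;

    // Hypothetical type descriptor; an empty precision means "not specified".
    record TimestampType(OptionalInt precision) {
        static TimestampType withoutPrecision() {
            return new TimestampType(OptionalInt.empty());
        }

        static TimestampType withPrecision(int precision) {
            return new TimestampType(OptionalInt.of(precision));
        }
    }

    // Map to a Trino type name; an unknown precision silently becomes timestamp(3).
    static String toTrinoTypeName(TimestampType type) {
        int precision = type.precision().orElse(TRINO_DEFAULT_PRECISION);
        return "timestamp(" + precision + ")";
    }

    public static void main(String[] args) {
        // No precision: Trino assumes milliseconds, although Iceberg stores microseconds.
        System.out.println(toTrinoTypeName(TimestampType.withoutPrecision())); // timestamp(3)
        // Explicit microsecond precision, matching what Iceberg writes to Parquet.
        System.out.println(toTrinoTypeName(TimestampType.withPrecision(6)));   // timestamp(6)
    }
}
```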

@FANNG1
Contributor Author

FANNG1 commented Sep 3, 2024

Is there another way to resolve this? I'm not sure this is the right approach.

@diqiu50
Contributor

diqiu50 commented Sep 3, 2024

We need to determine the root cause first. Does Iceberg store timestamps with second, millisecond, or microsecond precision?

@FANNG1
Contributor Author

FANNG1 commented Sep 3, 2024

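Iceberg writes timestamps to Parquet as INT64 with microsecond precision (TIMESTAMPTZ_MICROS when the type is adjusted to UTC, TIMESTAMP_MICROS otherwise); see https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/TypeToMessageType.java#L138-L143: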

      case TIMESTAMP:
        if (((TimestampType) primitive).shouldAdjustToUTC()) {
          return Types.primitive(INT64, repetition).as(TIMESTAMPTZ_MICROS).id(id).named(name);
        } else {
          return Types.primitive(INT64, repetition).as(TIMESTAMP_MICROS).id(id).named(name);
        }
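To see why the precision matters, the Java sketch below decodes an INT64 `TIMESTAMP_MICROS` value (microseconds since the Unix epoch, interpreted here as UTC) with `java.time`, and compares it with the same value truncated to milliseconds, which is all a `timestamp(3)` column can hold. The sample value is a timestamp from the repro with 123456 microseconds added for illustration; `MicrosTimestampSketch` and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Sketch: interpret an INT64 TIMESTAMP_MICROS value (microseconds since the
// Unix epoch) and show what is lost if it is read with millisecond precision.
public class MicrosTimestampSketch {

    // Convert microseconds since epoch to a UTC LocalDateTime at full precision.
    static LocalDateTime fromMicros(long micros) {
        Instant instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
        return LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
    }

    // The same value truncated to milliseconds, i.e. what timestamp(3) can represent.
    static LocalDateTime fromMicrosAsMillis(long micros) {
        return fromMicros(micros).truncatedTo(ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        // 2021-01-01 09:00:00 UTC (from the repro) plus 123456 microseconds.
        long micros = LocalDateTime.of(2021, 1, 1, 9, 0, 0)
                .toEpochSecond(ZoneOffset.UTC) * 1_000_000L + 123_456L;
        System.out.println(fromMicros(micros));         // 2021-01-01T09:00:00.123456
        System.out.println(fromMicrosAsMillis(micros)); // 2021-01-01T09:00:00.123
    }
}
```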

@FANNG1 FANNG1 closed this as completed in b0c4b11 Sep 13, 2024
github-actions bot pushed a commit that referenced this issue Sep 13, 2024
…mestamp in the Iceberg catalog. (#4893)

### What changes were proposed in this pull request?

Fix the default precision of the time and timestamp types in the Iceberg catalog.
The wrong default prevented Trino from reading Iceberg tables containing time
and timestamp columns.

### Why are the changes needed?

Fix: #4743

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

New UT, IT
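The commit message describes the fix only at a high level ("fix the default precision of time and timestamp"). A minimal Java sketch of the idea, assuming the catalog now assigns microsecond precision (6), which is the precision the Iceberg spec defines for these types; `IcebergDefaultPrecisionSketch`, `toTrinoType` and the string-based type names are hypothetical and do not reflect Gravitino's actual type-conversion code.

```java
// Hypothetical sketch of the fix: when converting Iceberg types, give time and
// timestamp types an explicit microsecond precision instead of none.
public class IcebergDefaultPrecisionSketch {

    // The Iceberg spec defines time, timestamp and timestamptz with microsecond precision.
    static final int MICROSECOND_PRECISION = 6;

    // Map an Iceberg primitive type name to a Trino-style type string (illustrative only).
    static String toTrinoType(String icebergType) {
        return switch (icebergType) {
            case "time" -> "time(" + MICROSECOND_PRECISION + ")";
            case "timestamp" -> "timestamp(" + MICROSECOND_PRECISION + ")";
            case "timestamptz" -> "timestamp(" + MICROSECOND_PRECISION + ") with time zone";
            default -> icebergType; // other types pass through unchanged
        };
    }

    public static void main(String[] args) {
        for (String t : new String[] {"time", "timestamp", "timestamptz", "long"}) {
            System.out.println(t + " -> " + toTrinoType(t));
        }
    }
}
```

Making the precision explicit on the catalog side means the reader no longer has to guess, so it cannot silently fall back to a millisecond default.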
@FANNG1 FANNG1 added the 0.6.1 label Sep 13, 2024
FANNG1 pushed a commit that referenced this issue Sep 14, 2024
…mestamp in the Iceberg catalog. (#4936)

### What changes were proposed in this pull request?

Fix the default precision of the time and timestamp types in the Iceberg catalog.
The wrong default prevented Trino from reading Iceberg tables containing time
and timestamp columns.

### Why are the changes needed?

Fix: #4743

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

New UT, IT

Co-authored-by: Yuhui <hui@datastrato.com>