
The records are not aligned between spark orc reader/writer and generic orc reader/writer. #1269


Description

@openinx

I tried to write a unit test: it first generates a few generic Records and writes them to an ORC file1. A Spark reader then opens this file and reads it, and finally writes the rows to another ORC file2.
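For reference, here is a rough sketch of that round trip. It is not the exact test code: the record count, seed, and file handling are placeholders, and the builder calls follow Iceberg's usual ORC test helpers, so treat the names as assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.Files;
import org.apache.iceberg.Schema;
import org.apache.iceberg.data.RandomGenericData;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.data.orc.GenericOrcWriter;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.orc.ORC;
import org.apache.iceberg.spark.data.SparkOrcReader;
import org.apache.spark.sql.catalyst.InternalRow;

public class OrcRoundTripSketch {

  // Writes generic Records to file1 with GenericOrcWriter, then reads the same
  // file back through SparkOrcReader so the two results can be compared.
  static void roundTrip(Schema schema, File file1) throws IOException {
    List<Record> expected = RandomGenericData.generate(schema, 100, 0L);

    // Step 1: write the generic records to the first ORC file.
    try (FileAppender<Record> writer = ORC.write(Files.localOutput(file1))
        .schema(schema)
        .createWriterFunc(GenericOrcWriter::buildWriter)
        .build()) {
      writer.addAll(expected);
    }

    // Step 2: open the same file with the Spark ORC reader.
    List<InternalRow> rows = new ArrayList<>();
    try (CloseableIterable<InternalRow> reader = ORC.read(Files.localInput(file1))
        .project(schema)
        .createReaderFunc(readSchema -> new SparkOrcReader(schema, readSchema))
        .build()) {
      reader.forEach(rows::add);
    }

    // Step 3: comparing the rows field by field (as TestHelpers.assertEquals does)
    // is where the decimal(11, 2) column comes back with the wrong scale. The
    // actual test then writes the rows to a second ORC file, omitted here.
  }
}
```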

There seem to be some bugs here, because the Spark reader fails to get the same result as the generic record reader. It throws an exception like this:

Value should match expected: schema.dec_11_2 expected:<623.9> but was:<62.39>

java.lang.AssertionError: Value should match expected: schema.dec_11_2 expected:<623.9> but was:<62.39>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:631)
	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:641)
	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:612)
	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:599)
	at org.apache.iceberg.spark.data.TestSparkRecordOrcReaderWriter.writeAndValidate(TestSparkRecordOrcReaderWriter.java:86)
	at org.apache.iceberg.spark.data.AvroDataTest.testSimpleStruct(AvroDataTest.java:67)
	at java.lang.Thread.run(Thread.java:748)

After checking the Iceberg code, I found that the Hive decimal decreases its scale by removing trailing zeros (see here), while our GenericOrcWriter and SparkOrcWriter do not account for this case, so we mess up the scale of the decimal.
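To make the scale problem concrete, here is a small self-contained sketch. It is not the committed fix; the decimal(11, 2) column, the printed values, and the setScale call at the end are illustrative assumptions based on the trailing-zero behavior described above.

```java
import java.math.BigDecimal;

import org.apache.hadoop.hive.common.type.HiveDecimal;

public class DecimalScaleSketch {
  public static void main(String[] args) {
    // The Iceberg column is decimal(11, 2), so the generic Record carries scale 2.
    BigDecimal original = new BigDecimal("623.90");

    // HiveDecimal trims the trailing zero, so the stored value has scale 1.
    HiveDecimal hive = HiveDecimal.create(original);
    System.out.println(hive.scale());          // 1
    System.out.println(hive.unscaledValue());  // 6239

    // A reader that re-applies the schema scale (2) to the unscaled value
    // reproduces the reported failure: 6239 at scale 2 is 62.39, not 623.90.
    System.out.println(new BigDecimal(hive.unscaledValue(), 2));  // 62.39

    // One possible fix: rescale the BigDecimal back to the schema scale
    // after converting from the Hive decimal.
    System.out.println(hive.bigDecimalValue().setScale(2));       // 623.90
  }
}
```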

The unit test is here.

FYI @rdsr @rdblue @shardulm94
