Hudi 1.1 MDT col-stats generation is failing for array and map types. #773

@vinishjail97

Description

Search before asking

  • I have searched the issues and found no similar issues.

Please describe the bug 🐞

In Hudi 1.1, generating column stats for the metadata table (MDT) fails when col-stats are enabled on a table whose schema contains array or map types.
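For reference, "MDT col-stats enabled" means writer properties along the lines of the sketch below. This assumes the standard Hudi keys `hoodie.metadata.enable` and `hoodie.metadata.index.column.stats.enable`. The class `ColStatsWriterProps` is hypothetical, and the XTable test in the trace may configure its writer through a config builder rather than raw properties.

```java
import java.util.Properties;

/** Hypothetical helper: writer properties that enable column stats in the metadata table. */
public class ColStatsWriterProps {

  static Properties colStatsEnabled() {
    Properties props = new Properties();
    // Turn on the metadata table (MDT).
    props.setProperty("hoodie.metadata.enable", "true");
    // Turn on the column-stats index inside the MDT; per this report, commits then fail
    // when the table schema contains array or map fields.
    props.setProperty("hoodie.metadata.index.column.stats.enable", "true");
    return props;
  }

  public static void main(String[] args) {
    colStatsEnabled().forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```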

org.apache.hudi.exception.HoodieException: Failed to generate column stats records for metadata table

	at org.apache.hudi.metadata.HoodieTableMetadataUtil.convertMetadataToColumnStatsRecords(HoodieTableMetadataUtil.java:1594)
	at org.apache.hudi.metadata.HoodieMetadataWriteUtils.convertMetadataToRecords(HoodieMetadataWriteUtils.java:387)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter$BatchMetadataConversionFunction.convertMetadata(HoodieBackedTableMetadataWriter.java:1460)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:1165)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1399)
	at org.apache.hudi.client.BaseHoodieClient.writeTableMetadata(BaseHoodieClient.java:285)
	at org.apache.hudi.client.BaseHoodieWriteClient.writeToMetadataTable(BaseHoodieWriteClient.java:339)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:320)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:276)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:244)
	at org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:97)
	at org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:52)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:226)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:221)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:211)
	at org.apache.xtable.TestJavaHudiTable.insertRecordsWithCommitAlreadyStarted(TestJavaHudiTable.java:198)
	at org.apache.xtable.TestAbstractHudiTable.insertRecords(TestAbstractHudiTable.java:272)
	at org.apache.xtable.hudi.TestHudiFileStatsExtractor.columnStatsWithMetadataTable(TestHudiFileStatsExtractor.java:145)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: org.apache.avro.AvroRuntimeException: Not a record: {"type":"map","values":{"type":"record","name":"Nested","namespace":"test.nested_record","fields":[{"name":"nested_int","type":"int","default":0}]}}
	at org.apache.avro.Schema.getField(Schema.java:275)
	at org.apache.hudi.avro.HoodieAvroUtils.getSchemaForField(HoodieAvroUtils.java:1652)
	at org.apache.hudi.avro.HoodieAvroUtils.getSchemaForField(HoodieAvroUtils.java:1656)
	at org.apache.hudi.avro.HoodieAvroUtils.getSchemaForField(HoodieAvroUtils.java:1642)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getColumnsToIndexWithoutRequiredMetaFields$48(HoodieTableMetadataUtil.java:1696)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnsToIndexWithoutRequiredMetaFields(HoodieTableMetadataUtil.java:1698)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnsToIndex(HoodieTableMetadataUtil.java:1655)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnsToIndex(HoodieTableMetadataUtil.java:1615)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.convertMetadataToColumnStatsRecords(HoodieTableMetadataUtil.java:1583)
	... 24 more
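The `Caused by` frames show `HoodieAvroUtils.getSchemaForField` walking a column path and calling Avro's `Schema.getField` on a map schema. Avro only allows that call on record schemas, so it throws `Not a record`. The sketch below is a dependency-free model of that mechanism, not Hudi or Avro code. `SchemaNode`, `resolveNaive`, `resolveOrSkip`, and the field names `nested_record`, `nested_map`, and `nested_array` are hypothetical, and the dotted path format for columns under a map or array is an assumption. `resolveOrSkip` shows one possible guard that excludes such columns from col-stats. The actual fix in Hudi may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, dependency-free model of the failure (NOT Hudi or Avro code). */
public class ColumnPathResolution {

  enum Kind { RECORD, MAP, ARRAY, PRIMITIVE }

  static final class SchemaNode {
    final Kind kind;
    final String typeName;
    final SchemaNode child; // map value type or array element type; null otherwise
    final Map<String, SchemaNode> fields = new LinkedHashMap<>(); // RECORD only

    SchemaNode(Kind kind, String typeName, SchemaNode child) {
      this.kind = kind;
      this.typeName = typeName;
      this.child = child;
    }

    SchemaNode with(String fieldName, SchemaNode type) {
      fields.put(fieldName, type);
      return this;
    }

    // Same contract as Avro's Schema.getField: only record schemas have fields.
    SchemaNode getField(String fieldName) {
      if (kind != Kind.RECORD) {
        throw new IllegalStateException("Not a record: " + typeName);
      }
      return fields.get(fieldName);
    }
  }

  /** Root record with a nested record, a map of records and an array of records. */
  static SchemaNode exampleSchema() {
    SchemaNode nested = new SchemaNode(Kind.RECORD, "Nested", null)
        .with("nested_int", new SchemaNode(Kind.PRIMITIVE, "int", null));
    return new SchemaNode(Kind.RECORD, "Root", null)
        .with("nested_record", nested)
        .with("nested_map", new SchemaNode(Kind.MAP, "map", nested))
        .with("nested_array", new SchemaNode(Kind.ARRAY, "array", nested));
  }

  /** Calls getField for every dotted segment, so it throws once the path enters a map or array. */
  static SchemaNode resolveNaive(SchemaNode schema, String path) {
    SchemaNode current = schema;
    for (String segment : path.split("\\.")) {
      current = current.getField(segment);
      if (current == null) {
        return null; // unknown field
      }
    }
    return current;
  }

  /** Returns null (column is not indexable) instead of throwing when the path crosses a map or array. */
  static SchemaNode resolveOrSkip(SchemaNode schema, String path) {
    SchemaNode current = schema;
    for (String segment : path.split("\\.")) {
      if (current.kind != Kind.RECORD) {
        return null;
      }
      current = current.getField(segment);
      if (current == null) {
        return null;
      }
    }
    return current;
  }

  public static void main(String[] args) {
    SchemaNode schema = exampleSchema();
    String[] paths = {"nested_record.nested_int", "nested_map.nested_int", "nested_array.nested_int"};
    for (String path : paths) {
      SchemaNode resolved = resolveOrSkip(schema, path);
      System.out.println(path + " -> " + (resolved == null ? "skipped" : resolved.typeName));
      try {
        resolveNaive(schema, path);
      } catch (IllegalStateException e) {
        System.out.println("  naive resolver: " + e.getMessage());
      }
    }
  }
}
```

Skipping is the conservative choice in this sketch. Descending into the map value or array element type is another option, but a single per-file min/max over values spread across map entries or array elements is not obviously meaningful.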

Are you willing to submit a PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

Metadata

Labels

bug (Something isn't working)
