Serialize schema in InputSplit and pull upstream changes in #1557, #1564 #40

HotSushi · 2020-10-12T09:52:47Z

Hive queries that spawn MR jobs will work after this change because:
1) The table schema in the RecordReader is now read from the InputSplit
2) HiveSerde on the mappers uses the input config
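
To illustrate point 1, here is a minimal sketch of a split that carries the table schema with it, so a reader on the mapper side can recover the schema without loading the catalog. This is not the code from this PR: `SchemaCarryingSplit`, its fields, and the `roundTrip` helper are hypothetical names, the schema is treated as an opaque JSON string, and the class only mirrors the `write`/`readFields` contract of Hadoop's `Writable` using `java.io` so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical split class (an assumption for illustration, not the PR's code).
// It follows the write/readFields shape of Hadoop's Writable interface.
final class SchemaCarryingSplit {
  private String path;        // file the split points at
  private String schemaJson;  // table schema, serialized as JSON text

  // No-arg constructor: the framework creates an empty split, then calls readFields.
  SchemaCarryingSplit() {}

  SchemaCarryingSplit(String path, String schemaJson) {
    this.path = path;
    this.schemaJson = schemaJson;
  }

  String path() { return path; }

  String schemaJson() { return schemaJson; }

  // Driver side: write the schema next to the split's own fields.
  void write(DataOutput out) throws IOException {
    out.writeUTF(path);
    // Length-prefixed bytes instead of writeUTF, which is limited to 65535 encoded bytes.
    byte[] schemaBytes = schemaJson.getBytes(StandardCharsets.UTF_8);
    out.writeInt(schemaBytes.length);
    out.write(schemaBytes);
  }

  // Mapper side: restore the schema from the split itself.
  void readFields(DataInput in) throws IOException {
    path = in.readUTF();
    byte[] schemaBytes = new byte[in.readInt()];
    in.readFully(schemaBytes);
    schemaJson = new String(schemaBytes, StandardCharsets.UTF_8);
  }

  // Simulates shipping a split from the driver to a mapper through a byte buffer.
  static SchemaCarryingSplit roundTrip(SchemaCarryingSplit split) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      split.write(new DataOutputStream(buffer));
      SchemaCarryingSplit copy = new SchemaCarryingSplit();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch a record reader would call `schemaJson()` on the deserialized split and parse it, rather than looking the table up again on the mapper.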

Also added a test class that simulates LinkedIn's way of storing metadata and runs the test cases written upstream, such as testScanEmptyTable(), testScanTable(), and testJoinTables().

cc: @shardulm94

HotSushi · 2020-10-29T22:35:49Z

With the changes in #43, IcebergRecordReader can get the correct schema.

With the changes in #45, HiveIcebergSerde can get the correct schema.

HotSushi added 3 commits October 12, 2020 02:45

Hive: Fix missing table schema in Hive 1.1 query (#1557)

dd15f00

Hive: Avoid loading catalog to initialize a serde (#1564)

f319a2d

Hive: Serialize table schema in inputsplit

3a968dd

HotSushi closed this Oct 29, 2020

HotSushi deleted the serialize_schema_in_inputsplit branch November 20, 2020 00:47
