Commit c3763d3
Rebase LI-Iceberg changes on top of Apache Iceberg 1.0.0 release
* Hive Catalog: Add a hive catalog that does not override existing Hive metadata (linkedin#10)
Add custom hive catalog to not override existing Hive metadata
Fail early with a proper exception if the metadata file does not exist
Simplify CustomHiveCatalog (linkedin#22)
* Shading: Add an iceberg-runtime shaded module (linkedin#12)
* ORC: Add test for reading files without Iceberg IDs (linkedin#16)
* Hive Metadata Scan: Support reading tables with only Hive metadata (linkedin#23, linkedin#24, linkedin#25, linkedin#26)
- Support for non string partition columns (linkedin#24)
- Support for Hive tables without avro.schema.literal (linkedin#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (linkedin#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (linkedin#37)
- Hive Metadata Scan: Return empty statistics (linkedin#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (linkedin#50)
- Hive Metadata Scan: Fix pushdown of non-partition predicates within NOT (linkedin#51)
Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
* Row level filtering: Allow table scans to pass a row level filter for ORC files
- ORC: Support NameMapping with row-level filtering (linkedin#53)
* Hive: Made Predicate Pushdown dynamic based on the Hive Version
* Hive: Fix uppercase bug and determine catalog from table properties (linkedin#38)
* Hive: Return lowercase fieldname from IcebergRecordStructField
* Hive: Determine catalog from table property
* Hive: Fix schema not forwarded to SerDe on MR jobs (linkedin#45) (linkedin#47)
* Hive: Use Hive table location in HiveIcebergSplit
* Hive: Fix schema not passed to Serde
* Hive: Refactor tests for tables with unqualified location URI
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
* Hive Metadata Scan: Support case insensitive name mapping (linkedin#52)
* Hive Metadata Scan: Merge Hive and Avro schemas to fix datatype inconsistencies (linkedin#57)
Hive Metadata Scan: Fix Hive primitive to Avro logical type conversion (linkedin#58)
Hive Metadata Scan: Fix support for Hive timestamp type (linkedin#61)
Co-authored-by: Raymond Zhang <razhang@linkedin.com>
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Fix HasDuplicateLowercaseColumnNames's visit method to use a new visi… (linkedin#67)
* Fix HasDuplicateLowercaseColumnNames's visit method to use a new visitor instance every time
* Trigger CI
(cherry picked from commit b90e838)
* Stop using serdeToFileFormat to unblock formats other than Avro or Orc (linkedin#64)
* Stop using serdeToFileFormat to unblock formats other than Avro or Orc
* Fix style check
* Do not delete metadata location when HMS has been successfully updated (linkedin#68)
(cherry picked from commit 766407e)
* Support reading Avro complex union types (linkedin#73)
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
* [#2039] Support default value semantic for AVRO (linkedin#75)
(cherry picked from commit c18f4c4)
* Support hive non string partition cols (linkedin#78)
* Support non-string hive type partition columns in LegacyHiveTableScan
* Leverage eval against partition filter expression to filter non-string columns
* Support default value read for ORC format in spark (linkedin#76)
* Support default value read for ORC format in spark
* Refactor common code for ReadBuilder for both non-vectorized and vectorized read
* Fix code style issue
* Add special handling of ROW_POSITION metadata column
* Add corner case check for partition field
* Use BaseDataReader.convertConstant to convert constants, and expand its functionality to support nested-type constants such as array/map/struct
* Support nested type default value for vectorized read
* Support deeply nested type default value for vectorized read
* Support reading ORC complex union types (linkedin#74)
* Support reading orc complex union types
* add more tests
* support union in VectorizedSparkOrcReaders and improve tests
* support union in VectorizedSparkOrcReaders and improve tests - continued
* fix checkstyle
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
* Support avro.schema.literal/hive union types in Hive legacy table to Iceberg conversion (linkedin#80)
* Fix ORC schema visitors to support reading ORC files with deeply nest… (linkedin#81)
* Fix ORC schema visitors to support reading ORC files with deeply nested union type schema
* Added test for vectorized read
* Disable avro validation for default values
Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>
* Fix spark avro reader reading union schema data (linkedin#83)
* Fix spark avro reader to read correctly structured nested data values
* Make sure field-id mapping is correctly maintained given arbitrary nested schema that contains union
* Avro: Change union read schema from hive to trino (linkedin#84)
* [LI] Avro: Refactor union-to-struct schema - Part 1. changes to support reading Avro
* ORC: Change union read schema from hive to trino (linkedin#85)
* [LI] ORC: Refactor union-to-struct schema - Part 2. changes to support reading ORC
* Change Hive type to Iceberg type conversion for union
* Reorder hive table properties to align the avro.schema.literal placement contract (linkedin#86)
* [#2039] Support default value semantic for AVRO
(cherry picked from commit c18f4c4)
* reverting commits 2c59857 and f362aed (linkedin#88)
Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>
* logically patching PR 2328 on HiveMetadataPreservingTableOperations
* Support timestamp as partition type (linkedin#91)
* Support timestamp in partition types
* Address comment
* Separate classes under hive legacy package to new hivelink module (linkedin#87)
* separate class under legacy to new hiveberg module
* fix build
* remove hiveberg dependency in iceberg-spark2 module
* Revert "remove hiveberg dependency in iceberg-spark2 module"
This reverts commit 2e8b743.
* rename hiveberg module to hivelink
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
* [LI] Align default value validation with avro semantics in terms of nullable (nested) fields (linkedin#92)
* Align default value validation with avro semantics in terms of nullable (nested) fields
* Allow setting null as default value for nested fields in record default
* [LI][Spark][Avro] read avro union using decoder instead of directly returning v… (linkedin#94)
* [LI][Spark] read avro union using decoder instead of directly returning value
* Add a comment for the schema
* Improve the logging when the deserialized index is invalid to read the symbol from enum (linkedin#96)
* Move custom hive catalog to hivelink-core (linkedin#99)
* Handle non-nullable union of single type for Avro (linkedin#98)
* Handle non-nullable union of single type
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
* Handle null default in nested type default value situations (linkedin#100)
* Move 'Hive Metadata Scan: Support case insensitive name mapping' (PR 52) to hivelink-core (linkedin#102)
* Remove activeSparkSession (linkedin#103)
* Disable default value preserving (linkedin#106)
* Disable default value preserving
* [LI][Avro] Do not reorder elements inside an Avro union schema (linkedin#93)
* handle single type union properly in AvroSchemaVisitor for deep nested schema (linkedin#107)
* Handle non-nullable union of single type for ORC spark non-vectorized reader (linkedin#104)
* Handle single type union for non-vectorized reader
* [Avro] Retain the type of field while copying the default values. (linkedin#109)
* Retain the type of field while copying the default values.
* [Hivelink] Refactor support hive non string partition cols to rid of … (linkedin#110)
* [Hivelink] Refactor support hive non string partition cols to get rid of Iceberg-oss code changes
* Release automation overhaul: Sonatype Nexus, Shipkit and GH Actions (linkedin#101)
* Add scm and developer info (linkedin#111)
* [Core] Fix and refactor schema parser (linkedin#112)
* [Core] Fix/Refactor SchemaParser to fix multiple bugs
* Enhance the UT for testing required fields with default values (linkedin#113)
* Enhance the UT for testing required fields with default values
* Addressed review comments
* Addressed review comment
* Support single type union for ORC-vectorization reader (linkedin#114)
* Support single type union for ORC-vectorization reader
* Support single type union for ORC-vectorization reader
Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>
* Refactor HMS code upon cherry-pick
* Check for schema corruption and fix it on commit (linkedin#117)
* Check for schema corruption and fix it on commit
* ORC: Handle query where select and filter only uses default value col… (linkedin#118)
* ORC: Handle query where select and filter only use default value columns
* Set ORC columns and fix case-sensitivity issue with schema check (linkedin#119)
* Hive: Return null for currentSnapshot() (linkedin#121)
* Hive: Return null for currentSnapshot()
* Handle snapshots()
* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes (linkedin#120)
* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes
* Add logic to derive partition column id from partition.column.ids pro… (linkedin#122)
* Add logic to derive partition column id from partition.column.ids property
* Do not push down filter to ORC for union type schema (linkedin#123)
* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for li… (linkedin#125)
* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for list and map when they are nullable
* LinkedIn rebase draft
* Refactor hivelink 1
* Make hivelink module test all pass
* Make spark 2.4 module work
* Fix mr module
* Make spark 3.1 module work
* Fix TestSparkMetadataColumns
* Minor fix for spark 2.4
* Update default spark version to 3.1
* Update java ci to only run spark 2.4 and 3.1
* Minor fix HiveTableOperations
* Adapt github CI to 0.14.x branch
* Fix mr module checkstyle
* Fix checkstyle for orc module
* Fix spark2.4 checkstyle
* Refactor catalog loading logic using CatalogUtil
* Minor change to CI/release
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Shardul Mahadik <shardul.m@somaiya.edu>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
Co-authored-by: Sushant Raikar <sraikar@linkedin.com>
Co-authored-by: ZihanLi58 <48699939+ZihanLi58@users.noreply.github.com>
Co-authored-by: Wenye Zhang <wyzhang@linkedin.com>
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
Co-authored-by: Shenoda Guirguis <sguirguis@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>
Co-authored-by: Lei Sun <lesun@linkedin.com>
Co-authored-by: Jiefan <jiefli@linkedin.com>
Co-authored-by: yiqiangin <103528904+yiqiangin@users.noreply.github.com>
Co-authored-by: Malini Mahalakshmi Venkatachari <maluchari@gmail.com>
Co-authored-by: Yiqiang Ding <yiqding@linkedin.com>
Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
1 parent e2bb9ad commit c3763d3
File tree
124 files changed, +13979 -367 lines changed
- .github/workflows
- api/src
  - main/java/org/apache/iceberg/types
  - test/java/org/apache/iceberg/types
- core/src
  - main/java/org/apache/iceberg
    - avro
    - mapping
  - test/java/org/apache/iceberg
    - avro
- data/src/test/java/org/apache/iceberg/data/orc
- format
- hive-metastore/src/main/java/org/apache/iceberg/hive
- hivelink-core/src
  - main/java/org/apache/iceberg/hivelink/core
    - schema
    - utils
  - test
    - java/org/apache/iceberg/hivelink/core
    - resources
- mr/src
  - main/java/org/apache/iceberg/mr/hive
    - serde/objectinspector
  - test/java/org/apache/iceberg/mr/hive
- orc/src/main/java/org/apache/iceberg/orc
- spark
  - v2.4
    - spark/src
      - main/java/org/apache/iceberg/spark
        - data
          - vectorized
        - source
      - test/java/org/apache/iceberg/spark/data
  - v3.0
    - spark/src
      - main/java/org/apache/iceberg/spark
        - data
          - vectorized
        - source
      - test/java/org/apache/iceberg/spark/data
  - v3.1
    - spark/src
      - main/java/org/apache/iceberg/spark
        - data
          - vectorized
        - source
      - test/java/org/apache/iceberg/spark/data
  - v3.2/spark/src/main/java/org/apache/iceberg/spark/source
  - v3.3