shardulm94 commented Feb 26, 2020

  • Add LegacyHiveCatalog which tries to load a table using Hive metadata to get the files belonging to a table
  • LegacyHiveTableOperations should be the main class of interest which derives Iceberg table metadata from Hive metadata at runtime
  • LegacyHiveTableScan will generate the list of files relevant to the scan using Hive's listPartitionsByFilter (for partitioned tables)
  • HiveExpressions simplifies the partition filter before passing to the Hive metastore since metastore only supports a small subset of predicates for listPartitionsByFilter. HiveExpressions also contains code to generate a filter string from the simplified expression which Hive can understand.
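The last bullet can be sketched as follows. This is a minimal, self-contained illustration, not the code in this PR: `Expr`, `Op`, `simplify` and `toHiveFilter` are hypothetical stand-ins for Iceberg's expression classes and for HiveExpressions, only equality, AND and OR are modelled (NOT is deliberately left out), and the exact filter string syntax shown is an assumption.

```java
import java.util.Collections;
import java.util.Set;

class PartitionFilterSketch {
  enum Op { TRUE, EQ, AND, OR }

  // Hypothetical expression node; a stand-in for Iceberg's Expression classes.
  static final class Expr {
    final Op op;
    final String column;
    final String value;
    final Expr left;
    final Expr right;

    private Expr(Op op, String column, String value, Expr left, Expr right) {
      this.op = op;
      this.column = column;
      this.value = value;
      this.left = left;
      this.right = right;
    }

    static Expr alwaysTrue() { return new Expr(Op.TRUE, null, null, null, null); }
    static Expr eq(String column, String value) { return new Expr(Op.EQ, column, value, null, null); }
    static Expr and(Expr left, Expr right) { return new Expr(Op.AND, null, null, left, right); }
    static Expr or(Expr left, Expr right) { return new Expr(Op.OR, null, null, left, right); }
  }

  // Replace predicates on non-partition columns with TRUE, then fold TRUE away.
  static Expr simplify(Expr e, Set<String> partitionColumns) {
    switch (e.op) {
      case EQ:
        return partitionColumns.contains(e.column) ? e : Expr.alwaysTrue();
      case AND: {
        Expr l = simplify(e.left, partitionColumns);
        Expr r = simplify(e.right, partitionColumns);
        if (l.op == Op.TRUE) {
          return r;
        }
        return r.op == Op.TRUE ? l : Expr.and(l, r);
      }
      case OR: {
        Expr l = simplify(e.left, partitionColumns);
        Expr r = simplify(e.right, partitionColumns);
        return (l.op == Op.TRUE || r.op == Op.TRUE) ? Expr.alwaysTrue() : Expr.or(l, r);
      }
      default:
        return e; // TRUE stays TRUE
    }
  }

  // Render a simplified expression as a filter string (assumed syntax).
  static String toHiveFilter(Expr e) {
    switch (e.op) {
      case EQ:
        return e.column + " = '" + e.value + "'";
      case AND:
        return "(" + toHiveFilter(e.left) + " AND " + toHiveFilter(e.right) + ")";
      case OR:
        return "(" + toHiveFilter(e.left) + " OR " + toHiveFilter(e.right) + ")";
      default:
        throw new IllegalStateException("No filter string form for " + e.op);
    }
  }

  public static void main(String[] args) {
    Set<String> partitionColumns = Collections.singleton("datepartition");
    Expr filter = Expr.and(Expr.eq("datepartition", "2020-02-26"), Expr.eq("memberId", "42"));
    // The memberId predicate is dropped because memberId is not a partition column.
    System.out.println(toHiveFilter(simplify(filter, partitionColumns)));
  }
}
```

Dropping a predicate can only widen the set of partitions returned, so the remaining row-level filtering still has to happen when the files are read.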

shardulm94 requested a review from rdsr on February 26, 2020 06:29
shardulm94 requested a review from wmoustafa on March 4, 2020 20:11

rdsr commented Mar 8, 2020

Did a high-level pass. Will continue to review in more depth.


package org.apache.iceberg.hive.legacy;

import java.io.IOException;

Contributor

There is already an FS util in Iceberg. Move to that?

Contributor Author

Which class are you referring to? Could not find it.

* available. If the table is read through Hive metadata, features like time travel, snapshot isolation and incremental
* computation are not supported along with any WRITE operations to either the data or metadata.
*/
public class HiveCatalogWithLegacyReadFallback extends HiveCatalog {

Legacy is a bit vague. I know it is hard to come up with a descriptive one-word term. How about "HiveCatalogWithHiveReadFallback", "HiveCatalogWithHiveOpsFallback", "HiveCatalogWithFallback", "HiveCatalogWithDirectoryFallback", "HiveCatalogWithDirectoryListingFallback"? No strong opinion.

Contributor

My preference is with Legacy, but I leave it up to @shardulm94

Contributor

Legacy is a loaded word. HiveCatalogWithReadFallback is just as descriptive and takes up less screen real estate.

Contributor Author

As discussed on today's call, we will simply rename it to LegacyHiveCatalog and let the caller handle fallbacks.

I think the point was about the word "Legacy", and the fact that LegacyHiveCatalog does not tell whether Legacy refers to Hive or the Catalog. Someone who is not familiar with the intention will read it as the Catalog being legacy.

Contributor Author

I will keep it as LegacyHiveCatalog till we come up with a term that everyone is satisfied with. The usage of this catalog is internal and we have enough Javadoc to clarify its intention. It is also really Legacy as once Iceberg metadata is rolled out, this will no longer be needed.

// If simplifyPartitionFilter returns TRUE, there are no filters on partition columns or the filter expression is
// going to match all partitions
if (simplified.op() == Expression.Operation.TRUE) {
  partitions = metaClients.run(client -> client.listPartitionsByFilter(database, tableName, null, (short) -1));

Would TRUE throw an exception (as seen in the simplification code), or list all partitions (as seen here)?

Contributor Author

TRUE will list all partitions. The simplification code does not throw an exception for TRUE; the exception is thrown during conversion to a Hive filter string, since the Hive filter string does not support a TRUE literal.
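A small sketch of that control flow, under stated assumptions: `PartitionLister` is a hypothetical stand-in for the metastore client call, and the supplier stands in for the conversion to a Hive filter string. It only illustrates why the conversion's exception is never reached for TRUE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class TrueFilterSketch {
  // Hypothetical stand-in for the metastore call; only the filter argument matters here.
  interface PartitionLister {
    List<String> listPartitionsByFilter(String filterOrNull);
  }

  static List<String> matchingPartitions(
      boolean simplifiedToTrue, Supplier<String> hiveFilterString, PartitionLister lister) {
    if (simplifiedToTrue) {
      // No usable partition predicate: pass a null filter, as in the snippet above,
      // so the string conversion (which rejects TRUE) is never invoked.
      return lister.listPartitionsByFilter(null);
    }
    return lister.listPartitionsByFilter(hiveFilterString.get());
  }

  public static void main(String[] args) {
    // Fake lister: every partition for a null filter, otherwise a single match.
    PartitionLister fake = filter -> filter == null
        ? Arrays.asList("ds=2020-02-25", "ds=2020-02-26")
        : Arrays.asList("ds=2020-02-26");
    Supplier<String> rejectsTrue = () -> {
      throw new IllegalStateException("TRUE has no Hive filter string form");
    };
    System.out.println(matchingPartitions(true, rejectsTrue, fake));
    System.out.println(matchingPartitions(false, () -> "ds = '2020-02-26'", fake));
  }
}
```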

public Table loadTable(TableIdentifier identifier) {
  // Try to load the table using Iceberg metadata first. If it fails, use Hive metadata
  try {
    return super.loadTable(identifier);

Contributor

We should make sure that if the Iceberg table metadata is corrupted, we do not try to load the legacy Hive table but throw an exception instead.

Contributor

I think people who are trying to access a table will be upset if they find out that Iceberg ignores Hive metadata even when its own metadata is corrupted. Here are two alternatives for handling this situation:

  1. Log an error and then fall back to using Hive metadata.
  2. Provide a configuration option that determines whether Iceberg should fall back to using Hive metadata. If fallback happens, log a warning.

Contributor

Corrupt metadata signals that the table has a problem. It is meant to be read as an Iceberg table. This is an exceptional condition for which we should throw an exception.

Contributor Author

After some discussion today, we decided to leave the fallback decision to the caller (i.e. Dali in the LI environment). The catalog now only reads the table using Hive metadata. The caller is expected to check whether the table is a valid Iceberg table and invoke the correct catalog. We decided to do this because we also saw the need to support other table formats (i.e. Opal), so this if-else would not be sufficient anyway.
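What the caller-side decision might look like, as a rough sketch only. `TableLoader` and `choose` are hypothetical names, and detecting an Iceberg table through a `table_type=ICEBERG` table parameter is an assumption about how the caller would check; the thread does not say how Dali does it.

```java
import java.util.Collections;
import java.util.Map;

class CatalogDispatchSketch {
  // Hypothetical stand-in for a catalog's loadTable; returns a label instead of a Table.
  interface TableLoader {
    String load(String tableName);
  }

  // The choice is made from table parameters before any metadata is read, so a
  // failure inside the Iceberg loader (e.g. corrupt metadata) propagates to the
  // caller instead of silently falling back to Hive metadata.
  static TableLoader choose(
      Map<String, String> tableParams, TableLoader iceberg, TableLoader hiveMetadata) {
    // Assumption: an Iceberg table is recognizable from a table_type=ICEBERG parameter.
    boolean isIceberg = "ICEBERG".equalsIgnoreCase(tableParams.get("table_type"));
    return isIceberg ? iceberg : hiveMetadata;
  }

  public static void main(String[] args) {
    TableLoader iceberg = name -> "iceberg:" + name;
    TableLoader hiveMetadata = name -> "hive:" + name;
    Map<String, String> icebergParams = Collections.singletonMap("table_type", "ICEBERG");
    Map<String, String> plainHiveParams = Collections.emptyMap();
    System.out.println(choose(icebergParams, iceberg, hiveMetadata).load("db.events"));
    System.out.println(choose(plainHiveParams, iceberg, hiveMetadata).load("db.events"));
  }
}
```

Supporting a further table format would mean adding another branch or a lookup table on the caller side, which is the extensibility argument made above.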

rdsr commented Mar 10, 2020

@shardulm94 do you think you can add some tests early? It is easier to play around with then.

Comment on lines 124 to 125
if (leftResult == null && rightResult == null) {
  return null;

Contributor

+1

rdsr commented Mar 13, 2020

@shardulm94 Looks good! Only minor comments. Once sufficient tests are added, we are good to check this in.

rdsr left a comment

+1. LGTM

rdsr commented Mar 17, 2020

Will merge in a day, after other reviewers have had a chance to look at it.

shardulm94 changed the title from "Support reading tables with Hive metadata if Iceberg metadata is not available" to "Support reading tables with only Hive metadata" on Mar 17, 2020
Comment on lines 102 to 103
if (leftResult == null && rightResult == null) {
  return null;

What are the consequences of substituting non-partition predicates with alwaysTrue()? It seems that would simplify the code.

Also, it may clear up confusion about the semantics of null.

Contributor Author

shardulm94, Mar 18, 2020

Yes, it seems this can be simplified. I had previously kept it null to distinguish actual TRUE expressions passed by the user from ones added by the visitor. But it makes sense to just substitute non-partition predicates with alwaysTrue().

Done

wmoustafa left a comment

Thanks for the insightful discussions throughout the PR.

rdsr merged commit 1a9acf8 into linkedin:master on Mar 18, 2020
shardulm94 added a commit to shardulm94/linkedin-iceberg that referenced this pull request Apr 3, 2020
…inkedin#23)

* Support reading tables with Hive metadata if Iceberg metadata is not available
shardulm94 added a commit that referenced this pull request May 28, 2020
* Support reading tables with Hive metadata if Iceberg metadata is not available
shardulm94 added a commit that referenced this pull request May 31, 2020
…, #24, #25, #26)

- Support for non string partition columns (#24)
- Support for Hive tables without avro.schema.literal (#25)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
shardulm94 added a commit to shardulm94/linkedin-iceberg that referenced this pull request Oct 16, 2020
…inkedin#23, linkedin#24, linkedin#25, linkedin#26)

- Support for non string partition columns (linkedin#24)
- Support for Hive tables without avro.schema.literal (linkedin#25)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>

Hive Metadata Scan: Notify ScanEvent listeners on planning (linkedin#35)
Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (linkedin#37)
shardulm94 added a commit to shardulm94/linkedin-iceberg that referenced this pull request Jan 9, 2021
…inkedin#23, linkedin#24, linkedin#25, linkedin#26)

- Support for non string partition columns (linkedin#24)
- Support for Hive tables without avro.schema.literal (linkedin#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (linkedin#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (linkedin#37)
- Hive Metadata Scan: Return empty statistics (linkedin#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (linkedin#50)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
shardulm94 added a commit to shardulm94/linkedin-iceberg that referenced this pull request Feb 22, 2021
…inkedin#23, linkedin#24, linkedin#25, linkedin#26)

- Support for non string partition columns (linkedin#24)
- Support for Hive tables without avro.schema.literal (linkedin#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (linkedin#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (linkedin#37)
- Hive Metadata Scan: Return empty statistics (linkedin#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (linkedin#50)
- Hive Metadata Scan: Fix pushdown of non-partition predicates within NOT (linkedin#51)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
rzhang10 added a commit that referenced this pull request Oct 21, 2022
* Hive Catalog: Add a hive catalog that does not override existing Hive metadata (#10)

Add custom hive catalog to not override existing Hive metadata

Fail early with a proper exception if the metadata file is not existing

Simplify CustomHiveCatalog (#22)

* Shading: Add a iceberg-runtime shaded module (#12)

* ORC: Add test for reading files without Iceberg IDs (#16)

* Hive Metadata Scan: Support reading tables with only Hive metadata (#23, #24, #25, #26)

- Support for non string partition columns (#24)
- Support for Hive tables without avro.schema.literal (#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (#37)
- Hive Metadata Scan: Return empty statistics (#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (#50)
- Hive Metadata Scan: Fix pushdown of non-partition predicates within NOT (#51)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>

* Row level filtering: Allow table scans to pass a row level filter for ORC files

- ORC: Support NameMapping with row-level filtering (#53)

* Hive: Made Predicate Pushdown dynamic based on the Hive Version

* Hive: Fix uppercase bug and determine catalog from table properties (#38)

* Hive: Return lowercase fieldname from IcebergRecordStructField
* Hive: Determine catalog from table property

* Hive: Fix schema not forwarded to SerDe on MR jobs (#45) (#47)

* Hive: Use Hive table location in HiveIcebergSplit
* Hive: Fix schema not passed to Serde
* Hive: Refactor tests for tables with unqualified location URI

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

* Hive Metadata Scan: Support case insensitive name mapping (#52)

* Hive Metadata Scan: Merge Hive and Avro schemas to fix datatype inconsistencies (#57)

Hive Metadata Scan: Fix Hive primitive to Avro logical type conversion (#58)

Hive Metadata Scan: Fix support for Hive timestamp type (#61)

Co-authored-by: Raymond Zhang <razhang@linkedin.com>
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

Fix HasDuplicateLowercaseColumnNames's visit method to use a new visi… (#67)

* Fix HasDuplicateLowercaseColumnNames's visit method to use a new visitor instance every time

* Trigger CI

(cherry picked from commit b90e838)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc (#64)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc

* Fix style check

* Do not delete metadata location when HMS has been successfully updated (#68)

(cherry picked from commit 766407e)

* Support reading Avro complex union types (#73)

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [#2039] Support default value semantic for AVRO (#75)

(cherry picked from commit c18f4c4)

* Support hive non string partition cols (#78)

* Support non-string hive type partition columns in LegacyHiveTableScan

* Leverage eval against partition filter expression to filter non-string columns

* Support default value read for ORC format in spark (#76)

* Support default value read for ORC format in spark

* Refactor common code for ReadBuilder for both non-vectorized and vectorized read

* Fix code style issue

* Add special handling of ROW_POSITION metadata column

* Add corner case check for partition field

* Use BaseDataReader.convertConstant to convert constants, and expand its functionality to support nested-type contants such as array/map/struct

* Support nested type default value for vectorized read

* Support deeply nested type default value for vectorized read

* Support reading ORC complex union types (#74)

* Support reading orc complex union types

* add more tests

* support union in VectorizedSparkOrcReaders and improve tests

* support union in VectorizedSparkOrcReaders and improve tests - continued

* fix checkstyle

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Support avro.schema.literal/hive union types in Hive legacy table to Iceberg conversion (#80)

* Fix ORC schema visitors to support reading ORC files with deeply nest… (#81)

* Fix ORC schema visitors to support reading ORC files with deeply nested union type schema

* Added test for vectorized read

* Disable avro validation for default values

Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>

* Fix spark avro reader reading union schema data (#83)

* Fix spark avro reader to read correctly structured nested data values

* Make sure field-id mapping is correctly maintained given arbitrary nested schema that contains union

* Avro: Change union read schema from hive to trino (#84)

* [LI] Avro: Refactor union-to-struct schema - Part 1. changes to support reading Avro

* ORC: Change union read schema from hive to trino (#85)

* [LI] ORC: Refactor union-to-struct schema - Part 2. changes to support reading ORC

* Change Hive type to Iceberg type conversion for union

* Recorder hive table properties to align the avro.schema.literal placement contract (#86)

* [#2039] Support default value semantic for AVRO

(cherry picked from commit c18f4c4)

* reverting commits 2c59857 and f362aed (#88)

Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>

* logically patching PR 2328 on HiveMetadataPreservingTableOperations

* Support timestamp as partition type (#91)

* Support timestamp in partition types

* Address comment

* Separate classes under hive legacy package to new hivelink module (#87)

* separate class under legacy to new hiveberg module

* fix build

* remove hiveberg dependency in iceberg-spark2 module

* Revert "remove hiveberg dependency in iceberg-spark2 module"

This reverts commit 2e8b743.

* rename hiveberg module to hivelink

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [LI] Align default value validation align with avro semantics in terms of nullable (nested) fields (#92)

* Align default value validation align with avro semantics in terms of nullable (nested) fields

* Allow setting null as default value for nested fields in record default

* [LI][Spark][Avro] read avro union using decoder instead of directly returning v… (#94)

* [LI][Spark] read avro union using decoder instead of directly returning value

* Add a comment for the schema

* Improve the logging when the deserailzed index is invalid to read the symbol from enum (#96)

* Move custom hive catalog to hivelink-core (#99)

* Handle non-nullable union of single type for Avro (#98)

* Handle non-nullable union of single type

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Handle null default in nested type default value situations (#100)

* Move 'Hive Metadata Scan: Support case insensitive name mapping' (PR 52) to hivelink-core (#102)

* Remove activeSparkSession (#103)

* Disable default value preserving (#106)

* Disable default value preserving

* [LI][Avro] Do not reorder elements inside a Avro union schema (#93)

* handle single type union properly in AvroSchemaVisitor for deep nested schema (#107)

* Handle non-nullable union of single type for ORC spark non-vectorized reader (#104)

* Handle single type union for non-vectorized reader

* [Avro] Retain the type of field while copying the default values. (#109)

* Retain the type of field while copying the default values.

* [Hivelink] Refactor support hive non string partition cols to rid of … (#110)

* [Hivelink] Refactor support hive non string partition cols to rid of Iceberg-oss code changes

* Release automation overhaul: Sonatype Nexus, Shipkit and GH Actions (#101)

* Add scm and developer info (#111)

* [Core] Fix and refactor schema parser (#112)

* [Core] Fix/Refactor SchemaParser to fix multiple bugs

* Enhance the UT for testing required fields with default values (#113)

* Enhance the UT for testing required fields with default values

* Addressed review comments

* Addressed review comment

* Support single type union for ORC-vectorization reader (#114)

* Support single type union for ORC-vectorization reader

* Support single type union for ORC-vectorization reader

Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>

* Refactor HMS code upon cherry-pick

* Check for schema corruption and fix it on commit (#117)

* Check for schema corruption and fix it on commit

* ORC: Handle query where select and filter only uses default value col… (#118)

* ORC: Handle query where select and filter only use default value columns

* Set ORC columns and fix case-sensitivity issue with schema check (#119)

* Hive: Return null for currentSnapshot() (#121)

* Hive: Return null for currentSnapshot()

* Handle snapshots()

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes (#120)

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes

* Add logic to derive partition column id from partition.column.ids pro… (#122)

* Add logic to derive partition column id from partition.column.ids property

* Do not push down filter to ORC for union type schema (#123)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for li… (#125)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for list and map when they are nullable

* LinkedIn rebase draft

* Refactor hivelink 1

* Make hivelink module test all pass

* Make spark 2.4 module work

* Fix mr module

* Make spark 3.1 module work

* Fix TestSparkMetadataColumns

* Minor fix for spark 2.4

* Update default spark version to 3.1

* Update java ci to only run spark 2.4 and 3.1

* Minor fix HiveTableOperations

* Adapt github CI to 0.14.x branch

* Fix mr module checkstyle

* Fix checkstyle for orc module

* Fix spark2.4 checkstyle

* Refactor catalog loading logic using CatalogUtil

* Minor change to CI/release

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Shardul Mahadik <shardul.m@somaiya.edu>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
Co-authored-by: Sushant Raikar <sraikar@linkedin.com>
Co-authored-by: ZihanLi58 <48699939+ZihanLi58@users.noreply.github.com>
Co-authored-by: Wenye Zhang <wyzhang@linkedin.com>
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
Co-authored-by: Shenoda Guirguis <sguirguis@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>
Co-authored-by: Lei Sun <lesun@linkedin.com>
Co-authored-by: Jiefan <jiefli@linkedin.com>
Co-authored-by: yiqiangin <103528904+yiqiangin@users.noreply.github.com>
Co-authored-by: Malini Mahalakshmi Venkatachari <maluchari@gmail.com>
Co-authored-by: Yiqiang Ding <yiqding@linkedin.com>
Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
rzhang10 added a commit to rzhang10/iceberg that referenced this pull request Nov 4, 2022
* Hive Catalog: Add a hive catalog that does not override existing Hive metadata (linkedin#10)

Add custom hive catalog to not override existing Hive metadata

Fail early with a proper exception if the metadata file is not existing

Simplify CustomHiveCatalog (linkedin#22)

* Shading: Add a iceberg-runtime shaded module (linkedin#12)

* ORC: Add test for reading files without Iceberg IDs (linkedin#16)

* Hive Metadata Scan: Support reading tables with only Hive metadata (linkedin#23, linkedin#24, linkedin#25, linkedin#26)

- Support for non string partition columns (linkedin#24)
- Support for Hive tables without avro.schema.literal (linkedin#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (linkedin#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (linkedin#37)
- Hive Metadata Scan: Return empty statistics (linkedin#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (linkedin#50)
- Hive Metadata Scan: Fix pushdown of non-partition predicates within NOT (linkedin#51)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>

* Row level filtering: Allow table scans to pass a row level filter for ORC files

- ORC: Support NameMapping with row-level filtering (linkedin#53)

* Hive: Made Predicate Pushdown dynamic based on the Hive Version

* Hive: Fix uppercase bug and determine catalog from table properties (linkedin#38)

* Hive: Return lowercase fieldname from IcebergRecordStructField
* Hive: Determine catalog from table property

* Hive: Fix schema not forwarded to SerDe on MR jobs (linkedin#45) (linkedin#47)

* Hive: Use Hive table location in HiveIcebergSplit
* Hive: Fix schema not passed to Serde
* Hive: Refactor tests for tables with unqualified location URI

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

* Hive Metadata Scan: Support case insensitive name mapping (linkedin#52)

* Hive Metadata Scan: Merge Hive and Avro schemas to fix datatype inconsistencies (linkedin#57)

Hive Metadata Scan: Fix Hive primitive to Avro logical type conversion (linkedin#58)

Hive Metadata Scan: Fix support for Hive timestamp type (linkedin#61)

Co-authored-by: Raymond Zhang <razhang@linkedin.com>
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

Fix HasDuplicateLowercaseColumnNames's visit method to use a new visi… (linkedin#67)

* Fix HasDuplicateLowercaseColumnNames's visit method to use a new visitor instance every time

* Trigger CI

(cherry picked from commit b90e838)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc (linkedin#64)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc

* Fix style check

* Do not delete metadata location when HMS has been successfully updated (linkedin#68)

(cherry picked from commit 766407e)

* Support reading Avro complex union types (linkedin#73)

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [#2039] Support default value semantic for AVRO (linkedin#75)

(cherry picked from commit c18f4c4)

* Support hive non string partition cols (linkedin#78)

* Support non-string hive type partition columns in LegacyHiveTableScan

* Leverage eval against partition filter expression to filter non-string columns

* Support default value read for ORC format in spark (linkedin#76)

* Support default value read for ORC format in spark

* Refactor common code for ReadBuilder for both non-vectorized and vectorized read

* Fix code style issue

* Add special handling of ROW_POSITION metadata column

* Add corner case check for partition field

* Use BaseDataReader.convertConstant to convert constants, and expand its functionality to support nested-type constants such as array/map/struct

* Support nested type default value for vectorized read

* Support deeply nested type default value for vectorized read

* Support reading ORC complex union types (linkedin#74)

* Support reading orc complex union types

* add more tests

* support union in VectorizedSparkOrcReaders and improve tests

* support union in VectorizedSparkOrcReaders and improve tests - continued

* fix checkstyle

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Support avro.schema.literal/hive union types in Hive legacy table to Iceberg conversion (linkedin#80)

* Fix ORC schema visitors to support reading ORC files with deeply nest… (linkedin#81)

* Fix ORC schema visitors to support reading ORC files with deeply nested union type schema

* Added test for vectorized read

* Disable avro validation for default values

Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>

* Fix spark avro reader reading union schema data (linkedin#83)

* Fix spark avro reader to read correctly structured nested data values

* Make sure field-id mapping is correctly maintained given arbitrary nested schema that contains union

* Avro: Change union read schema from hive to trino (linkedin#84)

* [LI] Avro: Refactor union-to-struct schema - Part 1. changes to support reading Avro

* ORC: Change union read schema from hive to trino (linkedin#85)

* [LI] ORC: Refactor union-to-struct schema - Part 2. changes to support reading ORC

* Change Hive type to Iceberg type conversion for union

* Reorder hive table properties to align the avro.schema.literal placement contract (linkedin#86)

* [#2039] Support default value semantic for AVRO

(cherry picked from commit c18f4c4)

* reverting commits 2c59857 and f362aed (linkedin#88)

Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>

* logically patching PR 2328 on HiveMetadataPreservingTableOperations

* Support timestamp as partition type (linkedin#91)

* Support timestamp in partition types

* Address comment

* Separate classes under hive legacy package to new hivelink module (linkedin#87)

* separate class under legacy to new hiveberg module

* fix build

* remove hiveberg dependency in iceberg-spark2 module

* Revert "remove hiveberg dependency in iceberg-spark2 module"

This reverts commit 2e8b743.

* rename hiveberg module to hivelink

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [LI] Align default value validation with Avro semantics in terms of nullable (nested) fields (linkedin#92)

* Align default value validation with Avro semantics in terms of nullable (nested) fields

* Allow setting null as default value for nested fields in record default

* [LI][Spark][Avro] read avro union using decoder instead of directly returning v… (linkedin#94)

* [LI][Spark] read avro union using decoder instead of directly returning value

* Add a comment for the schema

* Improve the logging when the deserialized index is invalid to read the symbol from enum (linkedin#96)

* Move custom hive catalog to hivelink-core (linkedin#99)

* Handle non-nullable union of single type for Avro (linkedin#98)

* Handle non-nullable union of single type

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Handle null default in nested type default value situations (linkedin#100)

* Move 'Hive Metadata Scan: Support case insensitive name mapping' (PR 52) to hivelink-core (linkedin#102)

* Remove activeSparkSession (linkedin#103)

* Disable default value preserving (linkedin#106)

* Disable default value preserving

* [LI][Avro] Do not reorder elements inside an Avro union schema (linkedin#93)

* handle single type union properly in AvroSchemaVisitor for deep nested schema (linkedin#107)

* Handle non-nullable union of single type for ORC spark non-vectorized reader (linkedin#104)

* Handle single type union for non-vectorized reader

* [Avro] Retain the type of field while copying the default values. (linkedin#109)

* Retain the type of field while copying the default values.

* [Hivelink] Refactor support hive non string partition cols to rid of … (linkedin#110)

* [Hivelink] Refactor support hive non string partition cols to rid of Iceberg-oss code changes

* Release automation overhaul: Sonatype Nexus, Shipkit and GH Actions (linkedin#101)

* Add scm and developer info (linkedin#111)

* [Core] Fix and refactor schema parser (linkedin#112)

* [Core] Fix/Refactor SchemaParser to fix multiple bugs

* Enhance the UT for testing required fields with default values (linkedin#113)

* Enhance the UT for testing required fields with default values

* Addressed review comments

* Addressed review comment

* Support single type union for ORC-vectorization reader (linkedin#114)

* Support single type union for ORC-vectorization reader

* Support single type union for ORC-vectorization reader

Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>

* Refactor HMS code upon cherry-pick

* Check for schema corruption and fix it on commit (linkedin#117)

* Check for schema corruption and fix it on commit

* ORC: Handle query where select and filter only uses default value col… (linkedin#118)

* ORC: Handle query where select and filter only use default value columns

* Set ORC columns and fix case-sensitivity issue with schema check (linkedin#119)

* Hive: Return null for currentSnapshot() (linkedin#121)

* Hive: Return null for currentSnapshot()

* Handle snapshots()

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes (linkedin#120)

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes

* Add logic to derive partition column id from partition.column.ids pro… (linkedin#122)

* Add logic to derive partition column id from partition.column.ids property

* Do not push down filter to ORC for union type schema (linkedin#123)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for li… (linkedin#125)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for list and map when they are nullable

* LinkedIn rebase draft

* Refactor hivelink 1

* Make all hivelink module tests pass

* Make spark 2.4 module work

* Fix mr module

* Make spark 3.1 module work

* Fix TestSparkMetadataColumns

* Minor fix for spark 2.4

* Update default spark version to 3.1

* Update java ci to only run spark 2.4 and 3.1

* Minor fix HiveTableOperations

* Adapt github CI to 0.14.x branch

* Fix mr module checkstyle

* Fix checkstyle for orc module

* Fix spark2.4 checkstyle

* Refactor catalog loading logic using CatalogUtil

* Minor change to CI/release

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Shardul Mahadik <shardul.m@somaiya.edu>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
Co-authored-by: Sushant Raikar <sraikar@linkedin.com>
Co-authored-by: ZihanLi58 <48699939+ZihanLi58@users.noreply.github.com>
Co-authored-by: Wenye Zhang <wyzhang@linkedin.com>
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
Co-authored-by: Shenoda Guirguis <sguirguis@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>
Co-authored-by: Lei Sun <lesun@linkedin.com>
Co-authored-by: Jiefan <jiefli@linkedin.com>
Co-authored-by: yiqiangin <103528904+yiqiangin@users.noreply.github.com>
Co-authored-by: Malini Mahalakshmi Venkatachari <maluchari@gmail.com>
Co-authored-by: Yiqiang Ding <yiqding@linkedin.com>
Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
rzhang10 added a commit to rzhang10/iceberg that referenced this pull request Nov 4, 2022
* Hive Catalog: Add a hive catalog that does not override existing Hive metadata (linkedin#10)

Add custom hive catalog to not override existing Hive metadata

Fail early with a proper exception if the metadata file does not exist

Simplify CustomHiveCatalog (linkedin#22)

* Shading: Add an iceberg-runtime shaded module (linkedin#12)

* ORC: Add test for reading files without Iceberg IDs (linkedin#16)

* Hive Metadata Scan: Support reading tables with only Hive metadata (linkedin#23, linkedin#24, linkedin#25, linkedin#26)

- Support for non string partition columns (linkedin#24)
- Support for Hive tables without avro.schema.literal (linkedin#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (linkedin#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (linkedin#37)
- Hive Metadata Scan: Return empty statistics (linkedin#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (linkedin#50)
- Hive Metadata Scan: Fix pushdown of non-partition predicates within NOT (linkedin#51)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>

* Row level filtering: Allow table scans to pass a row level filter for ORC files

- ORC: Support NameMapping with row-level filtering (linkedin#53)

* Hive: Made Predicate Pushdown dynamic based on the Hive Version

* Hive: Fix uppercase bug and determine catalog from table properties (linkedin#38)

* Hive: Return lowercase fieldname from IcebergRecordStructField
* Hive: Determine catalog from table property

* Hive: Fix schema not forwarded to SerDe on MR jobs (linkedin#45) (linkedin#47)

* Hive: Use Hive table location in HiveIcebergSplit
* Hive: Fix schema not passed to Serde
* Hive: Refactor tests for tables with unqualified location URI

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

* Hive Metadata Scan: Support case insensitive name mapping (linkedin#52)

* Hive Metadata Scan: Merge Hive and Avro schemas to fix datatype inconsistencies (linkedin#57)

Hive Metadata Scan: Fix Hive primitive to Avro logical type conversion (linkedin#58)

Hive Metadata Scan: Fix support for Hive timestamp type (linkedin#61)

Co-authored-by: Raymond Zhang <razhang@linkedin.com>
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

Fix HasDuplicateLowercaseColumnNames's visit method to use a new visi… (linkedin#67)

* Fix HasDuplicateLowercaseColumnNames's visit method to use a new visitor instance every time

* Trigger CI

(cherry picked from commit b90e838)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc (linkedin#64)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc

* Fix style check

* Do not delete metadata location when HMS has been successfully updated (linkedin#68)

(cherry picked from commit 766407e)

* Support reading Avro complex union types (linkedin#73)

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [#2039] Support default value semantic for AVRO (linkedin#75)

(cherry picked from commit c18f4c4)

* Support hive non string partition cols (linkedin#78)

* Support non-string hive type partition columns in LegacyHiveTableScan

* Leverage eval against partition filter expression to filter non-string columns

* Support default value read for ORC format in spark (linkedin#76)

* Support default value read for ORC format in spark

* Refactor common code for ReadBuilder for both non-vectorized and vectorized read

* Fix code style issue

* Add special handling of ROW_POSITION metadata column

* Add corner case check for partition field

* Use BaseDataReader.convertConstant to convert constants, and expand its functionality to support nested-type constants such as array/map/struct

* Support nested type default value for vectorized read

* Support deeply nested type default value for vectorized read

* Support reading ORC complex union types (linkedin#74)

* Support reading orc complex union types

* add more tests

* support union in VectorizedSparkOrcReaders and improve tests

* support union in VectorizedSparkOrcReaders and improve tests - continued

* fix checkstyle

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Support avro.schema.literal/hive union types in Hive legacy table to Iceberg conversion (linkedin#80)

* Fix ORC schema visitors to support reading ORC files with deeply nest… (linkedin#81)

* Fix ORC schema visitors to support reading ORC files with deeply nested union type schema

* Added test for vectorized read

* Disable avro validation for default values

Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>

* Fix spark avro reader reading union schema data (linkedin#83)

* Fix spark avro reader to read correctly structured nested data values

* Make sure field-id mapping is correctly maintained given arbitrary nested schema that contains union

* Avro: Change union read schema from hive to trino (linkedin#84)

* [LI] Avro: Refactor union-to-struct schema - Part 1. changes to support reading Avro

* ORC: Change union read schema from hive to trino (linkedin#85)

* [LI] ORC: Refactor union-to-struct schema - Part 2. changes to support reading ORC

* Change Hive type to Iceberg type conversion for union

* Reorder hive table properties to align the avro.schema.literal placement contract (linkedin#86)

* [#2039] Support default value semantic for AVRO

(cherry picked from commit c18f4c4)

* reverting commits 2c59857 and f362aed (linkedin#88)

Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>

* logically patching PR 2328 on HiveMetadataPreservingTableOperations

* Support timestamp as partition type (linkedin#91)

* Support timestamp in partition types

* Address comment

* Separate classes under hive legacy package to new hivelink module (linkedin#87)

* separate class under legacy to new hiveberg module

* fix build

* remove hiveberg dependency in iceberg-spark2 module

* Revert "remove hiveberg dependency in iceberg-spark2 module"

This reverts commit 2e8b743.

* rename hiveberg module to hivelink

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [LI] Align default value validation with Avro semantics in terms of nullable (nested) fields (linkedin#92)

* Align default value validation with Avro semantics in terms of nullable (nested) fields

* Allow setting null as default value for nested fields in record default

* [LI][Spark][Avro] read avro union using decoder instead of directly returning v… (linkedin#94)

* [LI][Spark] read avro union using decoder instead of directly returning value

* Add a comment for the schema

* Improve the logging when the deserialized index is invalid to read the symbol from enum (linkedin#96)

* Move custom hive catalog to hivelink-core (linkedin#99)

* Handle non-nullable union of single type for Avro (linkedin#98)

* Handle non-nullable union of single type

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Handle null default in nested type default value situations (linkedin#100)

* Move 'Hive Metadata Scan: Support case insensitive name mapping' (PR 52) to hivelink-core (linkedin#102)

* Remove activeSparkSession (linkedin#103)

* Disable default value preserving (linkedin#106)

* Disable default value preserving

* [LI][Avro] Do not reorder elements inside an Avro union schema (linkedin#93)

* handle single type union properly in AvroSchemaVisitor for deep nested schema (linkedin#107)

* Handle non-nullable union of single type for ORC spark non-vectorized reader (linkedin#104)

* Handle single type union for non-vectorized reader

* [Avro] Retain the type of field while copying the default values. (linkedin#109)

* Retain the type of field while copying the default values.

* [Hivelink] Refactor support hive non string partition cols to rid of … (linkedin#110)

* [Hivelink] Refactor support hive non string partition cols to rid of Iceberg-oss code changes

* Release automation overhaul: Sonatype Nexus, Shipkit and GH Actions (linkedin#101)

* Add scm and developer info (linkedin#111)

* [Core] Fix and refactor schema parser (linkedin#112)

* [Core] Fix/Refactor SchemaParser to fix multiple bugs

* Enhance the UT for testing required fields with default values (linkedin#113)

* Enhance the UT for testing required fields with default values

* Addressed review comments

* Addressed review comment

* Support single type union for ORC-vectorization reader (linkedin#114)

* Support single type union for ORC-vectorization reader

* Support single type union for ORC-vectorization reader

Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>

* Refactor HMS code upon cherry-pick

* Check for schema corruption and fix it on commit (linkedin#117)

* Check for schema corruption and fix it on commit

* ORC: Handle query where select and filter only uses default value col… (linkedin#118)

* ORC: Handle query where select and filter only use default value columns

* Set ORC columns and fix case-sensitivity issue with schema check (linkedin#119)

* Hive: Return null for currentSnapshot() (linkedin#121)

* Hive: Return null for currentSnapshot()

* Handle snapshots()

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes (linkedin#120)

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes

* Add logic to derive partition column id from partition.column.ids pro… (linkedin#122)

* Add logic to derive partition column id from partition.column.ids property

* Do not push down filter to ORC for union type schema (linkedin#123)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for li… (linkedin#125)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for list and map when they are nullable

* LinkedIn rebase draft

* Refactor hivelink 1

* Make all hivelink module tests pass

* Make spark 2.4 module work

* Fix mr module

* Make spark 3.1 module work

* Fix TestSparkMetadataColumns

* Minor fix for spark 2.4

* Update default spark version to 3.1

* Update java ci to only run spark 2.4 and 3.1

* Minor fix HiveTableOperations

* Adapt github CI to 0.14.x branch

* Fix mr module checkstyle

* Fix checkstyle for orc module

* Fix spark2.4 checkstyle

* Refactor catalog loading logic using CatalogUtil

* Minor change to CI/release

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Shardul Mahadik <shardul.m@somaiya.edu>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
Co-authored-by: Sushant Raikar <sraikar@linkedin.com>
Co-authored-by: ZihanLi58 <48699939+ZihanLi58@users.noreply.github.com>
Co-authored-by: Wenye Zhang <wyzhang@linkedin.com>
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
Co-authored-by: Shenoda Guirguis <sguirguis@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>
Co-authored-by: Lei Sun <lesun@linkedin.com>
Co-authored-by: Jiefan <jiefli@linkedin.com>
Co-authored-by: yiqiangin <103528904+yiqiangin@users.noreply.github.com>
Co-authored-by: Malini Mahalakshmi Venkatachari <maluchari@gmail.com>
Co-authored-by: Yiqiang Ding <yiqding@linkedin.com>
Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
rzhang10 added a commit that referenced this pull request Dec 17, 2022
* Rebase LI-Iceberg changes on top of Apache Iceberg 1.0.0 release

* Hive Catalog: Add a hive catalog that does not override existing Hive metadata (#10)

Add custom hive catalog to not override existing Hive metadata

Fail early with a proper exception if the metadata file is not existing

Simplify CustomHiveCatalog (#22)

* Shading: Add an iceberg-runtime shaded module (#12)

* ORC: Add test for reading files without Iceberg IDs (#16)

* Hive Metadata Scan: Support reading tables with only Hive metadata (#23, #24, #25, #26)

- Support for non string partition columns (#24)
- Support for Hive tables without avro.schema.literal (#25)
- Hive Metadata Scan: Notify ScanEvent listeners on planning (#35)
- Hive Metadata Scan: Do not use table snapshot summary for estimating statistics (#37)
- Hive Metadata Scan: Return empty statistics (#49)
- Hive Metadata Scan: Do not throw an exception on dangling partitions; log warning message (#50)
- Hive Metadata Scan: Fix pushdown of non-partition predicates within NOT (#51)

Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>

* Row level filtering: Allow table scans to pass a row level filter for ORC files

- ORC: Support NameMapping with row-level filtering (#53)

* Hive: Made Predicate Pushdown dynamic based on the Hive Version

* Hive: Fix uppercase bug and determine catalog from table properties (#38)

* Hive: Return lowercase fieldname from IcebergRecordStructField
* Hive: Determine catalog from table property

* Hive: Fix schema not forwarded to SerDe on MR jobs (#45) (#47)

* Hive: Use Hive table location in HiveIcebergSplit
* Hive: Fix schema not passed to Serde
* Hive: Refactor tests for tables with unqualified location URI

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

* Hive Metadata Scan: Support case insensitive name mapping (#52)

* Hive Metadata Scan: Merge Hive and Avro schemas to fix datatype inconsistencies (#57)

Hive Metadata Scan: Fix Hive primitive to Avro logical type conversion (#58)

Hive Metadata Scan: Fix support for Hive timestamp type (#61)

Co-authored-by: Raymond Zhang <razhang@linkedin.com>
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

Fix HasDuplicateLowercaseColumnNames's visit method to use a new visi… (#67)

* Fix HasDuplicateLowercaseColumnNames's visit method to use a new visitor instance every time

* Trigger CI

(cherry picked from commit b90e838)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc (#64)

* Stop using serdeToFileFormat to unblock formats other than Avro or Orc

* Fix style check

* Do not delete metadata location when HMS has been successfully updated (#68)

(cherry picked from commit 766407e)

* Support reading Avro complex union types (#73)

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [#2039] Support default value semantic for AVRO (#75)

(cherry picked from commit c18f4c4)

* Support hive non string partition cols (#78)

* Support non-string hive type partition columns in LegacyHiveTableScan

* Leverage eval against partition filter expression to filter non-string columns

* Support default value read for ORC format in spark (#76)

* Support default value read for ORC format in spark

* Refactor common code for ReadBuilder for both non-vectorized and vectorized read

* Fix code style issue

* Add special handling of ROW_POSITION metadata column

* Add corner case check for partition field

* Use BaseDataReader.convertConstant to convert constants, and expand its functionality to support nested-type constants such as array/map/struct

* Support nested type default value for vectorized read

* Support deeply nested type default value for vectorized read

* Support reading ORC complex union types (#74)

* Support reading orc complex union types

* add more tests

* support union in VectorizedSparkOrcReaders and improve tests

* support union in VectorizedSparkOrcReaders and improve tests - continued

* fix checkstyle

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Support avro.schema.literal/hive union types in Hive legacy table to Iceberg conversion (#80)

* Fix ORC schema visitors to support reading ORC files with deeply nest… (#81)

* Fix ORC schema visitors to support reading ORC files with deeply nested union type schema

* Added test for vectorized read

* Disable avro validation for default values

Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>

* Fix spark avro reader reading union schema data (#83)

* Fix spark avro reader to read correctly structured nested data values

* Make sure field-id mapping is correctly maintained given arbitrary nested schema that contains union

* Avro: Change union read schema from hive to trino (#84)

* [LI] Avro: Refactor union-to-struct schema - Part 1. changes to support reading Avro

* ORC: Change union read schema from hive to trino (#85)

* [LI] ORC: Refactor union-to-struct schema - Part 2. changes to support reading ORC

* Change Hive type to Iceberg type conversion for union

* Reorder hive table properties to align the avro.schema.literal placement contract (#86)

* [#2039] Support default value semantic for AVRO

(cherry picked from commit c18f4c4)

* reverting commits 2c59857 and f362aed (#88)

Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>

* logically patching PR 2328 on HiveMetadataPreservingTableOperations

* Support timestamp as partition type (#91)

* Support timestamp in partition types

* Address comment

* Separate classes under hive legacy package to new hivelink module (#87)

* separate class under legacy to new hiveberg module

* fix build

* remove hiveberg dependency in iceberg-spark2 module

* Revert "remove hiveberg dependency in iceberg-spark2 module"

This reverts commit 2e8b743.

* rename hiveberg module to hivelink

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* [LI] Align default value validation with avro semantics in terms of nullable (nested) fields (#92)

* Align default value validation with avro semantics in terms of nullable (nested) fields

* Allow setting null as default value for nested fields in record default

* [LI][Spark][Avro] read avro union using decoder instead of directly returning v… (#94)

* [LI][Spark] read avro union using decoder instead of directly returning value

* Add a comment for the schema

* Improve the logging when the deserialized index is invalid to read the symbol from enum (#96)

* Move custom hive catalog to hivelink-core (#99)

* Handle non-nullable union of single type for Avro (#98)

* Handle non-nullable union of single type

Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>

* Handle null default in nested type default value situations (#100)

* Move 'Hive Metadata Scan: Support case insensitive name mapping' (PR 52) to hivelink-core (#102)

* Remove activeSparkSession (#103)

* Disable default value preserving (#106)

* Disable default value preserving

* [LI][Avro] Do not reorder elements inside an Avro union schema (#93)

* handle single type union properly in AvroSchemaVisitor for deep nested schema (#107)

* Handle non-nullable union of single type for ORC spark non-vectorized reader (#104)

* Handle single type union for non-vectorized reader

* [Avro] Retain the type of field while copying the default values. (#109)

* Retain the type of field while copying the default values.

* [Hivelink] Refactor support hive non string partition cols to get rid of … (#110)

* [Hivelink] Refactor support hive non string partition cols to get rid of Iceberg-oss code changes

* Release automation overhaul: Sonatype Nexus, Shipkit and GH Actions (#101)

* Add scm and developer info (#111)

* [Core] Fix and refactor schema parser (#112)

* [Core] Fix/Refactor SchemaParser to fix multiple bugs

* Enhance the UT for testing required fields with default values (#113)

* Enhance the UT for testing required fields with default values

* Addressed review comments

* Addressed review comment

* Support single type union for ORC-vectorization reader (#114)

* Support single type union for ORC-vectorization reader

* Support single type union for ORC-vectorization reader

Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>

* Refactor HMS code upon cherry-pick

* Check for schema corruption and fix it on commit (#117)

* Check for schema corruption and fix it on commit

* ORC: Handle query where select and filter only uses default value col… (#118)

* ORC: Handle query where select and filter only use default value columns

* Set ORC columns and fix case-sensitivity issue with schema check (#119)

* Hive: Return null for currentSnapshot() (#121)

* Hive: Return null for currentSnapshot()

* Handle snapshots()

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes (#120)

* Fix MergeHiveSchemaWithAvro to make it copy full Avro schema attributes

* Add logic to derive partition column id from partition.column.ids pro… (#122)

* Add logic to derive partition column id from partition.column.ids property

* Do not push down filter to ORC for union type schema (#123)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for li… (#125)

* Bug fix: MergeHiveSchemaWithAvro should retain avro properties for list and map when they are nullable

* LinkedIn rebase draft

* Refactor hivelink 1

* Make hivelink module test all pass

* Make spark 2.4 module work

* Fix mr module

* Make spark 3.1 module work

* Fix TestSparkMetadataColumns

* Minor fix for spark 2.4

* Update default spark version to 3.1

* Update java ci to only run spark 2.4 and 3.1

* Minor fix HiveTableOperations

* Adapt github CI to 0.14.x branch

* Fix mr module checkstyle

* Fix checkstyle for orc module

* Fix spark2.4 checkstyle

* Refactor catalog loading logic using CatalogUtil

* Minor change to CI/release

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Co-authored-by: Ratandeep Ratti <rratti@linkedin.com>
Co-authored-by: Shardul Mahadik <shardul.m@somaiya.edu>
Co-authored-by: Kuai Yu <kuyu@linkedin.com>
Co-authored-by: Walaa Eldin Moustafa <wmoustafa@linkedin.com>
Co-authored-by: Sushant Raikar <sraikar@linkedin.com>
Co-authored-by: ZihanLi58 <48699939+ZihanLi58@users.noreply.github.com>
Co-authored-by: Wenye Zhang <wyzhang@linkedin.com>
Co-authored-by: Wenye Zhang <wyzhang@wyzhang-mn1.linkedin.biz>
Co-authored-by: Shenoda Guirguis <sguirguis@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@linkedin.com>
Co-authored-by: Shenoda Guirguis <sguirgui@sguirgui-mn1.linkedin.biz>
Co-authored-by: Lei Sun <lesun@linkedin.com>
Co-authored-by: Jiefan <jiefli@linkedin.com>
Co-authored-by: yiqiangin <103528904+yiqiangin@users.noreply.github.com>
Co-authored-by: Malini Mahalakshmi Venkatachari <maluchari@gmail.com>
Co-authored-by: Yiqiang Ding <yiqding@linkedin.com>
Co-authored-by: Yiqiang Ding <yiqding@yiqding-mn1.linkedin.biz>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>

* Add flink 1.14 artifacts for release
