Managed Iceberg hive support and integration tests #32052

Merged: 9 commits, Aug 9, 2024

@ahmedabu98 ahmedabu98 commented Aug 1, 2024

This PR does the following to allow support for HiveCatalog with Managed Iceberg:

  • Expose option to pass through Hadoop Configuration properties (this was needed specifically for Hive, but others will benefit from this too)
  • Determine what extra dependencies are needed to use HiveCatalog
    • Provide a shaded jar that includes these dependencies
  • Add integration tests for reading and writing using HiveCatalog
    • Spin up a local Hive metastore and use a GCS bucket as a warehouse
  • Fix bugs in Iceberg conversion utils

Note that the code in sdk/io/iceberg/hive/testutils/ (60~70% of this PR) is mostly copied from Iceberg's integration test directory: https://github.com/apache/iceberg/tree/main/hive-metastore/src/test/java/org/apache/iceberg/hive.
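
As a rough illustration of the first item in the list above, the sketch below builds the kind of configuration map a Managed Iceberg transform might be given when targeting a HiveCatalog. Only the `config_properties` argument name comes from this PR's commit messages; the other keys (`table`, `catalog_name`, `catalog_properties`), the class name, and every value (metastore URI, bucket, table) are assumptions made for illustration, and the map is built with plain `java.util` so the sketch runs without Beam on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class HiveManagedConfigSketch {

  /** Builds a config map of the shape Managed Iceberg is assumed to accept. */
  static Map<String, Object> buildConfig(String table, String metastoreUri, String warehouse) {
    // Hadoop Configuration properties that get passed through to the catalog.
    // "hive.metastore.uris" is the standard Hive setting for the metastore location.
    Map<String, String> configProperties = new HashMap<>();
    configProperties.put("hive.metastore.uris", metastoreUri);

    // Iceberg catalog properties: "type" selects the catalog implementation,
    // "warehouse" is the root location for table data.
    Map<String, String> catalogProperties = new HashMap<>();
    catalogProperties.put("type", "hive");
    catalogProperties.put("warehouse", warehouse);

    // Top-level keys other than "config_properties" are assumed, not taken from this PR.
    Map<String, Object> config = new HashMap<>();
    config.put("table", table);
    config.put("catalog_name", "hive_catalog"); // hypothetical catalog name
    config.put("catalog_properties", catalogProperties);
    config.put("config_properties", configProperties);
    return config;
  }

  public static void main(String[] args) {
    // All three arguments are placeholder values.
    Map<String, Object> config =
        buildConfig("test_db.test_table", "thrift://localhost:9083", "gs://example-bucket/warehouse");
    System.out.println(config.get("config_properties"));
  }
}
```

In a real pipeline a map like this would presumably be handed to the Managed transform's configuration (for example via `withConfig`), with the shaded Hive jar from this PR on the classpath; that wiring is left out here because it depends on the Beam version in use.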

github-actions bot commented Aug 1, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

@ahmedabu98

assign set of reviewers

github-actions bot commented Aug 4, 2024

Assigning reviewers. If you would like to opt out of this review, comment "assign to next reviewer":

R: @kennknowles for label java.
R: @damccorm for label build.
R: @shunping for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

ahmedabu98 commented Aug 7, 2024

Blocked on #32095; that fix is needed for the Hive write tests to pass.

Once #32095 is in, no further changes will be needed on this PR, so it is ready for review.

@ahmedabu98

#32095 has been merged and this PR was just rebased to HEAD

@damccorm damccorm left a comment

Mostly LGTM, just had a minor comment.

FWIW, the size made this harder to review; it might have benefited from being split into 2 PRs.

@damccorm damccorm left a comment

Thanks!

@ahmedabu98 ahmedabu98 merged commit b21a84a into apache:master Aug 9, 2024
31 checks passed
lostluck pushed a commit to lostluck/beam that referenced this pull request Aug 12, 2024
* iceberg hive support and integration tests

* split read and write tests; cleanup

* add test documentation

* extend new config_properties arg to translation tests

* revert beam schema override

* actually run hive ITs

* trigger integration tests

* cut down hive database source lines