Add Hive MetaStore support to kafka-connect-s3 #572
The build was producing the following output: '[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0'. After updating the maven-surefire-plugin to add the surefire-junit4 dependency, the unit tests are now being executed.
Unrelated to my changes, but unnecessary noise that should be fixed. These two tests seem to have been failing on upstream master since the following commit/merge: confluentinc@c633f08. Updated the test expectations to match the current code.
Currently supports Avro and Parquet formats only.
The functionality was ported over from kafka-connect-hdfs with the following simplifications:
1) No listing of files in storage to determine missing partitions on startup
2) No WAL used for that same purpose
Those features were deemed too complex to port over relative to their added benefits. While there is likely a small window in which some partitions may not be added in the case of a crash or shutdown, we believe that any missing partitions in the Hive MetaStore can be corrected/added out-of-band, without both the code complexity and potential startup costs of reconciling those discrepancies.
N.B. A major overhaul of dependencies was required to avoid conflicts due to Hadoop/Hive jars containing non-shaded copies of misc. dependencies.
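To illustrate what correcting missing partitions out-of-band might involve, here is a minimal sketch that is not part of this PR. It assumes Hive-style `key=value` directory segments in the object keys and takes the set of partitions already registered in the metastore as plain strings. The class name, method names, and sample keys are hypothetical, and fetching the key listing and registering the result in the Hive MetaStore are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not code from this PR: finds partitions that exist in
// storage but are not yet registered in the metastore.
public class MissingPartitionFinder {

    // Returns the Hive-style partition path ("k1=v1/k2=v2") found in an
    // object key, or "" when the key has no key=value directory segments.
    public static String partitionOf(String objectKey) {
        List<String> parts = new ArrayList<>();
        String[] segments = objectKey.split("/");
        // The last segment is the file name, so it is never treated as a partition.
        for (int i = 0; i < segments.length - 1; i++) {
            if (segments[i].indexOf('=') > 0) {
                parts.add(segments[i]);
            }
        }
        return String.join("/", parts);
    }

    // Returns, in first-seen order, every partition present in the listed
    // object keys that is absent from the registered set.
    public static Set<String> missingPartitions(Collection<String> objectKeys,
                                                Set<String> registered) {
        Set<String> missing = new LinkedHashSet<>();
        for (String key : objectKeys) {
            String partition = partitionOf(key);
            if (!partition.isEmpty() && !registered.contains(partition)) {
                missing.add(partition);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative keys only; the real layout depends on the configured partitioner.
        List<String> keys = Arrays.asList(
            "topics/orders/year=2020/month=01/orders+0+0000000000.avro",
            "topics/orders/year=2020/month=02/orders+0+0000000100.avro");
        Set<String> registered = Collections.singleton("year=2020/month=01");
        System.out.println(missingPartitions(keys, registered));
    }
}
```

A job along these lines could run on a schedule or after an unclean shutdown, which keeps the reconciliation cost out of connector startup.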
Hi, it's not clear to me that the Jenkins-public-CI integration test failures are due to my code changes.
It uses the maven-shade-plugin to prune out duplicate Hive/Hadoop-related classes so that we don't get version mismatches at runtime. Example usage: `mvn package -Phive`
Hi @frankgrimes97, I don't want to bother you; thanks for the work on this feature.
@mattssll I'm not very familiar with AWS's Glue Data Catalog but in my brief searching/reading I found the following:
Looking at the code, it's not clear whether a stock Apache Hive client can actually talk to Glue Data Catalog: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/tree/master/aws-glue-datacatalog-hive2-client/src/main/java/com/amazonaws/glue/catalog/metastore
We are still interested in working to get this work accepted upstream. Are any current Confluent maintainers available to help us accomplish that?
This PR attempts to address #237
We would very much like to work towards having this contributed back upstream rather than maintain our own fork.
Feedback is most welcome!