[Feature][Hive JDBC Source] Support Hive JDBC Source Connector #5424

Merged 78 commits on Nov 7, 2023

Contributor

@NickCodeJourney NickCodeJourney commented Sep 4, 2023

close #5389

Purpose of this pull request

Check list

yangpeng added 2 commits September 4, 2023 22:38
This commit adds the Hive JDBC dependencies to the pom.xml file so that the JDBC connector supports Hive. The Hive dialect, type mapper, and row converter were added accordingly. This update allows users to connect to and interact with Hive directly using the JDBC connector.
@NickCodeJourney NickCodeJourney marked this pull request as draft September 4, 2023 14:41
@NickCodeJourney NickCodeJourney marked this pull request as ready for review September 4, 2023 14:41
Member

@EricJoy2048 EricJoy2048 left a comment

This commit introduces a new documentation file under `/docs/en/connector-v2/source/` explaining how to use the Hive JDBC source connector. It covers the supported engines, key features, supported data source info, data type mapping, source options, and database dependency, along with examples of using the connector with different configurations. The documentation is intended to help users understand and use the Hive JDBC source connector effectively in their data pipelines.
Comment on lines +89 to +94
    case HIVE_NUMERIC:
        if (precision > 0) {
            return new DecimalType(precision, metadata.getScale(colIndex));
        }
        LOG.warn("decimal did define precision,scale, will be Decimal(38,18)");
        return new DecimalType(38, 18);
Contributor Author

I think it should maintain similar data types

yangpeng added 2 commits September 7, 2023 01:55
Introduces JDBC Hive Integration Tests and new configuration settings for the 'connector-jdbc-e2e' module in the SeaTunnel v2 project. These additions are designed to enhance testing and configuration handling capabilities for connections to Hive databases. The new JdbcHiveIT class implements Hive-specific configurations and integration test cases. The jdbc_hive_source.conf file contains a sample configuration setup for a Hive JDBC source connection.
Fixed the case sensitivity issue in the JDBC config filename in the e2e test. Changed "/jdbc_HIVE_source_and_sink.conf" to "/jdbc_hive_source_and_sink.conf". Linux file systems are case-sensitive, so the filename must match exactly to avoid file-not-found exceptions when running end-to-end tests.
This commit adds a BYTE_TYPE mapping to the HiveTypeMapper class for the HIVE_TINYINT field. This change allows mapping of HIVE_TINYINT to BasicType.BYTE_TYPE, ensuring type consistency.
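The TINYINT mapping described in this commit, together with the decimal fallback shown in the earlier review snippet, can be illustrated with a minimal sketch. The class name `HiveTypeMappingSketch` and the string return values are assumptions made for illustration; the actual `HiveTypeMapper` returns SeaTunnel type objects and covers many more types.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified type mapper. It returns plain strings instead of
 * SeaTunnel type objects so the sketch has no external dependencies.
 */
final class HiveTypeMappingSketch {
    // Fallback used when the driver reports no precision, as in the review snippet.
    static final int DEFAULT_PRECISION = 38;
    static final int DEFAULT_SCALE = 18;

    static String mapType(String hiveTypeName, int precision, int scale) {
        switch (hiveTypeName.toUpperCase(Locale.ROOT)) {
            case "TINYINT":
                // Mirrors the HIVE_TINYINT -> BYTE_TYPE mapping from the commit message.
                return "BYTE";
            case "NUMERIC":
                if (precision > 0) {
                    return "DECIMAL(" + precision + "," + scale + ")";
                }
                // No precision reported: fall back to Decimal(38,18).
                return "DECIMAL(" + DEFAULT_PRECISION + "," + DEFAULT_SCALE + ")";
            default:
                throw new IllegalArgumentException("Unsupported Hive type: " + hiveTypeName);
        }
    }
}
```

Calling `HiveTypeMappingSketch.mapType("NUMERIC", 0, 0)` takes the fallback branch, while a positive precision is passed through unchanged.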
Jdbc {
    url = "jdbc:hive2://localhost:10000/default"
    user = "root"
    driver = "org.apache.hive.jdbc.HiveDriver"
Contributor

We currently do not have Hive E2E tests. Can your E2E directly cover the JDBC tests as well as Hive thrift mode read and write, and the metastore?
https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/Hive.md
https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/sink/Hive.md

Member

https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/Hive.md
https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/sink/Hive.md

jdbc_hive and hive are different connectors. It would be better to separate the e2e tests of the two connectors. We are missing the e2e tests for the hive connector.

@NickYoungPeng If you are willing, you can provide a separate PR to add the e2e for the hive connector.

yangpeng added 2 commits September 14, 2023 21:42
# Conflicts:
#	seatunnel-connectors-v2/connector-jdbc/pom.xml
Removed redundant file and tidied up pom.xml. Removed the file jdbc_hive_source.conf, added needed dependencies for Hive in pom.xml, and removed an unused module from connector-file pom. Improved tests by updating test configuration to better demonstrate features and modified corresponding test class accordingly. Changes aim to enhance test readability and correctness.
Contributor Author

NickCodeJourney commented Sep 15, 2023

I want Hive JDBC to support Kerberos authentication. Do you have any suggestions?

  1. My first idea is to enhance the SimpleJdbcConnectionProvider. This would involve modifying the JdbcConnectionConfig class to support Hive JDBC Kerberos authentication.
  2. My second idea is to create a new JdbcConnectionProvider, such as HiveJdbcConnectionProvider, and inject it through SPI (Service Provider Interface).

yangpeng and others added 6 commits September 15, 2023 15:26
Updated the HIVE_IMAGE from "youngyangp/hive_3.1.2_arm:1.0.0" to "apache/hive:3.1.3" in the JdbcHiveIT test. Also, introduced "SERVICE_NAME" as an environment variable to the Hive server container.

The HIVE_IMAGE was updated to use an official Apache Hive image. The environment variable is required for the container to identify its service internally.
Uniformly formatted the environment variables for better code readability. The change is on line 160, from '.withEnv("SERVICE_NAME","hiveserver2")' to '.withEnv("SERVICE_NAME", "hiveserver2")', adding whitespace after the comma.
Added a new parameter, 'auto_commit', to the jdbc_hive configuration file to improve database transaction management. This is pivotal for applications that require a high level of data integrity and consistency, or for those that need to manage complex, multi-step transactions.
Updated hive.jdbc.version to 3.1.3 in connector-jdbc/pom.xml to have access to the latest features, bug fixes and improvements. Also, removed the 'provided' scope from the same file as it is not necessary and its presence could lead to potential issues with the dependency resolution during the build process.
@EricJoy2048
Member

I want Hive JDBC to support Kerberos authentication. Do you have any suggestions?

  1. My first idea is to enhance the SimpleJdbcConnectionProvider. This would involve modifying the JdbcConnectionConfig class to support Hive JDBC Kerberos authentication.
  2. My second idea is to create a new JdbcConnectionProvider, such as HiveJdbcConnectionProvider, and inject it through SPI (Service Provider Interface).

It seems that the second option is more suitable.
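The second option can be pictured with a small sketch of dialect-based provider selection. Everything here is hypothetical: the `ProviderSelectionSketch` class, its nested `Provider` interface, and the `supports` and `describe` methods are illustration names, not the SeaTunnel `JdbcConnectionProvider` contract. A real SPI setup would typically discover implementations with `java.util.ServiceLoader` and a `META-INF/services` entry; a hard-coded list stands in for that step so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of choosing a connection provider by JDBC dialect. */
final class ProviderSelectionSketch {

    /** Illustrative contract; not the real SeaTunnel interface. */
    interface Provider {
        boolean supports(String dialectName);
        String describe();
    }

    /** Generic fallback that accepts every dialect. */
    static final class SimpleProvider implements Provider {
        public boolean supports(String dialectName) { return true; }
        public String describe() { return "simple"; }
    }

    /** Hive-specific provider, where Kerberos login logic would live. */
    static final class HiveProvider implements Provider {
        public boolean supports(String dialectName) {
            return "hive".equalsIgnoreCase(dialectName);
        }
        public String describe() { return "hive"; }
    }

    // Specific providers come first so the generic fallback is tried last.
    // With real SPI this list would come from ServiceLoader.load(...).
    private static final List<Provider> PROVIDERS =
            Arrays.asList(new HiveProvider(), new SimpleProvider());

    static Provider select(String dialectName) {
        for (Provider provider : PROVIDERS) {
            if (provider.supports(dialectName)) {
                return provider;
            }
        }
        throw new IllegalStateException("No provider for dialect: " + dialectName);
    }
}
```

Keeping the Hive-specific logic in its own provider leaves the generic provider untouched for all other dialects, which is the main appeal of this option over the first one.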

@EricJoy2048 EricJoy2048 added this to the 2.3.4 milestone Sep 25, 2023
NickCodeJourney and others added 5 commits September 25, 2023 16:59
This commit enables Kerberos authentication when establishing JDBC connections. The JdbcOptions class has been updated to include new options such as 'use_kerberos', 'kerberos_principal', etc. to configure Kerberos settings. A new connection provider, 'HiveJdbcConnectionProvider', has been added to handle connection creation when using Kerberos. This change also makes the connection provider more flexible, as it can now be selected based on the JDBC dialect.
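As a rough illustration of how such options might be checked before a connection is opened, the sketch below reads 'use_kerberos' and 'kerberos_principal' from a plain map. Only those two option names come from the commit message; the `KerberosOptionsSketch` class, the map-based configuration, and the default of `false` for 'use_kerberos' are assumptions, and the real JdbcOptions class may behave differently.

```java
import java.util.Map;

/** Hypothetical validation of the Kerberos options named in the commit message. */
final class KerberosOptionsSketch {

    /** Assumption: Kerberos is off unless 'use_kerberos' is explicitly "true". */
    static boolean useKerberos(Map<String, String> options) {
        return Boolean.parseBoolean(options.getOrDefault("use_kerberos", "false"));
    }

    /** Returns the trimmed principal, or fails fast when it is missing or blank. */
    static String requirePrincipal(Map<String, String> options) {
        String principal = options.get("kerberos_principal");
        if (principal == null || principal.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "'kerberos_principal' is required when 'use_kerberos' is true");
        }
        return principal.trim();
    }
}
```

Failing fast on a missing principal surfaces a configuration mistake before any network call is attempted.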
NickCodeJourney and others added 3 commits October 27, 2023 13:50
Updated the dependency to the oceanbase-client from the com.oceanbase group in the connector-jdbc pom.xml file. This change is necessary to maintain API consistency and efficiency.
EricJoy2048
EricJoy2048 previously approved these changes Oct 30, 2023
Introduced new JDBC error codes JDBC-07 and JDBC-08 in the Error Quick Reference Manual. These additions help users understand and diagnose issues related to an unsupported JDBC type and failed Kerberos authentication, respectively.
NickCodeJourney and others added 2 commits October 30, 2023 14:24
Updated the krb5 file path in the JDBC configuration from '/etc/krb5.conf' to '/seatunnel/krb5.conf'; the default path '/etc/krb5.conf' can still be used. This change provides flexibility when the krb5.conf file is not found at the default location. The documentation has been updated to reflect this change.
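The path handling described in this commit message could look roughly like the sketch below: use a configured path when one is given, otherwise fall back to '/etc/krb5.conf'. The `Krb5PathSketch` class and its method names are hypothetical, and the connector's actual option name for the path is not shown here. The `java.security.krb5.conf` system property is the standard JDK setting for the Kerberos configuration file location.

```java
/** Hypothetical helper for choosing the krb5.conf location. */
final class Krb5PathSketch {
    // Default location mentioned in the commit message.
    static final String DEFAULT_KRB5_PATH = "/etc/krb5.conf";

    /** Returns the configured path if present, otherwise the default path. */
    static String resolve(String configuredPath) {
        if (configuredPath != null && !configuredPath.trim().isEmpty()) {
            return configuredPath.trim();
        }
        return DEFAULT_KRB5_PATH;
    }

    /** Points the JDK Kerberos implementation at the resolved file. */
    static void apply(String configuredPath) {
        System.setProperty("java.security.krb5.conf", resolve(configuredPath));
    }
}
```

For example, `Krb5PathSketch.apply("/seatunnel/krb5.conf")` would set the system property to that path, while `Krb5PathSketch.apply(null)` would leave it at the default.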
Included definite support information for Hive version 3.1.3 and 3.1.2 in JDBC Hive Source Connector doc, while stating that other versions need testing. This provides clarity on version compatibility, helping users determine if their Hive version is definitely compatible or requires testing.
@zhilinli123
Contributor

fix ci

Member

@Carl-Zhou-CN Carl-Zhou-CN left a comment

+1

@hailin0 hailin0 merged commit a64e177 into apache:dev Nov 7, 2023
8 checks passed
@NickCodeJourney NickCodeJourney deleted the hivejdbc branch November 7, 2023 06:22
Development

Successfully merging this pull request may close these issues.

[Feature][Hive JDBC Source] Support Hive JDBC Source Connector
7 participants