[Feature][Hive JDBC Source] Support Hive JDBC Source Connector #5424
Conversation
This commit adds Hive JDBC dependencies in the pom.xml file to enable the JDBC connector to support Hive. The Hive dialect, type mapper, and row converter were added accordingly. This update allows users to connect to and interact with Hive directly using the JDBC connector.
Please add documentation for this connector; for reference, see https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/Mysql.md
This commit introduces a new documentation file under `/docs/en/connector-v2/source/` explaining how to use the Hive JDBC source connector. The included information covers the supported engines, key features, supported DataSource info, data type mapping, source options, and database dependency, along with examples of how to use the connector with different configurations. This documentation is intended to help users understand and effectively use the Hive JDBC source connector in their data pipelines.
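A minimal source configuration of the kind such documentation would include might look like the following sketch; the option names mirror the standard SeaTunnel JDBC source options, and the query and table name are illustrative:

```hocon
source {
  Jdbc {
    url = "jdbc:hive2://localhost:10000/default"
    driver = "org.apache.hive.jdbc.HiveDriver"
    connection_check_timeout_sec = 100
    query = "select * from test_table"
  }
}
```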
```java
case HIVE_NUMERIC:
    if (precision > 0) {
        return new DecimalType(precision, metadata.getScale(colIndex));
    }
    LOG.warn("decimal did not define precision and scale, will use Decimal(38,18)");
    return new DecimalType(38, 18);
```
I think it should maintain similar data types
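The decimal fallback under discussion can be sketched as a stand-alone class; this is illustrative only, not the actual SeaTunnel `HiveTypeMapper` code:

```java
// Illustrative sketch of the decimal fallback: when Hive reports a DECIMAL
// column without a usable precision, default to Decimal(38, 18).
final class DecimalMappingSketch {
    static String mapDecimal(int precision, int scale) {
        if (precision > 0) {
            // Precision is defined: keep the reported precision and scale.
            return "Decimal(" + precision + "," + scale + ")";
        }
        // Precision undefined: fall back to the widest default.
        return "Decimal(38,18)";
    }

    public static void main(String[] args) {
        System.out.println(mapDecimal(10, 2)); // Decimal(10,2)
        System.out.println(mapDecimal(0, 0));  // Decimal(38,18)
    }
}
```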
Introduces JDBC Hive Integration Tests and new configuration settings for the 'connector-jdbc-e2e' module in the SeaTunnel v2 project. These additions are designed to enhance testing and configuration handling capabilities for connections to Hive databases. The new JdbcHiveIT class implements Hive-specific configurations and integration test cases. The jdbc_hive_source.conf file contains a sample configuration setup for a Hive JDBC source connection.
Fixed the case-sensitivity issue in the JDBC config filename in the e2e tests, changing "/jdbc_HIVE_source_and_sink.conf" to "/jdbc_hive_source_and_sink.conf". Linux filesystems are case-sensitive, so the filename must match the exact case to avoid file-not-found exceptions while running end-to-end tests.
...ava/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/hive/HiveTypeMapper.java
...jdbc-e2e-part-3/src/test/java/org/apache/seatunnel/connectors/seatunnel/jdbc/JdbcHiveIT.java
This commit adds a BYTE_TYPE mapping to the HiveTypeMapper class for the HIVE_TINYINT field. This change allows mapping of HIVE_TINYINT to BasicType.BYTE_TYPE, ensuring type consistency.
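The kind of name-based mapping this commit extends can be sketched as below; the class name and the non-TINYINT entries are plausible assumptions for context, not the actual `HiveTypeMapper` implementation:

```java
import java.util.Map;

// Illustrative sketch of a Hive-type-name to SeaTunnel-type mapping table.
// The TINYINT -> BYTE_TYPE entry reflects the mapping added in this commit;
// the other entries are assumptions for illustration.
final class HiveTypeMappingSketch {
    private static final Map<String, String> MAPPING = Map.of(
            "TINYINT", "BYTE_TYPE",
            "SMALLINT", "SHORT_TYPE",
            "INT", "INT_TYPE",
            "BIGINT", "LONG_TYPE",
            "STRING", "STRING_TYPE");

    static String toSeaTunnelType(String hiveType) {
        // Unknown types fall back to STRING_TYPE in this sketch.
        return MAPPING.getOrDefault(hiveType.toUpperCase(), "STRING_TYPE");
    }

    public static void main(String[] args) {
        System.out.println(toSeaTunnelType("tinyint")); // BYTE_TYPE
    }
}
```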
```hocon
Jdbc {
    url = "jdbc:hive2://localhost:10000/default"
    user = "root"
    driver = "org.apache.hive.jdbc.HiveDriver"
}
```
We currently do not have Hive E2E tests. Can your E2E directly cover the JDBC tests as well as Hive thrift-mode read and write, and the metastore?
https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/Hive.md, https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/sink/Hive.md
`jdbc_hive` and `hive` are different connectors, so it would be better to separate the E2E tests of the two. We are currently missing the E2E tests for the hive connector.
@NickYoungPeng If you are willing, you can provide a separate PR to add the e2e for the hive connector.
# Conflicts:
#   seatunnel-connectors-v2/connector-jdbc/pom.xml
Removed redundant file and tidied up pom.xml. Removed the file jdbc_hive_source.conf, added needed dependencies for Hive in pom.xml, and removed an unused module from connector-file pom. Improved tests by updating test configuration to better demonstrate features and modified corresponding test class accordingly. Changes aim to enhance test readability and correctness.
I want Hive JDBC to support Kerberos authentication. Do you have any suggestions?
Updated HIVE_IMAGE from "youngyangp/hive_3.1.2_arm:1.0.0" to "apache/hive:3.1.3" in the JdbcHiveIT test so that the official Apache Hive image is used, and introduced the "SERVICE_NAME" environment variable for the Hive server container, which is required for internal service recognition.
Uniformly formatted the environment variables for better code readability: line 160 changed from '.withEnv("SERVICE_NAME","hiveserver2")' to '.withEnv("SERVICE_NAME", "hiveserver2")', adding the missing space after the comma.
Added a new parameter, 'auto_commit', to the jdbc_hive configuration file to improve database transaction management. This is pivotal for applications that require a high level of data integrity and consistency, or for those that need to manage complex, multi-step transactions.
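As a sketch, the updated configuration fragment might look like the following; only the `auto_commit` option is confirmed by this commit, the remaining options mirror the earlier example:

```hocon
Jdbc {
    url = "jdbc:hive2://localhost:10000/default"
    driver = "org.apache.hive.jdbc.HiveDriver"
    auto_commit = true
    query = "select * from test_table"
}
```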
Updated hive.jdbc.version to 3.1.3 in connector-jdbc/pom.xml to have access to the latest features, bug fixes and improvements. Also, removed the 'provided' scope from the same file as it is not necessary and its presence could lead to potential issues with the dependency resolution during the build process.
It seems that the second option is more suitable.
This commit enables Kerberos authentication when establishing JDBC connections. The JdbcOptions class has been updated to include new options such as 'use_kerberos' and 'kerberos_principal' to configure Kerberos settings. A new connection provider, HiveJdbcConnectionProvider, has been added to handle connection creation when using Kerberos. This change also makes the connection provider more flexible, as it can now be selected based on the JDBC dialect.
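Based on the options named in this commit, a Kerberos-enabled configuration might look like the sketch below; `use_kerberos` and `kerberos_principal` come from the commit message, while the keytab and krb5 option names are illustrative assumptions:

```hocon
Jdbc {
    url = "jdbc:hive2://localhost:10000/default;principal=hive/_HOST@EXAMPLE.COM"
    driver = "org.apache.hive.jdbc.HiveDriver"
    use_kerberos = true
    kerberos_principal = "hive/_HOST@EXAMPLE.COM"
    # the option names below are illustrative assumptions
    kerberos_keytab_path = "/path/to/hive.keytab"
    krb5_path = "/seatunnel/krb5.conf"
}
```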
Updated the dependency to the oceanbase-client from the com.oceanbase group in the connector-jdbc pom.xml file. This change is necessary to maintain API consistency and efficiency.
...in/java/org/apache/seatunnel/connectors/seatunnel/jdbc/exception/JdbcConnectorErrorCode.java
Introduced new error codes JDBC-07 and JDBC-08 for the JDBC connector in the Error Quick Reference Manual. These additions help users understand and diagnose issues related to unsupported JDBC types and failed Kerberos authentication, respectively.
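The two new codes could be modeled roughly as below; the enum name and message wording are illustrative assumptions rather than the exact `JdbcConnectorErrorCode` source:

```java
// Illustrative sketch of the two new JDBC connector error codes.
enum JdbcErrorCodeSketch {
    UNSUPPORTED_JDBC_TYPE("JDBC-07", "Unsupported jdbc type"),
    KERBEROS_AUTHENTICATION_FAILED("JDBC-08", "Kerberos authentication failed");

    private final String code;
    private final String description;

    JdbcErrorCodeSketch(String code, String description) {
        this.code = code;
        this.description = description;
    }

    String getCode() { return code; }
    String getDescription() { return description; }
}
```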
Made the krb5 file path in the JDBC configuration configurable: it can now be set explicitly (for example, '/seatunnel/krb5.conf') or fall back to the default path '/etc/krb5.conf'. This change provides flexibility when the krb5.conf file is not found at the default location. The documentation has been updated to reflect this change.
Included definite support information for Hive versions 3.1.3 and 3.1.2 in the JDBC Hive Source Connector doc, while stating that other versions need testing. This clarifies version compatibility, helping users determine whether their Hive version is definitely compatible or requires testing.
fix ci
+1
close #5389