
How to use shardingsphere-proxy to proxy the hive db #32223

Closed
hongjunsu opened this issue Jul 22, 2024 · 5 comments

Comments

@hongjunsu

How can I use ShardingSphere-Proxy to proxy a Hive database?

I can't find any documentation about Hive configuration.

@terrymanu
Member

It is still in development.

@linghengqian
Member

linghengqian commented Jul 22, 2024

TL;DR

  • What I am going to say next may be a bit rambling, but I do hope that friends with Hive knowledge can help deal with the problems the master branch encounters on HiveServer2, whether that is the master branch of ShardingSphere or the master branch of Hive.

  • We can leave ShardingSphere Proxy aside and discuss ShardingSphere JDBC first; both actually require optional modules to parse the Hive dialect. First, compile the ShardingSphere artifacts into the local Maven repository. You need to remove org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader, as in Add GraalVM Reachability Metadata and corresponding nativeTest for Iceberg table in HiveServer2 #31526, to prevent the creation of an embedded HiveServer2, or try to raise a PR to resolve the TODO this file is marked with. Apparently this place involves reading a string like thrift://<host_name>:<port>. Feel free to check https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration
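As a small illustration of the string format mentioned above (the host and port below are placeholders, not values from this issue), a metastore URI of the form thrift://<host_name>:<port> can be taken apart with plain java.net.URI:

```java
import java.net.URI;

public final class MetastoreUriDemo {

    public static void main(String[] args) {
        // Hypothetical remote-metastore URI of the shape the loader reportedly reads.
        URI uri = URI.create("thrift://example-metastore:9083");
        // The scheme distinguishes a remote Thrift metastore from an embedded one.
        System.out.println(uri.getScheme()); // thrift
        System.out.println(uri.getHost());   // example-metastore
        System.out.println(uri.getPort());   // 9083
    }
}
```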

```shell
# Install GraalVM CE via SDKMAN! and activate it for this shell.
sdk install java 22.0.1-graalce
sdk use java 22.0.1-graalce

# Build ShardingSphere at the pinned commit and install it into the local Maven repository.
git clone git@github.com:apache/shardingsphere.git
cd ./shardingsphere/
git reset --hard 45b69b4d0d249f31b01ce963d82debb7de751da4
./mvnw clean install -Prelease -T1C -DskipTests -Djacoco.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true
```
```xml
<project>
    <dependencies>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-jdbc</artifactId>
            <version>5.5.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-infra-database-hive</artifactId>
            <version>5.5.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-parser-sql-hive</artifactId>
            <version>5.5.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>4.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-service</artifactId>
            <version>4.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-slf4j-impl</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-reload4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.antlr</groupId>
                    <artifactId>antlr4-runtime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>commons-compiler</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-dbcp2</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>commons-io</groupId>
                    <artifactId>commons-io</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>commons-lang</groupId>
                    <artifactId>commons-lang</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-pool2</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>janino</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.woodstox</groupId>
                    <artifactId>woodstox-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.bouncycastle</groupId>
                    <artifactId>bcprov-jdk15on</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client-api</artifactId>
            <version>3.3.5</version>
        </dependency>
    </dependencies>
</project>
```
```yaml
mode:
  type: Standalone
  repository:
    type: JDBC

dataSources:
  ds_0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_0
  ds_1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_1
  ds_2:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_2

rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes:
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
    t_order_item:
      actualDataNodes:
      keyGenerateStrategy:
        column: order_item_id
        keyGeneratorName: snowflake
  defaultDatabaseStrategy:
    standard:
      shardingColumn: user_id
      shardingAlgorithmName: inline
  shardingAlgorithms:
    inline:
      type: CLASS_BASED
      props:
        strategy: STANDARD
        algorithmClassName: org.apache.shardingsphere.test.natived.jdbc.commons.algorithm.ClassBasedInlineShardingAlgorithmFixture
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
  auditors:
    sharding_key_required_auditor:
      type: DML_SHARDING_CONDITIONS

- !BROADCAST
  tables:
    - t_address

props:
  sql-show: false
```
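The CLASS_BASED algorithm in the YAML above points at a test fixture whose implementation is not shown in this thread. As a loud assumption, a standard sharding algorithm of that shape typically just maps user_id onto a data-source index; a minimal stand-alone sketch (InlineUserIdSharding and its route method are invented names, not the fixture's actual code):

```java
public final class InlineUserIdSharding {

    // Hypothetical stand-in for ClassBasedInlineShardingAlgorithmFixture:
    // route by user_id modulo the number of data sources (ds_0..ds_2 above).
    static String route(long userId, int dataSourceCount) {
        // floorMod keeps the index non-negative even for negative user ids.
        return "ds_" + Math.floorMod(userId, dataSourceCount);
    }

    public static void main(String[] args) {
        System.out.println(route(7L, 3));  // ds_1
        System.out.println(route(12L, 3)); // ds_0
    }
}
```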
```java
/**
 * Get storage type.
 * Similar to apache/hive 4.0.0's {@code org.apache.hive.jdbc.HiveDatabaseMetaData}, it does not implement {@link java.sql.DatabaseMetaData#getURL()}.
 * So use {@link CatalogSwitchableDataSource#getUrl()} and {@link ReflectionUtils#getFieldValue(Object, String)} to try fuzzy matching.
 *
 * @param dataSource data source
 * @return storage type
 * @throws SQLWrapperException SQL wrapper exception
 */
public static DatabaseType getStorageType(final DataSource dataSource) {
    try (Connection connection = dataSource.getConnection()) {
        return DatabaseTypeFactory.get(connection.getMetaData().getURL());
    } catch (final SQLFeatureNotSupportedException sqlFeatureNotSupportedException) {
        if (dataSource instanceof CatalogSwitchableDataSource) {
            return DatabaseTypeFactory.get(((CatalogSwitchableDataSource) dataSource).getUrl());
        }
        if (dataSource.getClass().getName().equals(new HikariDataSourcePoolMetaData().getType())) {
            HikariDataSourcePoolFieldMetaData dataSourcePoolFieldMetaData = new HikariDataSourcePoolFieldMetaData();
            String jdbcUrlFieldName = ReflectionUtils.<String>getFieldValue(dataSource, dataSourcePoolFieldMetaData.getJdbcUrlFieldName())
                    .orElseThrow(() -> new SQLWrapperException(sqlFeatureNotSupportedException));
            return DatabaseTypeFactory.get(jdbcUrlFieldName);
        }
        throw new SQLWrapperException(sqlFeatureNotSupportedException);
    } catch (final SQLException ex) {
        throw new SQLWrapperException(ex);
    }
}
```
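The Hikari branch above falls back to reflective field access because HiveServer2's DatabaseMetaData#getURL() throws. A self-contained sketch of that reflection idea (FakePooledDataSource and this getFieldValue helper are invented for illustration, mirroring what ReflectionUtils.getFieldValue does, not ShardingSphere's actual code):

```java
import java.lang.reflect.Field;

public final class ReflectionFallbackDemo {

    // Invented stand-in for a pooled data source holding a private jdbcUrl field,
    // like HikariDataSource does.
    static final class FakePooledDataSource {
        private final String jdbcUrl = "jdbc:hive2://localhost:10000/demo_ds_0";
    }

    // Read a private field by name, the way the fuzzy-matching fallback does.
    static String getFieldValue(Object target, String fieldName) throws Exception {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);
        return (String) field.get(target);
    }

    public static void main(String[] args) throws Exception {
        String url = getFieldValue(new FakePooledDataSource(), "jdbcUrl");
        System.out.println(url); // jdbc:hive2://localhost:10000/demo_ds_0
    }
}
```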

Simple summary

  • Support for HiveServer2 is a milestone for ShardingSphere 5.5.1, which has not yet been officially released.
  • org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader has a known TODO. You should either delete this class before compiling the master branch of ShardingSphere, or implement the TODO this class is marked with.
  • The dependency management of the HiveServer2 JDBC Driver is a disaster. I suggest you test on ShardingSphere JDBC before testing ShardingSphere Proxy.

@hongjunsu
Author


thanks a lot

@hongjunsu
Author


thanks a lot

@hongjunsu
Author

Right here waiting.
