[SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver
### What changes were proposed in this pull request?
This PR modifies the classpath for `bin/beeline` by excluding `spark-sql-core_*.jar`, `spark-connect_*.jar`, etc., and adding `jars/connect-repl/*.jar`, making it the same as for `bin/spark-connect-shell`. The modified classpath looks like:
```
jars/*.jar - except for spark-sql-core_*.jar, spark-connect_*.jar, etc.
jars/connect-repl/*.jar - including spark-connect-client-jdbc_*.jar
```
Note: BeeLine itself only requires the Hive jars and a few third-party utility jars to run, so excluding some `spark-*.jar` files does not break BeeLine's existing ability to connect to the Thrift Server.
To keep classic Spark behavior unchanged, in the Spark classic (default) distribution the above changes take effect only when `SPARK_CONNECT_BEELINE=1` is set explicitly. For convenience, this is enabled by default in the Spark Connect distribution.
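For illustration only, the gating can be sketched roughly as the following shell snippet. This is a hypothetical sketch, not the actual `bin/beeline`/launcher code; the real implementation differs in detail, but the effective classpath matches the layout shown above.
```
# Illustrative sketch only -- not the actual launcher logic.
if [ "${SPARK_CONNECT_BEELINE:-0}" = "1" ]; then
  # jars/*.jar, minus spark-sql-core_*.jar, spark-connect_*.jar, etc.
  CLASSPATH="$(ls "$SPARK_HOME"/jars/*.jar \
    | grep -v -e 'spark-sql-core_' -e 'spark-connect_' \
    | paste -sd: -)"
  # plus jars/connect-repl/*.jar, which includes spark-connect-client-jdbc_*.jar
  CLASSPATH="$CLASSPATH:$SPARK_HOME/jars/connect-repl/*"
else
  # Classic behavior: the full jars directory, unchanged.
  CLASSPATH="$SPARK_HOME/jars/*"
fi
```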
### Why are the changes needed?
This is a new feature: it allows users to use BeeLine as a SQL CLI to connect to a Spark Connect server.
### Does this PR introduce _any_ user-facing change?
No. For the classic (default) Spark distribution, this feature must be enabled explicitly by setting `SPARK_CONNECT_BEELINE=1`.
### How was this patch tested?
Launch a Connect Server first; in my case, the Connect Server (v4.1.0-preview2) runs at `sc://localhost:15002`. To ensure the changes do not break the Thrift Server use case, also launch a Thrift Server at `thrift://localhost:10000`.
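For reference, both servers can be started from a Spark distribution with the standard scripts; exact options depend on your environment:
```
# Start a Spark Connect server (gRPC, default port 15002).
sbin/start-connect-server.sh

# Start the Thrift JDBC/ODBC server (default port 10000) to verify the existing path.
sbin/start-thriftserver.sh
```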
#### Testing for dev mode
Build:
```
$ build/sbt -Phive,hive-thriftserver clean package
```
Without setting `SPARK_CONNECT_BEELINE=1`, it fails as expected with `No known driver to handle "jdbc:sc://localhost:15002"`:
```
$ SPARK_PREPEND_CLASSES=true bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
scan complete in 0ms
scan complete in 4ms
No known driver to handle "jdbc:sc://localhost:15002"
Beeline version 2.3.10 by Apache Hive
beeline>
```
With `SPARK_CONNECT_BEELINE=1` set, it works as expected:
```
$ SPARK_PREPEND_CLASSES=true SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.1.0-preview2)
Driver: Apache Spark Connect JDBC Driver (version 4.1.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Beeline version 2.3.10 by Apache Hive
0: jdbc:sc://localhost:15002> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.1.0 c5ff48c |
+------------------------+-------------------------------------------------+
1 row selected (0.476 seconds)
0: jdbc:sc://localhost:15002>
```
Also test with the Thrift Server to ensure there is no impact on existing functionality.
It works as expected both with and without `SPARK_CONNECT_BEELINE=1`:
```
$ SPARK_PREPEND_CLASSES=true [SPARK_CONNECT_BEELINE=1] bin/beeline -u jdbc:hive2://localhost:10000
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 4.1.0-preview2)
Driver: Hive JDBC (version 2.3.10)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.10 by Apache Hive
0: jdbc:hive2://localhost:10000> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.1.0 c5ff48c |
+------------------------+-------------------------------------------------+
1 row selected (0.973 seconds)
0: jdbc:hive2://localhost:10000>
```
#### Testing for Spark distribution
```
$ dev/make-distribution.sh --tgz --connect --name SPARK-54002 -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
```
##### Spark classic distribution
```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (negative result, fails with 'No known driver to handle "jdbc:sc://localhost:15002"')
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```
##### Spark connect distribution
```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52706 from pan3793/SPARK-54002.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>