
Commit efd066e

pan3793 authored and dongjoon-hyun committed
[SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver
### What changes were proposed in this pull request?

This PR modifies the classpath for `bin/beeline` - excluding `spark-sql-core_*.jar`, `spark-connect_*.jar`, etc., and adding `jars/connect-repl/*.jar` - making it the same as `bin/spark-connect-shell`. The modified classpath looks like:

```
jars/*.jar              - except for spark-sql-core_*.jar, spark-connect_*.jar, etc.
jars/connect-repl/*.jar - including spark-connect-client-jdbc_*.jar
```

Note: BeeLine itself only requires Hive jars and a few third-party utility jars to run, so excluding some `spark-*.jar`s won't break BeeLine's existing capability to connect to the Thrift Server.

To ensure no change in classic Spark behavior, the above changes only take effect for the Spark classic (default) distribution when `SPARK_CONNECT_BEELINE=1` is set explicitly. For convenience, this is enabled by default in the Spark Connect distribution.

### Why are the changes needed?

It's a new feature: it lets users use BeeLine as a SQL CLI to connect to a Spark Connect server.

### Does this PR introduce _any_ user-facing change?

No. For the classic (default) Spark distribution, this feature must be enabled explicitly by setting `SPARK_CONNECT_BEELINE=1`.

### How was this patch tested?

Launch a Connect Server first; in my case, the Connect Server (v4.1.0-preview2) runs at `sc://localhost:15002`. To ensure the changes won't break the Thrift Server use case, also launch a Thrift Server at `thrift://localhost:10000`.

#### Testing for dev mode

Building:

```
$ build/sbt -Phive,hive-thriftserver clean package
```

Without setting `SPARK_CONNECT_BEELINE=1`, it fails as expected with `No known driver to handle "jdbc:sc://localhost:15002"`:

```
$ SPARK_PREPEND_CLASSES=true bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
scan complete in 0ms
scan complete in 4ms
No known driver to handle "jdbc:sc://localhost:15002"
Beeline version 2.3.10 by Apache Hive
beeline>
```

With `SPARK_CONNECT_BEELINE=1` set, it works as expected:

```
$ SPARK_PREPEND_CLASSES=true SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.1.0-preview2)
Driver: Apache Spark Connect JDBC Driver (version 4.1.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Beeline version 2.3.10 by Apache Hive
0: jdbc:sc://localhost:15002> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect!  |                 server_version                  |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect!  | 4.1.0 c5ff48c                                   |
+------------------------+-------------------------------------------------+
1 row selected (0.476 seconds)
0: jdbc:sc://localhost:15002>
```

Also test with the Thrift Server to ensure no impact on existing functionality. It works as expected both with and without `SPARK_CONNECT_BEELINE=1`:

```
$ SPARK_PREPEND_CLASSES=true [SPARK_CONNECT_BEELINE=1] bin/beeline -u jdbc:hive2://localhost:10000
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 4.1.0-preview2)
Driver: Hive JDBC (version 2.3.10)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.10 by Apache Hive
0: jdbc:hive2://localhost:10000> select 'Hello, Spark Connect!', version() as server_version;
+------------------------+-------------------------------------------------+
| Hello, Spark Connect!  |                 server_version                  |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect!  | 4.1.0 c5ff48c                                   |
+------------------------+-------------------------------------------------+
1 row selected (0.973 seconds)
0: jdbc:hive2://localhost:10000>
```

#### Testing for Spark distribution

```
$ dev/make-distribution.sh --tgz --connect --name SPARK-54002 -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
```

##### Spark classic distribution

```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (negative result, fails with 'No known driver to handle "jdbc:sc://localhost:15002"')
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```

##### Spark connect distribution

```
$ tar -xzf spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect.tgz
$ cd spark-4.1.0-SNAPSHOT-bin-SPARK-54002-connect
$ bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select 'Hello, Spark Connect!', version() as server_version;"
... (positive result)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52706 from pan3793/SPARK-54002.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
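A quick way to inspect the classpath change described above is the launcher's `SPARK_PRINT_LAUNCH_COMMAND` debug switch, which makes `spark-class`-based scripts print the final `java` command (including `-cp`) before running it. This is a hedged sketch for verification, not part of the patch itself:

```
# With the flag on, the printed -cp should contain jars/connect-repl/*
# and omit spark-sql-core_*.jar, spark-connect_*.jar, etc.
$ SPARK_PRINT_LAUNCH_COMMAND=1 SPARK_CONNECT_BEELINE=1 bin/beeline --help 2>&1 | head -n 1
```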
1 parent 665a428 · commit efd066e

File tree

3 files changed: +17, -6 lines


dev/make-distribution.sh

Lines changed: 2 additions & 0 deletions

```diff
@@ -322,9 +322,11 @@ if [ "$MAKE_TGZ" == "true" ]; then
   rm -rf "$TARDIR"
   cp -r "$DISTDIR" "$TARDIR"
   # Set the Spark Connect system variable in these scripts to enable it by default.
+  awk 'NR==1{print; print "export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}"; next} {print}' "$TARDIR/bin/beeline" > tmp && cat tmp > "$TARDIR/bin/beeline"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/pyspark" > tmp && cat tmp > "$TARDIR/bin/pyspark"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/spark-shell" > tmp && cat tmp > "$TARDIR/bin/spark-shell"
   awk 'NR==1{print; print "export SPARK_CONNECT_MODE=${SPARK_CONNECT_MODE:-1}"; next} {print}' "$TARDIR/bin/spark-submit" > tmp && cat tmp > "$TARDIR/bin/spark-submit"
+  awk 'NR==1{print; print "if [%SPARK_CONNECT_BEELINE%] == [] set SPARK_CONNECT_BEELINE=1"; next} {print}' "$TARDIR/bin/beeline.cmd" > tmp && cat tmp > "$TARDIR/bin/beeline.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/pyspark2.cmd" > tmp && cat tmp > "$TARDIR/bin/pyspark2.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/spark-shell2.cmd" > tmp && cat tmp > "$TARDIR/bin/spark-shell2.cmd"
   awk 'NR==1{print; print "if [%SPARK_CONNECT_MODE%] == [] set SPARK_CONNECT_MODE=1"; next} {print}' "$TARDIR/bin/spark-submit2.cmd" > tmp && cat tmp > "$TARDIR/bin/spark-submit2.cmd"
```
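The `awk 'NR==1{...}'` idiom above splices the `export` in right after the shebang, so scripts in the Connect distribution enable the flag by default, while the `${SPARK_CONNECT_BEELINE:-1}` default still lets users override it (for example by exporting `SPARK_CONNECT_BEELINE=0` first). A minimal standalone illustration; `demo.sh` is a throwaway file invented for this sketch:

```
$ printf '#!/usr/bin/env bash\necho hello\n' > demo.sh
$ awk 'NR==1{print; print "export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}"; next} {print}' demo.sh
#!/usr/bin/env bash
export SPARK_CONNECT_BEELINE=${SPARK_CONNECT_BEELINE:-1}
echo hello
```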

launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java

Lines changed: 12 additions & 6 deletions

```diff
@@ -66,6 +66,8 @@ abstract class AbstractCommandBuilder {
    */
   protected boolean isRemote = System.getenv().containsKey("SPARK_REMOTE");
 
+  protected boolean isBeeLine = false;
+
   AbstractCommandBuilder() {
     this.appArgs = new ArrayList<>();
     this.childEnv = new HashMap<>();
@@ -195,6 +197,10 @@ List<String> buildClassPath(String appClassPath) throws IOException {
         if (isRemote && "1".equals(getenv("SPARK_SCALA_SHELL")) && project.equals("sql/core")) {
           continue;
         }
+        if (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")) &&
+            project.equals("sql/core")) {
+          continue;
+        }
         // SPARK-49534: The assumption here is that if `spark-hive_xxx.jar` is not in the
         // classpath, then the `-Phive` profile was not used during package, and therefore
         // the Hive-related jars should also not be in the classpath. To avoid failure in
@@ -241,13 +247,13 @@
       }
     }
 
-    if (isRemote) {
+    if (isRemote || (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")))) {
       for (File f: new File(jarsDir).listFiles()) {
-        // Exclude Spark Classic SQL and Spark Connect server jars
-        // if we're in Spark Connect Shell. Also exclude Spark SQL API and
-        // Spark Connect Common which Spark Connect client shades.
-        // Then, we add the Spark Connect shell and its dependencies in connect-repl
-        // See also SPARK-48936.
+        // Exclude Spark Classic SQL and Spark Connect server jars if we're in
+        // Spark Connect Shell or BeeLine with Connect JDBC driver. Also exclude
+        // Spark SQL API and Spark Connect Common which Spark Connect client shades.
+        // Then, we add the Spark Connect shell and its dependencies in connect-repl.
+        // See also SPARK-48936, SPARK-54002.
         if (f.isDirectory() && f.getName().equals("connect-repl")) {
           addToClassPath(cp, join(File.separator, f.toString(), "*"));
         } else if (
```
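Note that the exclusion is double-gated: `isBeeLine` is derived from the main class (set in `SparkClassCommandBuilder` below), and `SPARK_CONNECT_BEELINE` must also equal `1`, so a plain Thrift Server BeeLine session keeps its current classpath. A hedged shell check of the gate (jar name taken from the PR description; exact paths vary by build profile):

```
# spark-sql-core should only vanish from the classpath when the flag is set.
$ SPARK_PRINT_LAUNCH_COMMAND=1 SPARK_CONNECT_BEELINE=1 bin/beeline --help 2>&1 \
    | tr ':' '\n' | grep spark-sql-core    # expect no match
$ SPARK_PRINT_LAUNCH_COMMAND=1 bin/beeline --help 2>&1 \
    | tr ':' '\n' | grep spark-sql-core    # expect a match
```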

launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java

Lines changed: 3 additions & 0 deletions

```diff
@@ -38,6 +38,9 @@ class SparkClassCommandBuilder extends AbstractCommandBuilder {
   SparkClassCommandBuilder(String className, List<String> classArgs) {
     this.className = className;
     this.classArgs = classArgs;
+    if ("org.apache.hive.beeline.BeeLine".equals(className)) {
+      this.isBeeLine = true;
+    }
   }
 
   @Override
```
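For context on how this constructor sees the BeeLine class: `bin/beeline` ends up invoking `spark-class` with the Hive BeeLine main class, which is the `className` inspected above. Roughly equivalent to the following (a sketch; the real script also performs environment setup):

```
$ bin/spark-class org.apache.hive.beeline.BeeLine -u jdbc:sc://localhost:15002
```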
