Conversation

@LantaoJin
Contributor

@LantaoJin LantaoJin commented May 22, 2018

What changes were proposed in this pull request?

Since SPARK-23639, using --proxy-user to impersonate a user invokes obtainDelegationTokens(). However, if the current configuration connects to the DB directly via JDBC instead of talking to the metastore over RPC, it fails with:

WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Hive metastore uri undefined
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.hive.thriftserver.HiveCredentialProvider.obtainCredentials(HiveCredentialProvider.scala:73)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:288)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:137)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/05/22 05:24:16 INFO ShutdownHookManager: Shutdown hook called
18/05/22 05:24:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-b63ad788-1a47-4326-9972-c4fde1dc19c3

How was this patch tested?

Remove or comment out the hive.metastore.uris configuration in hive-site.xml (so JDBC is used to connect to the DB directly). The command below then fails:

bin/spark-sql --proxy-user x_user --master local

@LantaoJin
Contributor Author

Hi @vanzin @jerryshao, could you help review this?

@AmplabJenkins

Can one of the admins verify this patch?

  require(principal.nonEmpty, s"Hive principal $principalKey undefined")
  val metastoreUri = conf.getTrimmed("hive.metastore.uris", "")
- require(metastoreUri.nonEmpty, "Hive metastore uri undefined")
+ if (metastoreUri.isEmpty) {
Contributor

How is the code getting past the check in delegationTokensRequired? It's basically checking for the same thing.

Contributor Author

require() throws an IllegalArgumentException and exits the JVM here. Letting delegationTokensRequired return false when metastoreUri is undefined (i.e. JDBC is used to connect to the DB directly) only finishes this method without setting a token in the Credentials.
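To make the difference concrete, here is a minimal, self-contained Scala sketch, not the actual Spark code; the config map and method names are illustrative. A hard require aborts the caller with an IllegalArgumentException, while an early return simply skips token acquisition and lets the job proceed.

```scala
// Illustrative sketch only: a plain Map stands in for HiveConf.
object TokenCheckSketch {

  // Mirrors the "require" style: a missing metastore uri aborts with an exception.
  def obtainWithRequire(conf: Map[String, String]): Unit = {
    val metastoreUri = conf.getOrElse("hive.metastore.uris", "").trim
    require(metastoreUri.nonEmpty, "Hive metastore uri undefined") // throws IllegalArgumentException
    println(s"would obtain a delegation token from $metastoreUri")
  }

  // Mirrors the "early return" style: a missing metastore uri just means no token is added.
  def obtainWithEarlyReturn(conf: Map[String, String]): Unit = {
    val metastoreUri = conf.getOrElse("hive.metastore.uris", "").trim
    if (metastoreUri.isEmpty) return
    println(s"would obtain a delegation token from $metastoreUri")
  }

  def main(args: Array[String]): Unit = {
    val noMetastore = Map.empty[String, String]
    obtainWithEarlyReturn(noMetastore) // silently skips, job can proceed
    obtainWithRequire(noMetastore)     // throws, submission fails
  }
}
```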

Contributor

Same question as @vanzin: delegationTokensRequired already checks whether hive.metastore.uris is empty, so the DT will not be obtained if hive.metastore.uris is not configured.

  override def delegationTokensRequired(
      sparkConf: SparkConf,
      hadoopConf: Configuration): Boolean = {
    // Delegation tokens are needed only when:
    // - trying to connect to a secure metastore
    // - either deploying in cluster mode without a keytab, or impersonating another user
    //
    // Other modes (such as client with or without keytab, or cluster mode with keytab) do not need
    // a delegation token, since there's a valid kerberos TGT for the right user available to the
    // driver, which is the only process that connects to the HMS.
    val deployMode = sparkConf.get("spark.submit.deployMode", "client")
    UserGroupInformation.isSecurityEnabled &&
      hiveConf(hadoopConf).getTrimmed("hive.metastore.uris", "").nonEmpty &&
      (SparkHadoopUtil.get.isProxyUser(UserGroupInformation.getCurrentUser()) ||
        (deployMode == "cluster" && !sparkConf.contains(KEYTAB)))
  }

Contributor Author

Yes, I know. If the metastore is undefined, there is no need to obtain the DT. Am I right?

Contributor

Before getting the DT, we check whether delegationTokensRequired returns true or false; if it is false, we do not get the DT. Since "hive.metastore.uris" is not configured here, it should return false.
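For readers following the thread, a hedged sketch of the guard pattern described above; the trait and object names are illustrative, not the actual Spark internals. The caller consults delegationTokensRequired first, so when hive.metastore.uris is unset no token is requested and no error is raised.

```scala
// Illustrative sketch of "check before obtain"; not the actual Spark classes.
trait CredentialProviderSketch {
  def delegationTokensRequired(conf: Map[String, String]): Boolean
  def obtainDelegationTokens(conf: Map[String, String]): Unit
}

object HiveProviderSketch extends CredentialProviderSketch {
  // Requires a token only when a remote metastore uri is configured.
  override def delegationTokensRequired(conf: Map[String, String]): Boolean =
    conf.getOrElse("hive.metastore.uris", "").trim.nonEmpty

  override def obtainDelegationTokens(conf: Map[String, String]): Unit =
    println("would fetch a Hive delegation token here")
}

object TokenManagerSketch {
  // With hive.metastore.uris unset, the provider reports false and is skipped.
  def obtainIfRequired(provider: CredentialProviderSketch, conf: Map[String, String]): Unit =
    if (provider.delegationTokensRequired(conf)) provider.obtainDelegationTokens(conf)

  def main(args: Array[String]): Unit =
    obtainIfRequired(HiveProviderSketch, Map.empty) // prints nothing, no failure
}
```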

Contributor Author

Oh, my fault, delegationTokensRequired is already checked in SparkSQLCLIDriver.scala.

@LantaoJin
Contributor Author

LantaoJin commented May 23, 2018

#20784 and #21343 did the same thing, but #21343 is more readable. Both fix the problem of using a proxy user to access the metastore (#17335 only considers YARN mode). However, if the current configuration connects to the DB directly instead of via RPC to the metastore, we shouldn't block Spark job execution after #20784.

@jerryshao
Contributor

Can you please describe your scenario, @LantaoJin?

@LantaoJin
Contributor Author

LantaoJin commented May 23, 2018

@jerryshao
Simply speaking, in a secure environment, if we use JDBC to connect to MySQL directly instead of accessing the Hive metastore, the current implementation blocks job execution.

Why we don't access the metastore is a tricky story: there is a firewall issue between Spark and the metastore, and that should be resolved on our side. But in the code path we can still choose whether or not to enable the metastore, and after #20784 the direct-DB-connect approach was blocked.
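As a rough illustration of where that choice surfaces, here is a small hedged Scala helper; it assumes a typical Hive setup and hadoop-common on the classpath, and is not taken from this PR. An empty hive.metastore.uris means Hive runs an embedded metastore and reaches the backing database directly over JDBC, so there is no HMS to issue a delegation token.

```scala
import org.apache.hadoop.conf.Configuration

object MetastoreModeSketch {
  // Treat a non-empty hive.metastore.uris as "remote metastore over RPC";
  // an empty value means the DB is reached directly via JDBC (embedded metastore).
  def usesRemoteMetastore(hadoopConf: Configuration): Boolean =
    hadoopConf.getTrimmed("hive.metastore.uris", "").nonEmpty

  def main(args: Array[String]): Unit = {
    val conf = new Configuration() // picks up *-site.xml files on the classpath, if any
    println(s"remote metastore configured: ${usesRemoteMetastore(conf)}")
  }
}
```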

@LantaoJin
Contributor Author

LantaoJin commented May 23, 2018

Also, the reason #20784 or #21343 is still needed to extend #17335 may be:

  1. Some DDL operations in local mode are much faster than launching an AM in YARN.
  2. Nodes in the YARN cluster have a firewall issue with the metastore :)

@LantaoJin
Contributor Author

In our current setup, when we onboard a new cluster, the default is to connect to the DB directly; it's much simpler than accessing the metastore. We are going to switch to accessing the metastore by default, but I think Spark shouldn't block the old approach.

@LantaoJin LantaoJin closed this May 23, 2018