[SPARK-24349][SQL] Ignore setting token if using JDBC #21396
Conversation
Hi @vanzin @jerryshao, could you help to review this?

Can one of the admins verify this patch?
```diff
  require(principal.nonEmpty, s"Hive principal $principalKey undefined")
  val metastoreUri = conf.getTrimmed("hive.metastore.uris", "")
- require(metastoreUri.nonEmpty, "Hive metastore uri undefined")
+ if (metastoreUri.isEmpty) {
```
How is the code getting past the check in delegationTokensRequired? It's basically checking for the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require() throws an IllegalArgumentException here and exits the JVM. Letting delegationTokensRequired return when metastoreUri is undefined (i.e., using JDBC to connect to the DB directly) only finishes this method without setting a token in Credentials.
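To make the distinction concrete, here is a minimal sketch of the two behaviors (simplified from HiveDelegationTokenProvider.obtainDelegationTokens; the method name and body are illustrative, not the exact Spark source):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials

// Illustrative sketch only; simplified from the provider under discussion.
def obtainDelegationTokensSketch(conf: Configuration, creds: Credentials): Option[Long] = {
  val metastoreUri = conf.getTrimmed("hive.metastore.uris", "")

  // Before this PR: require() throws
  //   java.lang.IllegalArgumentException: requirement failed: Hive metastore uri undefined
  // which aborts the whole submission even for JDBC direct-connect setups.
  // require(metastoreUri.nonEmpty, "Hive metastore uri undefined")

  // After this PR: an empty URI means there is no metastore to get a token
  // from, so just return without adding anything to `creds`.
  if (metastoreUri.isEmpty) {
    return None
  }

  // ... obtain the Hive delegation token and add it to `creds` ...
  None
}
```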
Same question as @vanzin: delegationTokensRequired should already check whether hive.metastore.uris is empty or not, so it will not obtain the DT if hive.metastore.uris is not configured.
```scala
override def delegationTokensRequired(
    sparkConf: SparkConf,
    hadoopConf: Configuration): Boolean = {
  // Delegation tokens are needed only when:
  // - trying to connect to a secure metastore
  // - either deploying in cluster mode without a keytab, or impersonating another user
  //
  // Other modes (such as client with or without keytab, or cluster mode with keytab) do not need
  // a delegation token, since there's a valid kerberos TGT for the right user available to the
  // driver, which is the only process that connects to the HMS.
  val deployMode = sparkConf.get("spark.submit.deployMode", "client")
  UserGroupInformation.isSecurityEnabled &&
    hiveConf(hadoopConf).getTrimmed("hive.metastore.uris", "").nonEmpty &&
    (SparkHadoopUtil.get.isProxyUser(UserGroupInformation.getCurrentUser()) ||
      (deployMode == "cluster" && !sparkConf.contains(KEYTAB)))
}
```
Yes, I know. If the metastore URI is undefined, there is no need to obtain the DT. Am I right?
Before getting the DT, we check whether delegationTokensRequired returns true or false; if it is false, we will not get the DT. Since "hive.metastore.uris" is not configured here, it should return false.
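For context, that guard works roughly like this (a hedged paraphrase of Spark's HadoopDelegationTokenManager loop; the trait and method here are simplified, not the exact Spark source):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf

// Simplified provider contract, mirroring the two methods discussed above.
trait TokenProviderSketch {
  def delegationTokensRequired(sparkConf: SparkConf, hadoopConf: Configuration): Boolean
  def obtainDelegationTokens(
      hadoopConf: Configuration, sparkConf: SparkConf, creds: Credentials): Option[Long]
}

def obtainAll(
    providers: Seq[TokenProviderSketch],
    sparkConf: SparkConf,
    hadoopConf: Configuration,
    creds: Credentials): Unit = {
  providers.foreach { p =>
    // Providers that report "not required" are skipped entirely, e.g. the
    // Hive provider when "hive.metastore.uris" is empty.
    if (p.delegationTokensRequired(sparkConf, hadoopConf)) {
      p.obtainDelegationTokens(hadoopConf, sparkConf, creds)
    }
  }
}
```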
Oh, my fault. delegationTokensRequired is already checked in SparkSQLCLIDriver.scala.
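The call site being referred to looks roughly like this (a hedged sketch of the guard in SparkSQLCLIDriver; note HiveDelegationTokenProvider is private[spark], so this is illustrative rather than compilable outside Spark's own packages):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.spark.SparkConf
import org.apache.spark.deploy.security.HiveDelegationTokenProvider

def maybeObtainHiveToken(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
  val tokenProvider = new HiveDelegationTokenProvider()
  // The provider is consulted first, so with an empty "hive.metastore.uris"
  // delegationTokensRequired returns false and obtainDelegationTokens (and
  // its require()) is never reached from this path.
  if (tokenProvider.delegationTokensRequired(sparkConf, hadoopConf)) {
    val credentials = new Credentials()
    tokenProvider.obtainDelegationTokens(hadoopConf, sparkConf, credentials)
    UserGroupInformation.getCurrentUser().addCredentials(credentials)
  }
}
```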
#20784 and #21343 did the same thing, but #21343 is more readable. Both fix the problem of accessing the metastore through a proxy user (#17335 only considers YARN mode). However, if the current setup connects to the DB directly instead of talking RPC to the metastore, we shouldn't block Spark job execution after #20784.
Can you please describe your scenario, @LantaoJin?
@jerryshao The reason we don't access the metastore is tricky: there is a firewall issue between Spark and the metastore, which should be resolved on our side. But in the code path we can still choose whether or not to enable the metastore, and after #20784 the DB direct-connect approach is blocked.
In our current setup, when we onboard a new cluster, the default is to connect to the DB directly; it's much simpler than accessing the metastore. We are going to switch to accessing the metastore by default, but I think Spark shouldn't block the old approach.
What changes were proposed in this pull request?
In SPARK-23639, using --proxy-user to impersonate invokes obtainDelegationTokens(). But with that change, if the current setup connects to the DB directly via JDBC instead of RPC to the metastore, the job fails.
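Given the `require` shown in the diff above (Scala's `require` throws `IllegalArgumentException` with a `requirement failed:` prefix), the failure looks like:

```
java.lang.IllegalArgumentException: requirement failed: Hive metastore uri undefined
```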
How was this patch tested?
Remove or comment out the configuration hive.metastore.uris in hive-site.xml (so JDBC is used to connect to the DB directly); running spark-sql with --proxy-user will then fail.