
Conversation


@dougb dougb commented Mar 15, 2015

Adds hive2-metastore delegation token to conf when running in secure mode.
Without this change, running on YARN in cluster mode fails with a
GSS exception.

This is a rough patch that adds a dependency to spark/yarn on hive-exec.
I'm looking for suggestions on how to make this patch better.

This contribution is my original work and I license the work to the
Apache Spark project under the project's open source licenses.

Author: Doug Balog <doug.balog@target.com>

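For context, when hive-exec is on the compile classpath the core of the change amounts to something like the sketch below: ask the metastore for a delegation token and add it to the Credentials that YARN ships with the application. This is a sketch only; the Hive calls are 0.13-era APIs and the alias key for the token is a placeholder, not necessarily what the patch uses.

```scala
// Sketch (assumptions: hive-exec on the classpath, Hive 0.13-era APIs,
// placeholder alias key). Not the literal patch, just the shape of the idea.
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.metadata.Hive
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

def addHiveMetastoreToken(credentials: Credentials): Unit = {
  val hiveConf = new HiveConf()
  val principal = hiveConf.getTrimmed("hive.metastore.kerberos.principal", "")
  val uris = hiveConf.getTrimmed("hive.metastore.uris", "")
  // Only fetch a token when security is on and a remote metastore is configured.
  if (UserGroupInformation.isSecurityEnabled && uris.nonEmpty && principal.nonEmpty) {
    val user = UserGroupInformation.getCurrentUser.getUserName
    val tokenStr = Hive.get(hiveConf).getDelegationToken(user, principal)
    val token = new Token[TokenIdentifier]()
    token.decodeFromUrlString(tokenStr)
    credentials.addToken(new Text("hive.metastore.delegation.token"), token)
    Hive.closeCurrent()
  }
}
```

In cluster mode the driver runs inside YARN without a Kerberos TGT, so without a token along these lines any metastore call fails with the GSS error described above.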
@tgravescs
Contributor

jenkins test this please

@tgravescs
Contributor

Thanks for providing the patch. Which versions of Hive have you tested this with?

I'm trying to build with a Hive 13 version (not the official 0.13.1 though) and I see a compile error:
java.lang.AssertionError: assertion failed: org.apache.hadoop.hive.metastore.api.AlreadyExistsException

We will have to find a way to do the Hive stuff conditionally, as it's optional to compile it in. I'll have to look a bit more at how the Hive stuff is done now to figure out what makes sense.

Adding @marmbrus to see if he has ideas, as I think he's familiar with how the Hive stuff is done now.
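For illustration, one way to do the Hive stuff conditionally would be to probe for the Hive classes at runtime instead of requiring them at compile time; the token code would then only run when the probe succeeds and security is enabled. A minimal sketch (the class names are assumed from Hive 0.13 packaging):

```scala
// Sketch: only attempt the metastore token fetch if the Hive classes are
// actually on the runtime classpath (class names assumed from Hive 0.13).
def hiveClassesArePresent(loader: ClassLoader): Boolean =
  try {
    Class.forName("org.apache.hadoop.hive.conf.HiveConf", false, loader)
    Class.forName("org.apache.hadoop.hive.ql.metadata.Hive", false, loader)
    true
  } catch {
    case _: ClassNotFoundException => false
    case _: NoClassDefFoundError => false
  }
```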

@dougb
Author

dougb commented Mar 18, 2015

Thanks for looking at my patch.
I've tested it only with 0.13.1 from org.spark-project.hive.
I tried to figure out how Hive was included via profiles, but gave up.
@marmbrus, any suggestions would be appreciated.

Contributor

So, I think the /* And Hive is enabled */ part is kinda important, and I'm not talking just about Hive being compiled in. What would happen if a user doesn't have any Hive services running?

I'm also a little worried about precedent here... couldn't we make the same argument for acquiring HBase tokens? Solr? Something else? Unfortunately, I can't think of any alternative off the bat, given the way cluster mode works... :-/

@XuTingjun
Contributor

I have tested on secure HBase, and it didn't work. In the executor process we got this error:

2015-03-25 11:11:33,426 | DEBUG | [Executor task launch worker-0] | Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/dc1-rack1-host1@HADOOP.COM | org.apache.hadoop.hbase.security.HBaseSaslRpcClient.(HBaseSaslRpcClient.java:109)
2015-03-25 11:11:33,431 | DEBUG | [Executor task launch worker-0] | PrivilegedActionException as:spark (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] | org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1616)
2015-03-25 11:11:33,433 | DEBUG | [Executor task launch worker-0] | PrivilegedAction as:spark (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:796) | org.apache.hadoop.security.UserGroupInformation.logPrivilegedAction(UserGroupInformation.java:1636)
2015-03-25 11:11:33,434 | WARN | [Executor task launch worker-0] | Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] | org.apache.hadoop.hbase.ipc.RpcClient$Connection$1.run(RpcClient.java:824)
2015-03-25 11:11:33,434 | ERROR | [Executor task launch worker-0] | SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. | org.apache.hadoop.hbase.ipc.RpcClient$Connection$1.run(RpcClient.java:834)
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:770)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:357)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:891)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:888)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1612)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:888)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:29924)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1580)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:93)

@tgravescs
Contributor

@dougb I'm just curious, are you still working on this? I think a version of this could go in to at least allow it to work for the 7 days, then Hari's change could improve on it.

@dougb
Author

dougb commented Apr 7, 2015

@tgravescs Yes, sorry, I'll update the PR today. I have a new version using reflection that works in my environment with Hive 0.14.
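For reference, the reflection variant boils down to making the same Hive calls without a compile-time hive-exec dependency, along these lines (class and method names assumed from the Hive 0.13/0.14 APIs; error handling elided):

```scala
// Sketch: the equivalent of Hive.get(hiveConf).getDelegationToken(user, principal),
// invoked via reflection so the Hive classes are only needed at runtime.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

def fetchMetastoreTokenString(loader: ClassLoader): String = {
  val hiveConfClass = Class.forName("org.apache.hadoop.hive.conf.HiveConf", true, loader)
  val hiveClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Hive", true, loader)

  val hiveConf = hiveConfClass.getConstructor().newInstance().asInstanceOf[Configuration]
  val principal = hiveConf.getTrimmed("hive.metastore.kerberos.principal", "")
  val user = UserGroupInformation.getCurrentUser.getUserName

  // Hive.get(hiveConf) is a static factory; getDelegationToken returns an encoded
  // token string that can later be decoded with Token.decodeFromUrlString.
  val hive = hiveClass.getMethod("get", hiveConfClass).invoke(null, hiveConf)
  try {
    hiveClass
      .getMethod("getDelegationToken", classOf[String], classOf[String])
      .invoke(hive, user, principal)
      .asInstanceOf[String]
  } finally {
    hiveClass.getMethod("closeCurrent").invoke(null)
  }
}
```

The decoded token would then be added to the launch context's Credentials exactly as in the earlier sketch.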

@dougb
Author

dougb commented Apr 7, 2015

@XuTingjun Sorry, this patch is for Hive, not HBase. I'm sure something similar could be created for HBase.
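For reference, an HBase equivalent would likely mirror the same reflection approach using HBase's TokenUtil, roughly as sketched below (class and method names assumed from HBase 0.98-era APIs; this is not part of this PR):

```scala
// Sketch: obtain an HBase delegation token via reflection and add it to the
// job credentials (assumes the HBase client jars are on the runtime classpath).
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

def obtainTokenForHBase(conf: Configuration, credentials: Credentials): Unit = {
  val loader = Thread.currentThread().getContextClassLoader
  // HBaseConfiguration.create(conf), via reflection.
  val hbaseConf = Class.forName("org.apache.hadoop.hbase.HBaseConfiguration", true, loader)
    .getMethod("create", classOf[Configuration])
    .invoke(null, conf)
    .asInstanceOf[Configuration]

  if (hbaseConf.get("hbase.security.authentication") == "kerberos") {
    // TokenUtil.obtainToken(hbaseConf), via reflection.
    val token = Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil", true, loader)
      .getMethod("obtainToken", classOf[Configuration])
      .invoke(null, hbaseConf)
      .asInstanceOf[Token[_ <: TokenIdentifier]]
    credentials.addToken(token.getService, token)
  }
}
```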

Doug Balog added 2 commits April 7, 2015 15:35
@tgravescs
Contributor

ok to test

Contributor


Add a space between if and (.

@tgravescs
Contributor

A few nits, but other than that it looks fine. I'm going to try it out.

Did you test this with Hive compiled in and then without it as well?

@dougb
Author

dougb commented Apr 10, 2015

Updated code per @tgravescs's comments.
I tested with and without Hive compiled in and didn't see any problems.

@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Apr 12, 2015

Test build #30100 has finished for PR 5031 at commit 3e9ac16.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@dougb
Author

dougb commented Apr 12, 2015

The Scalastyle checks that failed had nothing to do with this PR.
This PR only changes yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

=========================================================================
Running Scala style checks
=========================================================================
Scalastyle checks failed at following occurrences:
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:172:8: Public method must have explicit type
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala:159:15: Public method must have explicit type
[error] (catalyst/compile:scalastyle) errors exist
[error] Total time: 7 s, completed Apr 11, 2015 7:34:10 PM
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:172:8: Public method must have explicit type
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala:159:15: Public method must have explicit type
[error] (catalyst/compile:scalastyle) errors exist
[error] Total time: 6 s, completed Apr 11, 2015 7:34:27 PM
[error] Got a return code of 1 on line 125 of the run-tests script.
Archiving unit tests logs...

@JoshRosen
Contributor

I've pushed a hotfix to fix those style errors, so this should hopefully be able to test now. Let's try again...

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 12, 2015

Test build #30101 has finished for PR 5031 at commit 3e9ac16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@tgravescs
Contributor

lgtm.

@asfgit asfgit closed this in 77620be Apr 13, 2015
xieyuchen pushed a commit to xieyuchen/spark that referenced this pull request May 6, 2015
Adds hive2-metastore delegation token to conf when running in secure mode.
Without this change, running on YARN in cluster mode fails with a
GSS exception.

This is a rough patch that adds a dependency to spark/yarn on hive-exec.
I'm looking for suggestions on how to make this patch better.

This contribution is my original work and that I licenses the work to the
Apache Spark project under the project's open source licenses.

Author: Doug Balog <doug.balogtarget.com>

Author: Doug Balog <doug.balog@target.com>

Closes apache#5031 from dougb/SPARK-6207 and squashes the following commits:

3e9ac16 [Doug Balog] [SPARK-6207] Fixes minor code spacing issues.
e260765 [Doug Balog] [SPARK-6207] Second pass at adding Hive delegation token to conf. - Use reflection instead of adding dependency on hive. - Tested on Hive 0.13 and Hadoop 2.4.1
1ab1729 [Doug Balog] Merge branch 'master' of git://github.com/apache/spark into SPARK-6207
bf356d2 [Doug Balog] [SPARK-6207] [YARN] [SQL] Adds delegation tokens for metastore to conf. Adds hive2-metastore delagations token to conf when running in securemode. Without this change, runing on YARN in cluster mode fails with a GSS exception.
dougb pushed a commit to dougb/spark that referenced this pull request May 13, 2015