@deanchen
Contributor

Obtain HBase security token with Kerberos credentials locally to be sent to executors. Tested on eBay's secure HBase cluster.

This works similarly to obtainTokenForNamenodes and fails gracefully if the HBase classes are not on the classpath.

Requires hbase-site.xml to be on the classpath (typically via the conf dir) for the ZooKeeper configuration. Should that go in the docs somewhere? I did not see an HBase section.
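
For readers following along, here is a minimal sketch of the "fail gracefully if the HBase classes are missing" idea, not the code in this patch. It looks up HBaseConfiguration.create() and TokenUtil.obtainToken(Configuration) by reflection so that HBase stays an optional dependency. The class name HBaseTokenSketch and the method tryObtainToken are hypothetical names used only for illustration, and the logging to stderr is an assumption.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Illustrative only: resolve HBase's token API reflectively so HBase is optional. */
class HBaseTokenSketch {

    /**
     * Returns the HBase token if the HBase classes are on the classpath and the
     * call succeeds; otherwise logs the reason and returns Optional.empty().
     */
    static Optional<Object> tryObtainToken() {
        try {
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Class<?> hadoopConf = loader.loadClass("org.apache.hadoop.conf.Configuration");
            // HBaseConfiguration.create() reads hbase-default.xml and hbase-site.xml from the classpath.
            Method create = loader
                    .loadClass("org.apache.hadoop.hbase.HBaseConfiguration")
                    .getMethod("create");
            Method obtainToken = loader
                    .loadClass("org.apache.hadoop.hbase.security.token.TokenUtil")
                    .getMethod("obtainToken", hadoopConf);
            Object hbaseConf = create.invoke(null);
            return Optional.ofNullable(obtainToken.invoke(null, hbaseConf));
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // HBase (or Hadoop) is not on the classpath: skip the token instead of failing.
            System.err.println("HBase classes not found, skipping HBase token: " + e);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // The lookup or the call itself failed: report it and carry on.
            System.err.println("Failed to obtain HBase token: " + e);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println("Token obtained: " + tryObtainToken().isPresent());
    }
}
```

Run without the HBase jars, the sketch takes the first catch branch and returns an empty result, which is the graceful path described above.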

@AmplabJenkins

Can one of the admins verify this patch?

@deanchen
Contributor Author

@XuTingjun, I noticed you were also interested in this feature in #5031.

@XuTingjun
Contributor

Yeah, LGTM, I need this feature. We can put HBase's config into hbase-site.xml, right?

@deanchen
Contributor Author

The HBaseConfiguration object will read from hbase-default.xml or hbase-site.xml on the classpath. Do you have HBase config in another file? The ZooKeeper configs are what is needed for obtaining the security token, and they should always be in hbase-site.xml, so just copying that file into the Spark config dir should do the trick.
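
A quick way to confirm that hbase-site.xml is actually visible is to ask the class loader for it. The following is a minimal sketch under that assumption; HBaseSiteCheck and locate are hypothetical names, and the check says nothing about whether the ZooKeeper settings inside the file are correct.

```java
import java.net.URL;

/** Illustrative only: report whether a resource such as hbase-site.xml is on the classpath. */
class HBaseSiteCheck {

    /** Returns the classpath location of the given resource, or null if it is absent. */
    static URL locate(String resource) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        URL site = locate("hbase-site.xml");
        System.out.println(site == null
                ? "hbase-site.xml is NOT on the classpath"
                : "hbase-site.xml found at " + site);
    }
}
```

Running this with the same classpath as the driver shows which copy of the file, if any, would be picked up.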

@tgravescs
Contributor

Jenkins, test this please

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30596 has started for PR 5586 at commit aa7fab6.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30596 has finished for PR 5586 at commit aa7fab6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30596/

@tgravescs
Contributor

So can you detail how one actually uses this? The Hive stuff can be compiled into Spark, but HBase cannot be. So I assume that for this to work you have to include the HBase jars. Does just specifying driver-class-path work for both yarn-client and yarn-cluster modes?

Did you test this on both secure and non-secure clusters?

@deanchen force-pushed the master branch 3 times, most recently from a4256cd to f48927b, on April 27, 2015 17:00
@AmplabJenkins

Can one of the admins verify this patch?

@tgravescs
Contributor

ok to test

@deanchen
Contributor Author

Yes, including the HBase jars on the driver and/or executor classpath (e.g. /usr/lib/hbase/lib/hbase-client.jar:/usr/lib/hbase/lib/hbase-common.jar:/usr/lib/hbase/lib/hbase-hadoop2-compat.jar:/usr/lib/hbase/lib/hbase-protocol.jar:/usr/lib/hbase/lib/htrace-core-2.04.jar) will allow the driver and executor to reference the HBase configuration and create a new connection. The assumption is that the HBase jars are in those same dirs on the executors. hbase-site.xml will need to be moved into /conf or onto the Spark conf path, since that is where the ZooKeeper config for HBase lives.

I've tested this in yarn-client and yarn-cluster modes on our secure production cluster with HBase 0.98, both with and without the HBase jars included. I also tested it in the HDP sandbox with HBase 0.98 over an unsecured HBase connection (all running locally).

Updated the pull request to remove the throw new RuntimeException on line 1117 and log the failure as an error instead, since users may be running a secure YARN cluster without security on HBase.
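
As a rough sketch of the classpath setup described above, the snippet below joins the example jar paths and prints them as values for the Spark properties spark.driver.extraClassPath and spark.executor.extraClassPath. Using those two properties is an assumption about how one might pass the jars (the thread itself only mentions driver-class-path), and HBaseClasspathSketch and classpathSettings are hypothetical names. The ":" separator assumes a Unix-like system.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: turn a list of HBase jars into Spark classpath settings. */
class HBaseClasspathSketch {

    // Jar locations taken from the example paths in this thread; adjust per cluster.
    static final List<String> HBASE_JARS = Arrays.asList(
            "/usr/lib/hbase/lib/hbase-client.jar",
            "/usr/lib/hbase/lib/hbase-common.jar",
            "/usr/lib/hbase/lib/hbase-hadoop2-compat.jar",
            "/usr/lib/hbase/lib/hbase-protocol.jar",
            "/usr/lib/hbase/lib/htrace-core-2.04.jar");

    /** Builds the same classpath value for the driver and the executors. */
    static Map<String, String> classpathSettings(List<String> jars) {
        String classpath = String.join(":", jars); // Unix-style path separator
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("spark.driver.extraClassPath", classpath);
        settings.put("spark.executor.extraClassPath", classpath);
        return settings;
    }

    public static void main(String[] args) {
        // Print each setting in the key=value form accepted by spark-submit --conf.
        classpathSettings(HBASE_JARS)
                .forEach((key, value) -> System.out.println("--conf " + key + "=" + value));
    }
}
```

The executor setting only helps if the jars exist at the same paths on every node, which matches the assumption stated in the comment above.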

@tgravescs
Contributor

jenkins, test this please

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31140 has started for PR 5586 at commit 0c190ef.

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31140 has finished for PR 5586 at commit 0c190ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@tgravescs
Contributor

I think this looks good. It would be nice to have an example on accessing hbase for other users to reference but that is out of scope of this.

@asfgit closed this in baed3f2 on Apr 29, 2015
@XuTingjun
Contributor

My cluster setup is: /opt/jdk1.8.0_40, hadoop26.0, hbase1.0.0, zookeeper 3.5.0. Lately, when I run a select command in the beeline shell to read data from HBase, it always throws this exception:

java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

@deanchen
Contributor Author

@XuTingjun This looks like a generic Spark driver error when an executor crashes. Can you please dig up the executor stack trace containing the root cause?

@XuTingjun
Contributor

@deanchen, I am using this patch, and HBase throws the exception below. Can you help me?

java.io.IOException: No secret manager configured for token authentication
    at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:110)
    at org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$1.getAuthenticationToken(AuthenticationProtos.java:4267)
    at org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService.callMethod(AuthenticationProtos.java:4387)
    at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7696)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1877)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1859)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2131)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:102)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

@deanchen
Contributor Author

@XuTingjun Have you tried authenticating to your HBase server without Spark? This looks like a failure caused by a misconfiguration.

@XuTingjun
Contributor

@deanchen Can you list the HBase configs needed on the client?

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Obtain HBase security token with Kerberos credentials locally to be sent to executors. Tested on eBay's secure HBase cluster.

Similar to obtainTokenForNamenodes and fails gracefully if HBase classes are not included in path.

Requires hbase-site.xml to be in the classpath(typically via conf dir) for the zookeeper configuration. Should that go in the docs somewhere? Did not see an HBase section.

Author: Dean Chen <deanchen5@gmail.com>

Closes apache#5586 from deanchen/master and squashes the following commits:

0c190ef [Dean Chen] [SPARK-6918][YARN] Secure HBase support.
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015