Allow presto to connect to kerberized Hive clusters (v3) #4867
Conversation
package com.facebook.presto.tests.hive;

/**
 * Created by andrii on 21.03.16.
Don't add comments like this.
It is auto-generated by my IDE, and I just forgot to remove it. This is lame, I know. Will remove.
Commit "Make HiveHdfsConfiguration immutable" (61d4bb) has a typo in the commit message.
@@ -25,7 +25,20 @@
public class HiveHdfsConfiguration
@electrum, please review this commit.
@martint We've finished iterating on this PR. Could you please do the final review and merge this? P.S.: The new version contains the `hive-shims` classes within. `hive-shims` is required by presto-product-tests to access kerberized Hive.
Test INSERT and SELECT paths for all the storage formats supported by the Hive connector. This test is going to be used for kerberized HDFS access verification. Initially all the formats are commented out because none of them is supported yet. We will uncomment the formats one by one together with the Kerberos support implementation patches, so we can better track what changes are needed to ensure that a particular format works with kerberized Hadoop.
Make INITIAL_CONFIGURATION in HiveHdfsConfiguration immutable. Due to specifics of the Configuration implementation, it can be modified while additional Hadoop modules, such as DistributedFileSystem, MapReduce, etc., are being loaded. Consider the following flow: 1. Client1 calls HdfsConfiguration.getConfiguration() 2. Client2 calls FileSystem.getFileSystem(), which implicitly loads DistributedFileSystem 3. Client3 calls HdfsConfiguration.getConfiguration() In this case Client1 and Client3 obtain Configurations with different property sets. To solve this, we must load the HDFS-related configuration during HiveHdfsConfiguration initialization and store those values in an unmodifiable INITIAL_CONFIGURATION.
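The race this commit fixes can be sketched with plain JDK types (the class and map here are hypothetical stand-ins, not the actual Presto code, which works with Hadoop's `Configuration`): take a defensive snapshot at initialization, so mutations made while later modules load cannot leak into what earlier callers see.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the "immutable initial configuration" fix.
// A mutable Map stands in for Hadoop's Configuration, which other
// modules may mutate as they are loaded.
public class HdfsConfigSketch {
    // Snapshot taken once at initialization; never modified afterwards.
    private final Map<String, String> initialConfiguration;

    public HdfsConfigSketch(Map<String, String> liveConfiguration) {
        // Defensive copy: later mutations of liveConfiguration are invisible here.
        this.initialConfiguration =
                Collections.unmodifiableMap(new HashMap<>(liveConfiguration));
    }

    // Every caller sees the same property set, no matter which
    // modules were loaded between their calls.
    public Map<String, String> getConfiguration() {
        return initialConfiguration;
    }

    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>();
        live.put("fs.defaultFS", "hdfs://namenode:8020");

        HdfsConfigSketch config = new HdfsConfigSketch(live);

        // Simulate DistributedFileSystem loading and mutating the live config.
        live.put("dfs.replication", "3");

        // Client1 and Client3 both observe the snapshot, not the mutation.
        System.out.println(config.getConfiguration().containsKey("dfs.replication"));
        // prints "false"
    }
}
```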
Instead of using reflection to access the private methods of UserGroupInformation, we are going to leverage a thin shim. This commit is going to be replaced with updated versions of the Hadoop libraries once they are released.
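For context, the brittle pattern the shim replaces looks roughly like this (class and method names below are hypothetical stand-ins, not the real `UserGroupInformation` internals): the private method is looked up by name and forced accessible, which breaks silently whenever the library's internals change.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {
    // Hypothetical stand-in for a Hadoop class whose private method
    // used to be reached via reflection.
    static class UserGroupInfoStandIn {
        private String loginUser() { return "presto"; }
    }

    // The fragile approach being removed: look up the private method
    // by its string name and force accessibility at runtime.
    static String callPrivateViaReflection(UserGroupInfoStandIn target) {
        try {
            Method m = UserGroupInfoStandIn.class.getDeclaredMethod("loginUser");
            m.setAccessible(true); // bypasses access checks; breaks on internal renames
            return (String) m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callPrivateViaReflection(new UserGroupInfoStandIn()));
        // prints "presto"
    }
}
```

A shim instead compiles against the private API once, in one isolated module, so breakage shows up at build time rather than at runtime.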
Support both KERBEROS and SIMPLE Hadoop authentication, with and without impersonation.
Pass the session user as a parameter to HdfsEnvironment.getFileSystem. It is enough to create the FileSystem within UserGroupInformation.doAs to make it authenticate HDFS requests with Kerberos.
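The wrapping pattern can be illustrated with the JDK's own `javax.security.auth.Subject.doAs`, which Hadoop's `UserGroupInformation.doAs` builds on (the method and return value below are hypothetical; the real code creates a Hadoop `FileSystem`): only the creation call runs under the user's security context, not the whole connector code path.

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

public class DoAsSketch {
    // Hypothetical stand-in for FileSystem creation: the only call that
    // must execute under the session user's security context.
    static String createFileSystemAs(Subject sessionUser) {
        // Wrap just this call in doAs, not the entire connector.
        return Subject.doAs(sessionUser, (PrivilegedAction<String>) () ->
                "filesystem-for-" + sessionUser.getPrincipals().size() + "-principals");
    }

    public static void main(String[] args) {
        Subject user = new Subject(); // empty subject, sufficient for the sketch
        System.out.println(createFileSystemAs(user));
    }
}
```

Keeping the `doAs` scope this narrow is what minimizes the performance impact mentioned in the PR description.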
Use HdfsEnvironment.getFileSystem in custom readers instead of plain FileSystem.get().
Add the --krb5-disable-remote-service-hostname-canonicalization presto-cli option. With this option, Presto service hostname canonicalization via reverse DNS lookup can be disabled.
Add `singlenode-hdfs-impersonation` and `singlenode-kerberos-hdfs-no-impersonation` product test environments. Rename the `singlenode-kerberized` environment to `singlenode-kerberos-hdfs-impersonation` to keep the names consistent. The Hive connector supports 4 types of HDFS authentication, and we have to be able to test them all. The very basic `singlenode` product test environment covers simple HDFS authentication with no impersonation. `singlenode-hdfs-impersonation` is intended to test simple HDFS authentication with impersonation. Kerberos authentication with impersonation is covered by running product tests on the `singlenode-kerberos-hdfs-impersonation` environment. To verify Kerberos authentication without impersonation, product tests must be run on `singlenode-kerberos-hdfs-no-impersonation`.
Add product tests that verify that HDFS impersonation is either enabled or disabled. To verify HDFS impersonation, a table is created using the Hive connector. If HDFS impersonation is enabled, the table data should belong to the Presto JDBC user; otherwise it should belong to the Hadoop user defined in the Presto configuration. These tests are profile specific and can't be run on every product test environment. To exclude such tests from the regular test suite that is run on all environments, the `profile_specific` test group has been introduced. This group should be explicitly excluded from regular test runs, along with the `quarantine` and `big_query` groups. Then either the `hdfs_impersonation` or `hdfs_no_impersonation` group should be included, depending on which environment the product tests are run against.
.. note::

    If your ``krb5.conf`` location is different than ``/etc/krb5.conf`` you must set it
This is not ideal because we lose the ability to validate options that are set in this manner. I guess it's fine for now, but we should figure out a way to make the option required by the hive connector and the one required by Presto coexist.
Merged, thanks!
Supersedes: #4576
This implementation is supposed to minimize the `Subject.doAs` performance impact when using a kerberized cluster. Instead of wrapping the entire Hive connector into `Subject.doAs`, only the concrete places where a `FileSystem` is created are wrapped.

Product tests for all the formats supported by Presto have been added, although some obscure formats which are implicitly supported by `GenericRecordReader` might potentially fail.