Allow presto to connect to kerberized Hive clusters (v2) #4576
I checked connectivity to Hive metastore with SASL over socks proxy and it works.
        return UserGroupInformation.getCurrentUser();
    }
    catch (IOException e) {
        throw Throwables.propagate(e);
This should probably be categorized
An IOException can be thrown only for token authentication, from Credentials.readTokenStorageFile. Since we are not using that type of authentication, I don't think it makes sense to add special exception handling for that particular case.
Additionally, remove extra HiveConf initialization.
…tications Probably we should rename HdfsConfiguration at some point. In any case, it is better to keep the Configuration factory code in a single place.
Add a proxy UserGroupInformation cache to the impersonating implementations of HadoopAuthentication. Without caching, a new proxy UGI was created on each call to getUserGroupInformation(String), and the HDFS FileSystem caching mechanism did not work: different proxy UGIs are not equal under equals() even when they represent the same user, so a new entry was created in the HDFS FileSystem cache for each new proxy UGI.
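The caching idea described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `ProxyUser` is a hypothetical stand-in for a Hadoop proxy UserGroupInformation (it keeps identity-based equals(), matching the behavior described above), and `ProxyUserCache` is a hypothetical name for the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ProxyUserCacheSketch
{
    // Hypothetical stand-in for a proxy UserGroupInformation. equals() and
    // hashCode() are deliberately not overridden, so two instances created
    // for the same user name are not equal to each other.
    static final class ProxyUser
    {
        private final String name;

        ProxyUser(String name)
        {
            this.name = name;
        }

        String getName()
        {
            return name;
        }
    }

    // Hypothetical cache that hands out one ProxyUser instance per user name,
    // so that anything keyed on the instance sees a stable key.
    static final class ProxyUserCache
    {
        private final Map<String, ProxyUser> cache = new ConcurrentHashMap<>();

        ProxyUser get(String user)
        {
            // computeIfAbsent creates the instance at most once per user name
            return cache.computeIfAbsent(user, ProxyUser::new);
        }
    }

    public static void main(String[] args)
    {
        ProxyUserCache cache = new ProxyUserCache();
        // Two freshly created instances for the same user are not equal
        System.out.println(new ProxyUser("alice").equals(new ProxyUser("alice")));
        // Repeated cache lookups return the same instance
        System.out.println(cache.get("alice") == cache.get("alice"));
    }
}
```

A downstream cache keyed on the returned object then gets one entry per user rather than one per call.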
This patch allows configuring where temporary files are created in HDFS during the INSERT / CREATE TABLE AS SELECT flow in the Hive connector. A new configuration property, 'hive.hdfs.temporary.directory', was added. If %USER% appears in the value of the property, it is replaced with the ID of the user currently executing the query.
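As a rough illustration of the %USER% substitution described above (a sketch, not the patch's implementation): the class name, the `resolve` helper, and the example path `/tmp/presto-%USER%` are all hypothetical; only the property name and the %USER% placeholder come from the commit message.

```java
final class TemporaryDirectorySketch
{
    // Placeholder named in the commit message
    private static final String USER_PLACEHOLDER = "%USER%";

    // Hypothetical helper: replaces every occurrence of %USER% in the
    // configured value of hive.hdfs.temporary.directory with the user id.
    static String resolve(String configuredValue, String user)
    {
        return configuredValue.replace(USER_PLACEHOLDER, user);
    }

    public static void main(String[] args)
    {
        // The path below is an assumed example value, not a documented default
        System.out.println(resolve("/tmp/presto-%USER%", "alice"));
    }
}
```

A value without the placeholder passes through unchanged, so a fixed directory shared by all users would still work under this sketch.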
@electrum rebased onto current master and addressed the comments.
When we benchmarked the solution, we noticed a significant performance degradation when the authenticating wrappers are in use. Simple queries on an ORC table ran 3x slower than without the wrappers.

Solution 1

We have a working solution for that, but it is somewhat hacky. It depends on patching UserGroupInformation in the Hadoop library. These are the PRs for 3 versions of the Hadoop lib:

The source of the performance degradation is the fact that …
The change we made exploits the fact that the call to the actual …
The change we made is basically extending …
In Presto's authenticating wrappers we use faster versions of …
See this commit for the actual diff: prestodb/presto-hadoop-apache2@240a81d

This is not a clean solution, as it patches an external library and depends on brittle code-flow assumptions.

Solution 2

After implementing this we also got another idea. It does not involve patching the Hadoop lib but has other drawbacks. Let's focus on …

A similar mechanism can be implemented for …
The drawback of this approach is that we use more memory (a buffer of pages) and process data in a less streamlined manner.

@electrum, @dain, what do you think about the implemented solution? We would really appreciate input here. Any idea how to solve the problem in a cleaner way would be very beneficial.

cc: @arhimondr, @pnowojski, @ilfrin
This won't work because we return lazy blocks. With ORC, we need to know which column values are needed before the next page is fetched, because when fetching a page we advance all streams. In general, I think the solution to this is that we pass the UGI to the …
private UserGroupInformation getCurrentHdfsUser()
{
    try {
        return UserGroupInformation.getCurrentUser();
The UGI should be explicitly passed in during the construction, so we should not need thread tricks here.
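One way to read this suggestion, sketched with hypothetical stand-in types rather than the real Hadoop or Presto classes: the wrapper receives its user in the constructor and runs the delegate under that user, instead of looking up a per-thread current user. `User`, `doAs`, and `AuthenticatingReader` here are illustrative names and are not taken from the patch.

```java
import java.util.Objects;
import java.util.function.Supplier;

final class ExplicitUserSketch
{
    // Hypothetical stand-in for UserGroupInformation
    static final class User
    {
        private final String name;

        User(String name)
        {
            this.name = Objects.requireNonNull(name, "name is null");
        }

        String getName()
        {
            return name;
        }

        // Stand-in for running an action as this user; here it only invokes it
        <T> T doAs(Supplier<T> action)
        {
            return action.get();
        }
    }

    // Hypothetical wrapper: the user is fixed at construction time, so no
    // thread-local or "current user" lookup is needed when reading.
    static final class AuthenticatingReader
    {
        private final User user;
        private final Supplier<String> delegate;

        AuthenticatingReader(User user, Supplier<String> delegate)
        {
            this.user = Objects.requireNonNull(user, "user is null");
            this.delegate = Objects.requireNonNull(delegate, "delegate is null");
        }

        User getUser()
        {
            return user;
        }

        String read()
        {
            return user.doAs(delegate);
        }
    }

    public static void main(String[] args)
    {
        User alice = new User("alice");
        AuthenticatingReader reader = new AuthenticatingReader(alice, () -> "page");
        System.out.println(reader.getUser().getName() + " -> " + reader.read());
    }
}
```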
IIRC, when we talked, we decided that we were going to do more work on this. Let me know when that is done and I'll take another look.
Summary: * Change include file to be compatible with Velox after following PR * Advance Velox Version X-link: facebookincubator/velox#4576 Reviewed By: amitkdutta Differential Revision: D44875623 Pulled By: tanjialiang fbshipit-source-id: d9dcab82ac64b0601daf7cb35324ac951d936607
Summary: * Change include file to be compatible with Velox after following PR * Advance Velox Version X-link: facebookincubator/velox#4576 Reviewed By: amitkdutta Differential Revision: D44875623 Pulled By: tanjialiang fbshipit-source-id: d9dcab82ac64b0601daf7cb35324ac951d936607 (cherry picked from commit 78104b4)