Add flag to execute Hive queries as the user submitting the query #4382
Conversation
```java
if (HiveSessionProperties.getReadAsQueryUser(session)) {
    UserGroupInformation ugi = UgiUtils.getUgi(session.getUser());
    try {
        return ugi.doAs((PrivilegedExceptionAction<ConnectorPageSource>) () ->
```
Is it sufficient to only doAs for the initial creation of the page source? Historically, we have had problems with Hadoop APIs and the multithreaded nature of Presto. In a standard query this page source will be accessed from many threads, so if in the middle of the query a thread performs a new "read", the UGI will not be set on that thread.
@dain if a single query accesses this method repeatedly from multiple threads we should be OK, since the query will always be executed by the same user and the method will always doAs that user. We'd have problems if the ConnectorPageSource returned by this method were then shared across queries/users. Would that ever be the case?
In an initial version of this patch we ran into user/threading issues in the BackgroundHiveSplitLoader because we were calling doAs at too high a level, before kicking off tasks in the thread pool. This had the effect of binding the subject to the thread across queries. The fix was to do doAs within the thread, not outside of it.
We've been successfully running this patch in production for a week or so without issue, but that doesn't mean there couldn't be a corner case that we've missed.
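The fix described above can be sketched in plain Java. This is not the actual Presto/Hadoop code: a `ThreadLocal` stands in for the Subject/UGI binding that Hadoop's `doAs` performs, and the class name is hypothetical. The point is that the binding is entered inside the unit of work running on the pool thread and undone before the thread is reused, so it cannot leak across queries.

```java
import java.util.concurrent.Callable;

// Sketch only: ThreadLocal stands in for the Subject/UGI binding that
// Hadoop's UserGroupInformation.doAs performs on the calling thread.
final class ScopedDoAs
{
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    private ScopedDoAs() {}

    static <T> T doAs(String user, Callable<T> action)
            throws Exception
    {
        CURRENT_USER.set(user);           // bind to this thread only
        try {
            return action.call();
        }
        finally {
            CURRENT_USER.remove();        // unbind before the pool reuses the thread
        }
    }
}
```

The "too high a level" bug is then the difference between calling `doAs` once and submitting the result versus submitting a task that itself calls `doAs`, e.g. `executor.submit(() -> ScopedDoAs.doAs(user, work))`.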
I was thinking about when we make further calls to the Hive record reader (e.g., recordReader.next). I would assume most readers cache the open file system input stream, but I would bet there are record readers that open new files during execution (maybe reading a sidecar file), and those would fail if the UGI is not on the thread. Maybe we just ignore that until the problem actually happens.
Right, I see your point. An implementation that opens new files after the page source is created would run into issues. We could wait and see if that's really an issue.
```java
// Every instance of a UserGroupInformation object for a given user has a unique hashcode, due
// to the hashCode() impl. If we don't cache the UGI per-user here, there will be a memory leak
// in the PrestoFileSystemCache.
```
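The interning the comment describes can be illustrated with a minimal stand-in. This is a sketch, not the patch's `UgiUtils`: `Object` replaces `UserGroupInformation` so there is no Hadoop dependency, and the real code would create the UGI (e.g. via `UserGroupInformation.createRemoteUser`) inside the loader. Returning the same instance per user means downstream identity-hashed caches see one key per user instead of one per call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: intern one UGI-like object per user so identity-hashed caches
// (such as PrestoFileSystemCache) stay bounded at O(number of users).
final class UgiCache
{
    private static final ConcurrentMap<String, Object> CACHE = new ConcurrentHashMap<>();

    private UgiCache() {}

    static Object getUgi(String user)
    {
        // Object stands in for UserGroupInformation.createRemoteUser(user)
        return CACHE.computeIfAbsent(user, u -> new Object());
    }
}
```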
What is the leak? Specifically, if I have a lot of users over the uptime of my system, do I have a "large" permanent allocation for every user seen, or can we "expire" them over time?
Without this cache the specific leak would occur in PrestoFileSystemCache.map (https://github.com/prestodb/presto-hadoop-apache2/blob/master/src/main/java/org/apache/hadoop/fs/PrestoFileSystemCache.java#L62), since the Key would always be unique, even when the same user accesses the same filesystem in multiple queries.
Yes, the caching bounds the map size to O(number of users) by hashing UGIs consistently, but as implemented PrestoFileSystemCache.map still has a "leak" in that it caches all user keys for the lifetime of the JVM. To give a sense of the number of users required to cause memory issues: running our coordinator with a 64G heap, we ran into trouble once we reached O(100k) map entries.
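The expiry @dain asks about could be added with a bounded LRU map. This is a hypothetical variant, not anything in the patch or in PrestoFileSystemCache (which, as noted above, has no eviction): it caps the cache at a fixed number of users and evicts the least recently used entry, trading the O(users-for-JVM-lifetime) growth for occasional re-creation of a UGI.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded per-user cache: LRU eviction keeps the map from
// growing without bound on a long-lived coordinator serving many users.
final class BoundedUgiCache<V>
{
    private final Map<String, V> cache;

    BoundedUgiCache(int maxEntries)
    {
        // accessOrder=true makes iteration (and eviction) least-recently-used first
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest)
            {
                return size() > maxEntries;
            }
        };
    }

    synchronized V get(String user, Function<String, V> loader)
    {
        return cache.computeIfAbsent(user, loader);
    }

    synchronized int size()
    {
        return cache.size();
    }
}
```

The trade-off is that an evicted user's next query allocates a fresh UGI, which would again be a new key in any downstream identity-hashed cache, so eviction would need to be coordinated with that cache to fully close the leak.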
We would need to do that to support (1) a per-user temp directory: https://github.com/twitter-forks/presto/blob/twitter-master/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriteUtils.java#L378
@billonahill: Teradata is actively working on supporting Kerberos in the Hive connector. As part of doing this, we get non-Kerberos impersonation basically for free. We have a branch in our fork that we'd love to have somebody else who has been working on this stuff take a look at. I'll open an issue and add a link to our branch there, but in the meantime, you can check it out here: Edited to add: It looks like there's an issue open already: #3380
Thanks for sharing that Eric, that's great. I do have a few comments, but …
Thanks @petroav and @ebd2, Teradata#105 looks good. We should focus on that PR instead of what I propose in this one.
#4867 has been merged, which it sounds like addresses this. If I misunderstood please re-open!
Adding the hive.read-as-query-user flag (default: false). When set to true, Presto will read from HDFS as the user submitting the query. When set to false, the query will read from HDFS as the Presto daemon user.
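Assuming the flag is a Hive connector configuration property, enabling it would look like the following in the Hive catalog's properties file. The file path, connector name, and metastore URI here are illustrative, not from the patch; only hive.read-as-query-user is from this PR.

```properties
# etc/catalog/hive.properties (illustrative)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.com:9083

# Read from HDFS as the user submitting the query
# instead of as the Presto daemon user (default: false)
hive.read-as-query-user=true
```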