Kerberos authentication #3380

Closed
chey-jp opened this issue Jul 30, 2015 · 11 comments

Comments

@chey-jp

chey-jp commented Jul 30, 2015

We are trying to use Presto with Hadoop and Hive, using Kerberos authentication.
There are several options, but we are unclear as to what each of them currently controls.

[for example]

  • The Presto CLI has these options (see the example invocation below):
    --krb5-config-path, --krb5-keytab-path, --krb5-principal
  • ${presto_dir}/etc/jvm.config:
    -Dhttp.authentication.krb5.config=/${krb5.conf.dir}
    -Dhttp.authentication.krb5.credential-cache=/tmp
    -Dhttp.authentication.krb5.keytab=${keytabfiledir}
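
For reference, this is roughly how we assume those CLI flags are meant to be combined. The server URL, principal, and file paths below are placeholders from our guesswork, not a verified recipe, and depending on the Presto version additional flags may be required:

  bin/presto \
    --server https://presto-coordinator.example.com:7778 \
    --krb5-config-path /etc/krb5.conf \
    --krb5-keytab-path /home/presto/presto-client.keytab \
    --krb5-principal presto-client@EXAMPLE.COM \
    --catalog hive --schema foo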

When we look at the logs on the Kerberos server, we do not see any requests from Presto. (Currently we are using a single Presto node.)

Does Presto currently support Kerberos authentication when using Hadoop?
If it does, is there an example we can refer to?
If it doesn't, are there any plans to implement it in the near future?

@dain
Contributor

dain commented Jul 31, 2015

The kerberos code in Presto currently only does authentication of users when the request is over HTTPS, and if authentication fails, the user gets an error. The server currently does not perform any authorization checks.

The next step of our security work is to enable authorization checks for the tables, databases, views, etc. accessed in queries. This will likely be performed using the Hive metastore and/or Knox/Ranger.

We currently don't have plans to implement "per-user" authentication with HDFS. Instead we are planning on relying on security for SQL resources (tables/views) and using a single (superuser) credential for the Presto workers to authenticate with HDFS.
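
For anyone trying this out, the coordinator-side settings involved look roughly like the sketch below. Property names follow later Presto documentation and may differ between versions, so treat them as assumptions to verify rather than a tested recipe:

  # etc/config.properties (coordinator) -- illustrative values only
  http-server.authentication.type=KERBEROS
  http.server.authentication.krb5.service-name=presto
  http.server.authentication.krb5.keytab=/etc/presto/presto.keytab
  http.authentication.krb5.config=/etc/krb5.conf
  http-server.https.enabled=true
  http-server.https.port=7778
  http-server.https.keystore.path=/etc/presto/keystore.jks
  http-server.https.keystore.key=changeit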

@chey-jp
Author

chey-jp commented Aug 4, 2015

Thank you for your answer!

Yes, as you said, we got the error from Kerberos when we accessed via HTTP.
So we will change it to HTTPS, and I hope we can connect successfully.

By the way, when do you think the next step of your security work will be pushed to git?

Thanks in advance.

@JeffSaxe

Thank you for all your work on Presto; initial tests on our small cluster show it to be not too painful to deploy, and very, very fast. :-) Our production Hadoop clusters have mandatory Kerberos turned on, though, so I'm glad to see this issue logged, with nearly 500 other people watching it.

Even though full Kerberos support isn't quite ready, can we (today, in version 0.114) do a tech preview / testing version of it, by configuring the last part of your email without the first? I.e., can we configure a static "superuser" credential, in the form of a principal name and keytab, that Presto Server can use to read the table files from HDFS, while ignoring authentication from the Presto client to the server? Obviously that is a gaping security weakness, but if we're willing to work in that state for a while, will it work now? If so, very briefly how do I configure it? Thanks.
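
As a quick sanity check before wiring any keytab into Presto, the principal/keytab pair itself can be verified against the KDC from the Presto host (the principal name and path here are hypothetical):

  kinit -kt /etc/presto/presto.keytab presto@EXAMPLE.COM
  klist   # should list a valid TGT for presto@EXAMPLE.COM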

@chey-jp
Author

chey-jp commented Sep 4, 2015

Thank you for your work on Presto!
Now I am trying to test via HTTPS, but it does not work.
What configuration is needed to set up Presto via HTTPS?
If you have any examples, could you share them?

I set up Hadoop HTTPS as shown in the following URL:
http://hortonworks.com/blog/deploying-https-hdfs/

and here is the current status:

[Hadoop Lab Env]

test1: without Kerberos
  bin/presto --server lab.local:8080 --catalog hive --schema foo
  presto> show tables
   dept
  presto> select * from dept;
   1 record
  presto> select count from dept;
   1
  OK

test2: with Kerberos
  socket timeout error

test3: enable SSL
  socket timeout error
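
For Presto's own HTTPS listener we assume a Java keystore is also needed; a minimal self-signed one can be created with keytool (the hostname, path, and passwords below are placeholders):

  keytool -genkeypair -alias presto -keyalg RSA -keysize 2048 \
    -dname "CN=lab.local" -validity 365 \
    -keystore /etc/presto/keystore.jks -storepass changeit -keypass changeit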

@damiencarol
Contributor

Using authentication info stored in the Hive metastore would be good.

@JeffSaxe

JeffSaxe commented Dec 7, 2015

Hi! I'm politely asking again about this issue, since it's been a few months. Presto clearly has a lot of excellent programmers working on it, but perhaps the issue of Kerberos ticket carry-through to Hive is not important to most users.

I really would like to deploy Presto, or at least figure out how much of an improvement it would be for certain queries here. At least for a proof-of-concept, I am willing to live without actual security, i.e., I don't need Presto to actually accept a Kerberos ticket, confirm that it's valid, pull out the username, and authorize it by name against any particular list of users or tables. But if my whole Hadoop cluster is Kerberized, then the Presto server is immediately not permitted to even talk to the Thrift metastore interface or the HDFS files without initiating those connections with Kerberos credentials. (It gets a java.net.SocketTimeoutException wrapped in an org.apache.thrift.transport.TTransportException, because Thrift is expecting a Kerberos ticket via SASL and Presto is never supplying one.)
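
(For context, a Kerberized Hive metastore typically has SASL turned on in hive-site.xml, which is why the plain Thrift socket from Presto just hangs. Something along these lines, with the principal and keytab path being illustrative:)

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/hive.service.keytab</value>
  </property>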

So is it possible to just give the Presto server nodes (in their config properties somewhere) a hard-coded Kerberos identity, which they would then use to access the Hive metastore and files? I realize the poor security implications of this, i.e., the Presto server would be impersonating someone who was not necessarily the person running the queries from the Presto CLI. Long-term, everyone would want "real" handling of Kerberos, but this would be a quick-and-dirty experiment -- we would have to firewall-restrict the Presto server port.

If no one else is working on this, I could poke at it in my spare time, but I might appreciate direction from someone familiar with the code as to which modules of source code I should start wading through. Thanks!

@electrum
Contributor

electrum commented Dec 7, 2015

@JeffSaxe I'm not at all familiar with how Kerberos works for the Hive metastore. We connect to the Hive metastore in HiveMetastoreClientFactory, which just creates a new socket for the Thrift connection. It sounds like that's where you would want to start.

@mattsfuller
Contributor

@JeffSaxe
We here at Teradata plan to do more with Kerberos and can take this on starting January 1.

@ebd2
Contributor

ebd2 commented Jan 21, 2016

Teradata is working on this and has a branch that's under development and testing.
https://github.com/Teradata/presto/tree/kerberos_hive_poc

@lushuai2013

Hi, my Hadoop cluster is Kerberized, and now I am using Presto to compute data, but the Presto server is not permitted to even talk to the Thrift metastore interface or the HDFS files. Will it work now? If so, how do I configure it? Thanks.

@rschlussel-zz
Member

This work is complete now. If you have questions about using Presto with Kerberos authentication, you can ask on the Presto users group: presto-users@googlegroups.com
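
For anyone landing here later, the resulting setup is, roughly, per-catalog Kerberos settings in the Hive connector. The sketch below is based on the later Presto documentation; the property values, principals, and paths are assumptions to adapt to your environment, not a verified configuration. It is the single-credential arrangement discussed earlier in this thread: Presto authenticates to the metastore and HDFS as one service principal.

  # etc/catalog/hive.properties -- illustrative values only
  hive.metastore.authentication.type=KERBEROS
  hive.metastore.service.principal=hive/_HOST@EXAMPLE.COM
  hive.metastore.client.principal=presto@EXAMPLE.COM
  hive.metastore.client.keytab=/etc/presto/presto.keytab
  hive.hdfs.authentication.type=KERBEROS
  hive.hdfs.presto.principal=presto@EXAMPLE.COM
  hive.hdfs.presto.keytab=/etc/presto/presto.keytab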
