This is a simple WebDAV servlet, based on Apache Jackrabbit, derived from the iponweb HDFS over WebDAV project.
Apache Jackrabbit: http://jackrabbit.apache.org/
iponweb HDFS: http://www.hadoop.iponweb.net/Home/hdfs-over-webdav
To deploy:
- Create the UNIX user that will run the WebDAV server, e.g. "webdav".
- Create per-host Kerberos credentials for this service user and create per-host keytab files. You will also need to add a per-host Kerberos credential for "HTTP" to the keytab. (The SPNEGO authentication filter requires this, per the SPNEGO spec.)
- Edit the Hadoop core-site.xml and define the Unix user that will run the WebDAV server as a proxyuser. For example, if the user is 'webdav':

    ...
    <property>
      <name>hadoop.proxyuser.webdav.hosts</name>
      <value>host1,host2,hostN</value>
    </property>
    <property>
      <name>hadoop.proxyuser.webdav.groups</name>
      <value>*</value>
    </property>
    ...

This configuration must be done on the NameNode and then the NameNode must be restarted.
- Untar the WebDAV gateway package onto the hosts defined in hadoop.proxyuser.webdav.hosts. Symlink core-site.xml and hdfs-site.xml from the Hadoop configuration into its conf/ directory.
- Deploy the per-host service keytab into the conf/ directory.
- Edit the webdav-site.xml file in its conf/ directory as needed:
For authenticating the WebDAV gateway securely with Kerberos on a security-enabled Hadoop cluster while allowing clients anonymous access:

    <property>
      <name>hadoop.webdav.bind.address</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>hadoop.webdav.port</name>
      <value>8080</value>
    </property>
    <property>
      <name>hadoop.webdav.server.kerberos.principal</name>
      <value>webdav/_HOST@HADOOP.LOCALDOMAIN</value>
    </property>
    <property>
      <name>hadoop.webdav.server.kerberos.keytab</name>
      <value>/path/to/webdav.keytab</value>
    </property>
    <property>
      <name>hadoop.webdav.authentication.type</name>
      <value>simple</value>
    </property>
    <property>
      <name>hadoop.webdav.authentication.simple.anonymous.allowed</name>
      <value>true</value>
    </property>

NOTE: Allowing 'simple' authentication is a serious security hole if user impersonation is also allowed, because client identities are not verified while the gateway can still act on behalf of other users. If running with 'simple' authentication in production, REVERT the proxyuser changes made to core-site.xml in step 3, and carefully restrict the HDFS permissions of data so the 'webdav' principal has read-write access only in controlled locations.
For authenticating both the WebDAV gateway and clients (via SPNEGO):

    <property>
      <name>hadoop.webdav.bind.address</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>hadoop.webdav.port</name>
      <value>8080</value>
    </property>
    <property>
      <name>hadoop.webdav.server.kerberos.principal</name>
      <value>webdav/_HOST@HADOOP.LOCALDOMAIN</value>
    </property>
    <property>
      <name>hadoop.webdav.server.kerberos.keytab</name>
      <value>/path/to/webdav.keytab</value>
    </property>
    <property>
      <name>hadoop.webdav.authentication.type</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hadoop.webdav.authentication.kerberos.principal</name>
      <value>HTTP/_HOST@HADOOP.LOCALDOMAIN</value>
    </property>
    <property>
      <name>hadoop.webdav.authentication.kerberos.keytab</name>
      <value>/path/to/webdav.keytab</value>
    </property>

This is a configuration for a secure cluster. Given this example, if the KDC is running and per-host principals for 'webdav' and 'HTTP' have been added to the local keytab, you should see something like the following logged at startup:

    12/05/02 14:53:52 INFO security.UserGroupInformation: Login successful for user webdav/ip-10-177-2-205.us-west-1.compute.internal@HADOOP.LOCALDOMAIN using keytab file /etc/hadoop/conf/hdfs.keytab
    12/05/02 14:53:52 INFO webdav.Main: Listening on 0.0.0.0/0.0.0.0:8080
    12/05/02 14:53:52 INFO server.KerberosAuthenticationHandler: Initialized, principal [HTTP/_HOST@HADOOP.LOCALDOMAIN] from keytab [/etc/hadoop/conf/hdfs.keytab]
    12/05/02 14:53:52 INFO server.AbstractWebdavServlet: authenticate-header = Basic realm="Hadoop WebDAV Server"
    12/05/02 14:53:52 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8080

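The host-side deployment steps listed above (creating the per-host principals, linking the Hadoop configuration, and installing the keytab) could be scripted roughly as follows. This is a minimal sketch, not part of the gateway package. The function names and arguments are hypothetical, MIT Kerberos kadmin syntax (addprinc, ktadd) is assumed, and the keytab file name webdav.keytab is taken from the example configuration. The kadmin commands are only printed for review, never executed, and unpacking the tarball is left out because the package layout is not described here.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical helpers,
# not something shipped with the WebDAV gateway.

# Print the kadmin commands (MIT Kerberos syntax assumed) that would create
# the per-host 'webdav' and 'HTTP' principals and export both to one keytab.
print_kadmin_commands() {
  local host="$1" realm="$2" keytab="$3"
  echo "addprinc -randkey webdav/${host}@${realm}"
  echo "addprinc -randkey HTTP/${host}@${realm}"
  echo "ktadd -k ${keytab} webdav/${host}@${realm} HTTP/${host}@${realm}"
}

# Symlink the Hadoop client configuration into the gateway's conf/ directory.
link_hadoop_conf() {
  local gateway_home="$1" hadoop_conf="$2"
  mkdir -p "${gateway_home}/conf" &&
  ln -sf "${hadoop_conf}/core-site.xml" "${gateway_home}/conf/core-site.xml" &&
  ln -sf "${hadoop_conf}/hdfs-site.xml" "${gateway_home}/conf/hdfs-site.xml"
}

# Copy the per-host keytab into conf/ with owner-only read/write permissions.
install_keytab() {
  local keytab="$1" gateway_home="$2"
  install -m 0600 "$keytab" "${gateway_home}/conf/webdav.keytab"
}
```

For example, `print_kadmin_commands host1.example.com HADOOP.LOCALDOMAIN /tmp/webdav.keytab` prints three lines that can be reviewed and then entered at a kadmin prompt. Run `link_hadoop_conf` and `install_keytab` as the service user so that the keytab ends up owned by the account that runs the gateway.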
To test if a client can fetch from the WebDAV server running in a secure configuration, you can use a version of 'curl' that has support for GSS-Negotiate (check with curl -V):
$ kinit
( Log in. )
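$ curl --negotiate -u $USER -b ~/cookiejar.txt -c ~/cookiejar.txt http://$HOST:8080/$FILE_PATH
( $HOST is the gateway host and $FILE_PATH is the HDFS path to fetch; set both first. Do not use $PATH for the latter, since the shell reserves it for the executable search path. )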
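The `curl -V` check mentioned above can be automated with a small helper. This is a sketch under stated assumptions: the function name `supports_negotiate` is hypothetical. As far as I know, older curl builds list `GSS-Negotiate` on the `Features:` line while newer builds list `SPNEGO` and/or `GSS-API` instead, so the helper accepts any of the three.

```shell
#!/bin/sh
# Sketch only: 'supports_negotiate' is a hypothetical helper name.

# Reads `curl -V` output on stdin and succeeds (exit status 0) if the
# Features line names a Negotiate-capable feature.
supports_negotiate() {
  grep -Eqi '^Features:.*(GSS-Negotiate|SPNEGO|GSS-API)'
}

# Usage:
#   curl -V | supports_negotiate && echo "curl can do --negotiate"
```

Taking the version text on stdin rather than calling curl inside the function makes it possible to check output captured on another host.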